Sohammhatre10 / Cadmi_AI

Predicts relevant information for students who need reference data on possible admissions to undergraduate universities.
1 star, 11 forks

Write a script for scraping Mumbai University cutoffs #1

Open Sohammhatre10 opened 1 month ago

Sohammhatre10 commented 1 month ago

Requirements are -

  1. A Selenium bot for scraping Mumbai University's 2024 cutoff data, and automation for logging
  2. Ease of use and clean code; add a docstring for everything

gaurav-rm11 commented 1 month ago

assign me

Sohammhatre10 commented 1 month ago

Will be assigning you this one first as I need a sequence for tracking the progression

gaurav-rm11 commented 1 month ago

I had a doubt. For admissions in Maharashtra, there is no web page to scrape cutoffs from; the CET Cell provides PDF documents instead. So maybe using NLP would be a better option, I guess.

Sohammhatre10 commented 1 month ago

You'll have to use pytesseract or LLM parsers for scraping the PDFs. @gaurav-rm11

Sohammhatre10 commented 1 month ago

@gaurav-rm11 any progress here?

gaurav-rm11 commented 1 month ago

I've used pdfplumber to extract the data from the PDF, but it works on a downloaded PDF and generates a CSV file. Will that do? Then I'll raise a PR.
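
For reference, a minimal sketch of what a pdfplumber-to-CSV step like the one described in this comment could look like. It is not the code from the eventual PR: the function names `clean_row` and `pdf_tables_to_csv` and the file names in the usage note are hypothetical, and it assumes the cutoff PDF contains ruled tables that `page.extract_tables()` can detect.

```python
import csv


def clean_row(row):
    """Normalise one extracted table row: None becomes "", whitespace is collapsed."""
    return [" ".join((cell or "").split()) for cell in row]


def pdf_tables_to_csv(pdf_path, csv_path):
    """Write every non-empty table row found in a downloaded PDF to a CSV file.

    Returns the number of rows written. Assumes the PDF holds real tables;
    scanned (image-only) PDFs would need OCR instead.
    """
    import pdfplumber  # third-party dependency: pip install pdfplumber

    written = 0
    with pdfplumber.open(pdf_path) as pdf, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a
            # list of rows, and each row is a list of cell strings or None.
            for table in page.extract_tables():
                for row in table:
                    cleaned = clean_row(row)
                    if any(cleaned):  # skip rows that are entirely empty
                        writer.writerow(cleaned)
                        written += 1
    return written
```

Usage would be something like `pdf_tables_to_csv("cutoffs.pdf", "cutoffs.csv")`, where both paths are placeholders. The raw rows would still need mapping onto the final column layout before going into the database.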

Sohammhatre10 commented 1 month ago

Sounds good to me; I just wanted the data from the huge round PDF to be in the database. Raise a PR after that and I'll check for any issues and let you know about them. Thanks for the update, though!

Sohammhatre10 commented 4 weeks ago

@gaurav-rm11 you may use external web sources like Shiksha too for the same. The CSV files must have the columns College, Branch, Quota, Category, Gender, OpenRank, CloseRank

Example row for the IITmain.csv file:

Indian Institute of Technology Bhubaneswar, "Civil Engineering (4 Years, Bachelor of Technology)", AI, OPEN, Gender-Neutral, 9106, 14782
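
A quick way to check a generated file against the column list above might look like the sketch below. The function name `validate_cutoff_csv` is hypothetical, and two things are assumed rather than stated in this thread: that each file starts with a header row naming the columns, and that OpenRank and CloseRank are whole numbers, as in the example row.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "College", "Branch", "Quota", "Category", "Gender", "OpenRank", "CloseRank",
]


def validate_cutoff_csv(text):
    """Return a list of problems found in CSV text; an empty list means it passed."""
    # skipinitialspace lets a quoted field follow ", " as in the example row.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    if not rows:
        return ["file is empty"]

    header = [cell.strip() for cell in rows[0]]
    if header != REQUIRED_COLUMNS:
        return [f"unexpected header: {header}"]

    problems = []
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(REQUIRED_COLUMNS):
            problems.append(
                f"row {row_no}: expected {len(REQUIRED_COLUMNS)} fields, got {len(row)}"
            )
            continue
        record = dict(zip(REQUIRED_COLUMNS, (cell.strip() for cell in row)))
        for column in ("OpenRank", "CloseRank"):
            # Assumption: ranks are non-negative whole numbers.
            if not record[column].isdigit():
                problems.append(
                    f"row {row_no}: {column} is not a whole number: {record[column]!r}"
                )
    return problems
```

Reading a file with `open(path, newline="", encoding="utf-8").read()` and passing the text in keeps the check independent of where the CSV came from (pdfplumber output, Shiksha, or anything else).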

Sohammhatre10 commented 3 weeks ago

@gaurav-rm11 any updates?