Sohammhatre10 / Cadmi_AI

Predicts relevant information for students who need references for possible admission to undergraduate universities.

Write a script for scraping Mumbai University cutoffs #1

Status: Open. Sohammhatre10 opened this issue 2 weeks ago.

Sohammhatre10 commented 2 weeks ago

Requirements:

  1. A Selenium bot for scraping Mumbai University's 2024 cutoff data, with automated logging
  2. Ease of use and clean code, with a docstring for every function and class
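
A minimal sketch of what such a bot could look like, with logging and a docstring on every function. The target URL and the assumption that the cutoffs sit in a plain HTML `<table>` are hypothetical (later in this thread it turns out the cutoffs may only be published as PDFs), and `fetch_rendered_html` / `parse_table_rows` are illustrative names, not existing project code. Selenium is imported lazily so the parsing half can be tested without a browser.

```python
"""Sketch of a Selenium-based cutoff scraper (hypothetical URL and table layout)."""
from __future__ import annotations

import logging
from html.parser import HTMLParser

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("cutoff_scraper")


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None


def parse_table_rows(html: str) -> list[list[str]]:
    """Return the cell text of every non-empty table row in ``html``."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    logger.info("Parsed %d table rows", len(parser.rows))
    return parser.rows


def fetch_rendered_html(url: str) -> str:
    """Load ``url`` in headless Chrome via Selenium and return the rendered page source."""
    # Lazy import: the parsing helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        logger.info("Fetching %s", url)
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even on errors
```

Keeping fetching and parsing separate means the parser can be unit-tested on saved HTML, and the Selenium part stays a thin wrapper.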

gaurav-rm11 commented 2 weeks ago

Assign me.

Sohammhatre10 commented 2 weeks ago

I'll assign you this one first, since I need a sequence for tracking progress.

gaurav-rm11 commented 1 week ago

I have a question: for admissions in Maharashtra, there is no web page with cutoffs to scrape. The CET Cell provides them as PDF documents instead, so maybe NLP would be a better option for this?

Sohammhatre10 commented 1 week ago

@gaurav-rm11 You'll have to use pytesseract or LLM-based parsers to extract the data from the PDFs.
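A rough sketch of the pytesseract route, assuming the PDFs are scanned images: `pdf2image.convert_from_path` renders each page, `pytesseract.image_to_string` OCRs it, and a regex keeps lines that end in two integers. That line layout (a label followed by opening and closing rank) is a hypothetical stand-in; the real CET Cell layout would need its own pattern. Function names are illustrative.

```python
from __future__ import annotations

import re

# Hypothetical layout: a text label followed by two integers (opening and closing rank).
RANK_LINE = re.compile(r"^(?P<label>.*?\S)\s+(?P<open>\d+)\s+(?P<close>\d+)\s*$")


def ocr_pdf_pages(pdf_path: str, dpi: int = 300) -> list[str]:
    """OCR every page of a scanned PDF and return one text string per page."""
    # Lazy imports: pdf2image needs poppler, pytesseract needs the tesseract binary.
    import pytesseract
    from pdf2image import convert_from_path

    return [pytesseract.image_to_string(page) for page in convert_from_path(pdf_path, dpi=dpi)]


def parse_rank_lines(text: str) -> list[tuple[str, int, int]]:
    """Keep lines ending in two integers and return (label, open_rank, close_rank) tuples."""
    rows = []
    for line in text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match:
            rows.append((match["label"], int(match["open"]), int(match["close"])))
    return rows
```

Both pdf2image and pytesseract wrap external programs (poppler and Tesseract), which have to be installed separately from the Python packages.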

Sohammhatre10 commented 6 days ago

@gaurav-rm11 Any progress here?

gaurav-rm11 commented 6 days ago

I've used pdfplumber to extract data from the PDF, but it works on a downloaded PDF and generates a CSV file. Will that do? If so, I'll raise a PR.
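The pdfplumber approach described here could look roughly like this sketch (the helper names are hypothetical). `page.extract_tables()` returns each table as a list of rows whose cells may be `None`, so rows are cleaned before being written to the CSV file.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable


def clean_rows(tables: Iterable[list[list[str | None]]]) -> list[list[str]]:
    """Flatten pdfplumber tables into rows, replacing None cells and dropping blank rows."""
    rows = []
    for table in tables:
        for row in table:
            # Cells can contain line breaks from wrapped text in the PDF.
            cells = [(cell or "").replace("\n", " ").strip() for cell in row]
            if any(cells):
                rows.append(cells)
    return rows


def pdf_to_csv(pdf_path: str, csv_path: str) -> int:
    """Extract every table from a downloaded PDF into one CSV file; return the row count."""
    import pdfplumber  # lazy import so clean_rows() can be used on its own

    with pdfplumber.open(pdf_path) as pdf:
        tables = [table for page in pdf.pages for table in page.extract_tables()]
    rows = clean_rows(tables)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows)
```

Tables that span several pages often repeat their header row on each page, so those repeats may need to be filtered out before the CSV is loaded anywhere.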

Sohammhatre10 commented 6 days ago

Sounds good to me; I just wanted the data from the huge round PDF to be in the database. Raise a PR, and after that I'll check for any issues and let you know about them. Thanks for the update though!
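One way to get such a CSV into a database is sketched below with Python's built-in `sqlite3` module. SQLite, the table name `cutoffs`, and the `load_csv` helper are assumptions for illustration; this thread does not say which database the project uses. The table's columns are taken from the CSV header.

```python
import csv
import sqlite3
from collections.abc import Iterable


def load_csv(conn: sqlite3.Connection, lines: Iterable[str], table: str = "cutoffs") -> int:
    """Create ``table`` from the CSV header (if missing), insert all data rows, return the count.

    ``lines`` can be an open CSV file or any iterable of CSV lines.
    ``table`` and the header names are trusted input: they are interpolated into SQL.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        cursor = conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return cursor.rowcount
```

The columns are created without declared types, so SQLite keeps the ranks as text; casting them (or declaring `INTEGER` columns) would be needed before comparing ranks numerically.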

Sohammhatre10 commented 2 days ago

@gaurav-rm11 You may also use external web sources such as Shiksha for the same data. The CSV files must have the columns College, Branch, Quota, Category, Gender, OpenRank, CloseRank.
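A small sketch for checking a generated CSV against the required columns before raising a PR. `REQUIRED_COLUMNS` copies the list in this comment; `missing_columns` and `write_cutoffs` are hypothetical helper names.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable
from typing import TextIO

# Column names as listed in the issue comment.
REQUIRED_COLUMNS = ["College", "Branch", "Quota", "Category", "Gender", "OpenRank", "CloseRank"]


def missing_columns(lines: Iterable[str]) -> list[str]:
    """Return the required columns absent from the CSV header (an empty list means OK)."""
    header = next(csv.reader(lines, skipinitialspace=True), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


def write_cutoffs(rows: Iterable[dict], out: TextIO) -> None:
    """Write dict rows in the required column order.

    Unknown keys raise ValueError; missing keys are written as empty strings.
    """
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Writing through `csv.DictWriter` also quotes any field that contains a comma, which several branch names do.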

Example row for the IITmain.csv file:

Indian Institute of Technology Bhubaneswar, "Civil Engineering (4 Years, Bachelor of Technology)", AI, OPEN, Gender-Neutral, 9106, 14782
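
The Branch value in this example contains a comma, which is why it is quoted. Because the fields are separated by a comma plus a space, Python's `csv` module needs `skipinitialspace=True` to recognise the quoted field. A minimal parsing sketch (the `parse_row` helper is hypothetical):

```python
import csv

COLUMNS = ["College", "Branch", "Quota", "Category", "Gender", "OpenRank", "CloseRank"]

EXAMPLE = ('Indian Institute of Technology Bhubaneswar, '
           '"Civil Engineering (4 Years, Bachelor of Technology)", '
           'AI, OPEN, Gender-Neutral, 9106, 14782')


def parse_row(line: str) -> dict:
    """Parse one CSV line into a dict keyed by the required columns, with integer ranks."""
    # skipinitialspace=True drops the space after each comma, so the quoted
    # Branch field is recognised and its inner comma is kept.
    fields = next(csv.reader([line], skipinitialspace=True))
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["OpenRank"] = int(row["OpenRank"])
    row["CloseRank"] = int(row["CloseRank"])
    return row
```

Without `skipinitialspace=True`, the reader sees a space before the opening quote, treats the quote as an ordinary character, and splits the Branch value at its inner comma, producing 8 fields instead of 7.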