Open Sohammhatre10 opened 1 month ago
assign me
Will be assigning you this one first as I need a sequence for tracking the progression
I had a doubt. For admissions in Maharashtra, there is no web page to scrape cutoffs from. The CET Cell provides PDF documents instead, so maybe using NLP would be a better option, I guess.
You'll have to use pytesseract or LLM parsers to scrape the PDFs. @gaurav-rm11
@gaurav-rm11 any progress here?
I've used pdfplumber to extract data from the PDF, but it works on a downloaded PDF and generates a CSV file. Will that do? Then I'll raise a PR.
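For reference, a minimal sketch of the downloaded-PDF-to-CSV approach described above. It assumes the cutoff PDF stores its data as tables that pdfplumber can detect; the function names `clean_rows` and `pdf_to_csv` are hypothetical and not taken from the actual PR.

```python
import csv
from typing import Iterable, List, Optional

Row = List[Optional[str]]


def clean_rows(rows: Iterable[Row]) -> List[List[str]]:
    """Turn None cells into "", collapse whitespace inside cells, drop empty rows."""
    cleaned = []
    for row in rows:
        cells = [" ".join((cell or "").split()) for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def pdf_to_csv(pdf_path: str, csv_path: str) -> int:
    """Write every table row found in a downloaded PDF to csv_path.

    Returns the number of rows written. Assumes the PDF has a text layer;
    scanned PDFs would need OCR first.
    """
    import pdfplumber  # third-party; imported lazily so clean_rows works without it

    rows: List[List[str]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables, each a list of rows
            for table in page.extract_tables():
                rows.extend(clean_rows(table))

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

The raw rows will probably still need mapping onto the final column layout, since headers that repeat on each PDF page end up in the output as ordinary rows.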
Sounds good to me. I just wanted the data from the huge round PDF to be in the database. Raise a PR after that and I'll check for any issues and let you know about them. Thanks for the update tho!
@gaurav-rm11 you may use external web sources like Shiksha too for the same.
The CSV files must have the columns College, Branch, Quota, Category, Gender, OpenRank, CloseRank.
Example row for the IITmain.csv file:
Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Technology)",AI,OPEN,Gender-Neutral,9106,14782
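A quick way to check a generated file against this column layout before raising a PR might look like the sketch below. It uses only the standard library. The function name `validate_cutoff_csv` is hypothetical, and the rule that both ranks are plain whole numbers is an assumption based on the example row, not a stated requirement.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "College", "Branch", "Quota", "Category", "Gender", "OpenRank", "CloseRank",
]

SAMPLE = (
    "College,Branch,Quota,Category,Gender,OpenRank,CloseRank\n"
    'Indian Institute of Technology Bhubaneswar,'
    '"Civil Engineering (4 Years, Bachelor of Technology)",'
    "AI,OPEN,Gender-Neutral,9106,14782\n"
)


def validate_cutoff_csv(text: str) -> list:
    """Return a list of problems found in the CSV text; an empty list means none."""
    # skipinitialspace tolerates a space after each comma, as in "a, b, c"
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]

    if not rows or rows[0] != REQUIRED_COLUMNS:
        return ["header must be exactly: " + ",".join(REQUIRED_COLUMNS)]

    problems = []
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != len(REQUIRED_COLUMNS):
            problems.append(f"row {n}: expected 7 fields, got {len(row)}")
            continue
        # Assumption: ranks are whole numbers, as in the example row
        if not (row[5].isdigit() and row[6].isdigit()):
            problems.append(f"row {n}: OpenRank and CloseRank must be whole numbers")
    return problems


if __name__ == "__main__":
    print(validate_cutoff_csv(SAMPLE))
```

Note that the Branch value contains a comma, so it has to stay quoted in the CSV or the row will split into eight fields.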
@gaurav-rm11 any updates?
Requirements are -