TheTechTiger / MHTCET-cutoff-pdf-to-excel

A Python project that converts the cut-off PDFs from the MHT-CET website into Excel sheets for improved readability.
GNU General Public License v3.0

The cutoff 2024 pdf also contains stage 2 #1

Open Yash1231232 opened 3 weeks ago

Yash1231232 commented 3 weeks ago

The data in Stage 2 is being skipped, but I want both Stage 1 and Stage 2 data in my Excel sheet. Can you tell me what to modify to get the desired output?

TheTechTiger commented 3 weeks ago

I tried fixing it, but there are no definite identifiers for Stage 2 cut-offs. The backend of the program works by converting the PDF into plain text. For details like college name/code and branch name/code, there's a clear identifier: for example, the college name and code lines begin with the format 'XXXXX - ' (where X is any digit). The same applies to branch, seat level, and category cut-offs.
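To illustrate the kind of identifier described above, here is a minimal sketch of matching college lines in the plain text. It is not the project's actual code: the function name `find_college_lines`, the assumption that "XXXXX" means exactly five digits, and the sample text are all hypothetical.

```python
import re

# Assumption: a college header line starts with exactly five digits,
# followed by " - " and the college name (the 'XXXXX - ' format).
COLLEGE_LINE = re.compile(r"^(\d{5}) - (.+)$")


def find_college_lines(page_text):
    """Return (code, name) pairs for lines matching the assumed college pattern."""
    matches = []
    for line in page_text.splitlines():
        m = COLLEGE_LINE.match(line.strip())
        if m:
            matches.append((m.group(1), m.group(2).strip()))
    return matches


# Made-up sample text, only to show the call.
sample = "01002 - Example College of Engineering\nSome line without an identifier\n"
print(find_college_lines(sample))
```

Lines that match no such pattern are the ones that cannot be classified automatically, which is the situation described for Stage 2 cut-offs.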

However, when logging the plain text version of the page containing a Stage 2 cut-off, there is no reliable or consistent way to identify the cut-off.

That said, you're welcome to attempt fixing this and contribute to the project. If not, the best approach to ensure accuracy in the Excel sheet is to manually go through the text files and page numbers in the 'skipped' folder, and then add the entries to the Excel sheet after running the data migrator.
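If it helps with the manual pass, a small script like the one below could preview what ended up in the 'skipped' folder. This is only a sketch: it assumes the skipped pages are stored as `.txt` files directly inside a folder named `skipped`, and the helper `preview_skipped` is hypothetical, not part of the project.

```python
from pathlib import Path


def preview_skipped(folder="skipped", max_lines=5):
    """Return {file name: first few non-empty lines} for each .txt file in folder."""
    previews = {}
    root = Path(folder)
    if not root.is_dir():
        # Nothing to preview if the folder does not exist.
        return previews
    for path in sorted(root.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        previews[path.name] = lines[:max_lines]
    return previews


if __name__ == "__main__":
    for name, lines in preview_skipped().items():
        print(f"== {name} ==")
        for ln in lines:
            print("  " + ln)
```

The output only shows where to look; the Stage 2 entries still have to be read and added to the Excel sheet by hand.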