UNICT-DMI / UNICT-Elezioni

All election results of the University of Catania 🎓
https://unict-dmi.github.io/UNICT-Elezioni
GNU Affero General Public License v3.0

semi-automatic parser #140

Open Gigi-G opened 1 year ago

Gigi-G commented 1 year ago

JSON Parser

For now, we can use the following steps to generate the JSON files:

  1. Use https://croppdf.com/ to crop the unnecessary white space out of the PDF document.

  2. Use https://products.aspose.app/pdf/table-extraction to create an Excel file from the PDF, then convert that file to CSV. Converting the PDF directly to CSV may give poor-quality output; going through XLS first yields a better result.

  3. Review and edit the resulting file to remove any leftover white space and inconsistencies.

The goal is to create an automatic pipeline that performs these steps.
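
As a starting point for automating step 3, here is a minimal sketch in Python, assuming the table has already been exported to CSV. It is not the project's parser: the function names (`clean_cell`, `clean_rows`, `csv_to_records`) and the sample columns (`Lista`, `Voti`) are hypothetical, and the real JSON schema used by the repository may differ.

```python
import csv
import io
import json
import re


def clean_cell(value: str) -> str:
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_rows(rows):
    """Normalize every cell and drop rows that end up completely empty."""
    cleaned = []
    for row in rows:
        cells = [clean_cell(cell) for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def csv_to_records(csv_text: str):
    """Treat the first non-empty row as the header and return a list of dicts.

    Note: zip() silently truncates rows whose length differs from the header,
    so ragged rows still need a manual check.
    """
    rows = clean_rows(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    # Hypothetical sample: padded cells plus an empty row, as a PDF export might produce.
    sample = "  Lista ,  Voti \n,\n Lista   A , 120 \n"
    print(json.dumps(csv_to_records(sample), ensure_ascii=False, indent=2))
```

This only covers the whitespace cleanup and the CSV-to-JSON conversion; cropping the PDF and extracting the table (steps 1 and 2) would still need their own tooling.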

Helias commented 1 year ago

Can this be closed?