Open aroundmyroom opened 4 weeks ago
I did it manually. I converted the PDF file to a Word file with the help of Google Docs, then copied the tables into Excel and deleted the unnecessary lines (headers and footers). Then I ran a script that loads the Excel file into the DB. Because of possible copyright issues, I do not publish the DB with the data.
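For anyone wanting to try the same route, here is a rough sketch of the "load the cleaned sheet into the DB" step. It is not the script mentioned above. It assumes the Excel sheet was saved as CSV, and it uses SQLite from the standard library instead of whatever database the original script targets. The table name `results`, its three columns, and the function `load_csv` are all hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_text: str, expected_columns: int = 3) -> int:
    """Insert the data rows of a CSV export into a (hypothetical) results table.

    Rows that are blank or do not have the expected number of non-empty
    columns (e.g. leftover page headers or footers) are skipped.
    """
    rows = [
        row
        for row in csv.reader(io.StringIO(csv_text))
        if len(row) == expected_columns and all(cell.strip() for cell in row)
    ]
    # Hypothetical schema: adjust the columns to match the real tables.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (age_group TEXT, gender TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Example with made-up data; a real run would read the exported CSV file instead.
sample = "Page 1 of 3\n25-29,M,1:02.34\n\n30-34,F,1:05.10\n"
connection = sqlite3.connect(":memory:")
inserted = load_csv(connection, sample)
```

The column-count filter only catches leftover lines that do not look like data rows; a column header row with the same number of columns would still need to be removed by hand, as described above.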
ah .. clear .. I already thought that it would be something like that .. it took me some effort with Copilot and ChatGPT today ;) as I am a bad bad coder ..
But luckily, after some hours .. I was able to do it in 2 steps .. I had some fights with parsing the data at first, but succeeded .. Step 1 is getting the ages and gender, then parsing the data per page and outputting it to JSON. Step 2 is sending the JSON to MySQL .. that gave some challenges as well, as the time format is not in the style MySQL likes, but after that .. it seems that it works here as well ..
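A minimal sketch of what the time-format part of step 2 could look like, in case it helps someone else. This is not the actual script from the fork. It assumes the parsed times look like `1:02.34` (minutes:seconds.hundredths) and that the target column is a MySQL `TIME(2)`; the JSON field names `gender`, `age_group` and `time`, the table name `times`, and both function names are hypothetical.

```python
import json


def to_mysql_time(raw: str) -> str:
    """Convert 'SS.hh', 'M:SS.hh' or 'H:MM:SS.hh' into 'HH:MM:SS.ff'."""
    # Accept a decimal comma as well, then split into up to three fields.
    parts = raw.strip().replace(",", ".").split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected time format: {raw!r}")
    # Pad missing hour and minute fields with zeros.
    parts = ["0"] * (3 - len(parts)) + parts
    hours, minutes, seconds = int(parts[0]), int(parts[1]), float(parts[2])
    return f"{hours:02d}:{minutes:02d}:{seconds:05.2f}"


def rows_from_json(text: str) -> list[tuple[str, str, str]]:
    """Turn the JSON from step 1 into parameter tuples for an INSERT."""
    records = json.loads(text)
    return [
        (r["gender"], r["age_group"], to_mysql_time(r["time"]))
        for r in records
    ]


# Parameterized statement in the %s placeholder style used by common MySQL
# drivers; pass it together with the rows to the driver's executemany().
INSERT_SQL = "INSERT INTO times (gender, age_group, time) VALUES (%s, %s, %s)"

# Example with made-up data.
sample_json = '[{"gender": "M", "age_group": "25-29", "time": "1:02.34"}]'
rows = rows_from_json(sample_json)
```

Converting the string before the insert keeps the SQL itself simple; the alternative would be to let MySQL parse the value, which depends on the exact input format.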
So without much manual work I am able to import and use the data
Will need to check if all is working and then I will put it in my repository as well
Thanks for this script !
forked and updated .. thanks again !
Can you tell me how to import / parse the PDF data into the database?