dvdblk / hack4good-oecd

OECD Policy Document Analysis pipeline with LLMs.
MIT License

Add multistep preprocessing pipeline #6

Open · dvdblk opened this issue 1 year ago

dvdblk commented 1 year ago

Try implementing a version of the approach described here: https://towardsdatascience.com/extracting-text-from-pdf-files-with-python-a-comprehensive-guide-9fc4003d517