Status: Closed (mowliv closed this issue 1 year ago)
I'm not sure exactly, but my guess is that it has to do with the libraries needed for parsing the PDFs. I'm not sure why it would happen after install, though.
If you're curious, the library this code uses to get the text from PDFs is this project: https://github.com/Unstructured-IO/unstructured. If you're not getting the results you're looking for, you can also try swapping it out for this one: https://pypi.org/project/PyPDF2/
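In case it helps anyone trying the PyPDF2 route, here is a rough sketch of what the text extraction could look like. It is not the code from this repo. The function names `join_page_texts` and `extract_pdf_text` are made up for illustration, and it assumes PyPDF2 2.0 or later, where `PdfReader` and `page.extract_text()` are available.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, skipping pages that yielded no text."""
    # Some pages (scanned images, blank pages) can come back empty or None.
    cleaned = [text.strip() for text in page_texts if text and text.strip()]
    return "\n\n".join(cleaned)


def extract_pdf_text(pdf_path: str) -> str:
    """Return the text of every page in the PDF at pdf_path (hypothetical helper)."""
    # Imported here so join_page_texts stays usable without PyPDF2 installed.
    # Assumes: pip install PyPDF2 (version 2.0 or later).
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

Calling `extract_pdf_text("example.pdf")` (with your own file path) should give you one string with pages separated by blank lines. Results will vary by PDF, and scanned documents without a text layer will likely come back empty.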
Okay, I simplified things and removed the dependency on unstructured. It should now work without the extra downloads (though you will need to run pip install -r requirements.txt again). It also now lets you ask follow-up questions :)
I gave it a PDF file and it failed as shown below. I was a little concerned to see it downloading packages. I don't see any reference to NLTK. Please comment on that and also the error I got. Thanks.