acmeyer / GPTSummarization

Get a summary of a document or url using GPT.

TypeError: 'type' object is not subscriptable #2

Closed (mowliv closed 1 year ago)

mowliv commented 1 year ago

I gave it a PDF file and it failed as shown below. I was a little concerned to see it downloading packages. I don't see any reference to NLTK. Please comment on that and also the error I got. Thanks.

$ python main.py  -f ~/Downloads/manuscript.pdf
[nltk_data] Downloading package punkt to /Users/Michael/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/Michael/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    def split_text(text: str) -> list[str]:
TypeError: 'type' object is not subscriptable

acmeyer commented 1 year ago

I'm not sure exactly, but I'm guessing it has to do with the libraries needed for parsing PDFs. I'm not sure why it would happen after install, though.

If you're curious, this code uses the following project to extract text from PDFs: https://github.com/Unstructured-IO/unstructured. That library is presumably also where the NLTK downloads come from. If you're not getting the results you're looking for, you can also try swapping it out for this one: https://pypi.org/project/PyPDF2/
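
A note on the traceback itself: the failing line is the annotation `list[str]`, not anything PDF-related. Subscripting the built-in `list` type is only supported from Python 3.9 onward (PEP 585), and older interpreters raise exactly this `TypeError` when the `def` statement is evaluated. Below is a minimal sketch of a signature that also works on older versions. The function body is hypothetical, since the real `split_text` is not shown in this thread; splitting on blank lines is only an assumption for illustration.

```python
from typing import List  # typing.List can be subscripted on Python versions before 3.9


def split_text(text: str) -> List[str]:
    # Hypothetical body (an assumption, not the project's real logic):
    # split on blank lines and drop empty chunks.
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


if __name__ == "__main__":
    print(split_text("first paragraph\n\nsecond paragraph"))
```

Alternatively, adding `from __future__ import annotations` at the top of the file (Python 3.7+) defers evaluation of annotations, so `list[str]` in a signature no longer fails at definition time. Running the script on Python 3.9 or newer should avoid the problem without any code changes.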

acmeyer commented 1 year ago

Okay, I simplified things and removed the dependency on unstructured. It should now work without the extra downloads (though you will need to run pip install -r requirements.txt again). It now also lets you ask follow-up questions :)