SamEdwardes / spacypdfreader

Easy PDF to text to spaCy text extraction in Python.
https://samedwardes.github.io/spacypdfreader/
MIT License

Can't install in Colab environment #1

Closed victorescosta closed 2 years ago

victorescosta commented 2 years ago

Can anyone help me with this? I'm trying to install spacypdfreader in Google Colab, and it returns the following error message: [image: Error message]

I used this last week and it was working; now I don't know how to proceed. PS: I have already installed the spaCy package.

SamEdwardes commented 2 years ago

It is strange that it worked last week but not now.

What version of Python are you using in Google Colab? Is the notebook public? If so, can you share a link?
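One way to answer the version question from inside a notebook cell, as a minimal sketch using only the standard library (the helper name `python_version_summary` is made up for illustration):

```python
import platform
import sys


def python_version_summary():
    """Describe the interpreter that the current notebook kernel is running."""
    return f"Python {platform.python_version()} ({sys.executable})"


print(python_version_summary())
```

This reports the version of the kernel itself, which is the interpreter that `pip` needs to be compatible with.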

SamEdwardes commented 2 years ago

I think I see the issue:

https://github.com/SamEdwardes/spaCyPDFreader/blob/ce083bc5b61b06084c818b7d243f3c9210274442/pyproject.toml#L11-L15

spacypdfreader currently requires python = "^3.9", but Google Colab is on Python 3.7:

image

I think spacypdfreader should be able to work on Python 3.7. I will change the requirement to ^3.7 and check if it works.
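To illustrate why the install is rejected, here is a rough sketch of the caret constraint, assuming Poetry's usual semantics where ^3.9 means ">=3.9, <4.0". The function `satisfies_caret` is hypothetical and handles only the simple case of a non-zero major version; it is not how pip or Poetry actually resolve versions.

```python
def satisfies_caret(version, minimum):
    """Return True if `version` falls inside the caret range ^minimum.

    Both arguments are (major, minor) tuples. Simplified assumption:
    the major version is non-zero, so ^3.9 is treated as >=3.9, <4.0.
    """
    upper = (minimum[0] + 1, 0)  # next major version is the exclusive bound
    return minimum <= version < upper


# Colab's Python 3.7 against the old and the proposed requirement.
print(satisfies_caret((3, 7), (3, 9)))  # False
print(satisfies_caret((3, 7), (3, 7)))  # True
```

Under this reading, relaxing the requirement to ^3.7 is enough for a Python 3.7 interpreter to pass the version check.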

victorescosta commented 2 years ago

Maybe I was mistaken about having run it in Google Colab, and I actually ran it on my laptop. You're probably right. It would be great if this could be fixed for Python 3.7. I will watch for updates. Thanks for replying.

SamEdwardes commented 2 years ago

I just closed a PR (#2) that should fix the issue. It now works for me on Google Colab. You can try this:

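# Notebook shell commands: check the Python version, then install the
# package and the small English spaCy model.
!python --version
!pip install spacypdfreader
!python -m spacy download "en_core_web_sm"

import requests
import spacy
from spacypdfreader import pdf_reader

# Download a sample PDF from the repository's test data.
url = "https://github.com/SamEdwardes/spaCyPDFreader/raw/main/tests/data/test_pdf_01.pdf"
response = requests.get(url)
response.raise_for_status()  # Stop early if the download failed.
with open("test.pdf", "wb") as f:
    f.write(response.content)

# Load the spaCy model and convert the PDF into a spaCy Doc.
nlp = spacy.load("en_core_web_sm")
doc = pdf_reader("test.pdf", nlp)
print(doc)

victorescosta commented 2 years ago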

Thank you! Now it's working fine.