SapienzaNLP / relik

Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)

On Windows, importing relik still raises OverflowError: Python int too large to convert to C long (see fix noted) #15

Closed stevereiner closed 1 month ago

stevereiner commented 1 month ago

On Ubuntu, relik 1.0.7 did prevent installation with Python 3.12, and relik 1.0.7 worked with Python 3.10. I was having trouble switching to Python 3.11, so I didn't verify that version. On Windows, with Python 3.12, 3.11, and 3.10, I still get this error with relik 1.0.7:

  File "C:\newdev2\relik\relik-text.py", line 1, in <module>
    import relik
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\__init__.py", line 1, in <module>
    from relik.inference.annotator import Relik
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\inference\annotator.py", line 16, in <module>
    from relik.inference.data.objects import (
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\inference\data\objects.py", line 8, in <module>
    from relik.retriever.indexers.document import Document
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\retriever\__init__.py", line 1, in <module>
    from relik.retriever.pytorch_modules.model import GoldenRetriever
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\retriever\pytorch_modules\__init__.py", line 5, in <module>
    from relik.retriever.indexers.document import Document
  File "C:\Users\sreiner\AppData\Roaming\Python\Python310\site-packages\relik\retriever\indexers\document.py", line 11, in <module>
    csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long

stevereiner commented 1 month ago

On Ubuntu 22.04, relik 1.0.7 also worked with Python 3.11, in addition to Python 3.10. I used virtualenv to switch to each version.

stevereiner commented 1 month ago

In \relik\retriever\indexers\document.py, line 11, changing csv.field_size_limit(sys.maxsize) to csv.field_size_limit(min(sys.maxsize, 2147483646)) fixes the Windows problem.

See https://stackoverflow.com/questions/54514998/python-3-7-64-bit-on-windows-7-64-bit-csv-field-larger-than-field-limit-13
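
For anyone wondering why the clamp helps: the csv module stores the field size limit in a C long, which is 32 bits on Windows even under 64-bit Python, while sys.maxsize there is 2**63 - 1. The following is a minimal sketch of the same idea, not the code from the relik repository. The helper names c_long_max and raise_csv_field_limit are made up for illustration, and using ctypes to detect the platform's long size is my assumption rather than what the fix in the thread does (it hardcodes 2147483646).

```python
import csv
import ctypes
import sys


def c_long_max() -> int:
    """Largest value that fits in a signed C long on this platform."""
    bits = 8 * ctypes.sizeof(ctypes.c_long)  # 32 on Windows, 64 on most Linux builds
    return 2 ** (bits - 1) - 1


def raise_csv_field_limit() -> int:
    """Set the csv field size limit as high as the platform allows.

    Hypothetical helper: clamps sys.maxsize to the C long range so that
    csv.field_size_limit does not raise OverflowError on Windows.
    """
    limit = min(sys.maxsize, c_long_max())
    csv.field_size_limit(limit)
    return limit


applied = raise_csv_field_limit()
print(applied)
```

On 64-bit Linux both values are the same, so the call behaves exactly like csv.field_size_limit(sys.maxsize); on Windows the limit ends up at the largest 32-bit signed value instead of raising.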

stevereiner commented 1 month ago

Also, with a clean new Python 3.12.6 install on Ubuntu 22.04 and just pip install relik==1.0.6 in a virtualenv, import relik worked fine (I don't get the errors I had in issue #14). So I don't think the change in 1.0.7 to prevent installation with Python 3.12 is needed.

Riccorl commented 1 month ago

Fixed by #16, closing.

harrisonfloam commented 4 weeks ago

Hate to say it, but I don't think this issue was solved. I'm still encountering the same OverflowError on import in a clean conda environment using Python 3.10.x. Let me know if it makes sense for me to open a new issue.

Error Traceback:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[3], line 1
----> 1 import relik
File c:\Users\XXXX\AppData\Local\miniconda3\envs\relik\lib\site-packages\relik\__init__.py:1
----> 1 from relik.inference.annotator import Relik
File c:\Users\XXXX\AppData\Local\miniconda3\envs\relik\lib\site-packages\relik\inference\data\objects.py:8
----> 8 from relik.retriever.indexers.document import Document
File c:\Users\XXXX\AppData\Local\miniconda3\envs\relik\lib\site-packages\relik\retriever\indexers\document.py:11
---> 11 csv.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long

Error reproduced with:
Python: 3.10.12 - 3.10.14
Relik: 1.0.7
Environment:

stevereiner commented 4 weeks ago
  1. The fix for the overflow error on Windows was in my first pull request, made after Relik 1.0.7. It was pulled into the main source but is not in a released version yet (it would also allow Python 3.12 again).
  2. I have a second pull request that fixes building from source on Windows and adds notes to README.md about performance being slower than on Linux (the cause could be in the LlamaIndex extractor or in Relik) and about needing a main function. It is also set up to be 1.0.8 when next released. This second pull request hasn't been pulled yet.
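
Until a release includes the fix, one possible stopgap is to wrap csv.field_size_limit before importing relik, so that the call in document.py goes through a clamped version. This is only a sketch under assumptions: it relies on document.py calling csv.field_size_limit through the csv module, as the traceback shows, and the names MAX_32BIT_LONG and clamped_field_size_limit are hypothetical. I have not tested it against relik itself.

```python
import csv
import sys

# Largest signed 32-bit value; a C long is 32 bits on Windows.
MAX_32BIT_LONG = 2**31 - 1

# Keep a reference to the real function before replacing it.
_original_field_size_limit = csv.field_size_limit


def clamped_field_size_limit(*args):
    """Drop-in wrapper that clamps the requested limit to the 32-bit range."""
    if args:
        return _original_field_size_limit(min(args[0], MAX_32BIT_LONG))
    # With no argument, just report the current limit.
    return _original_field_size_limit()


# Monkeypatch the module attribute; must run before relik is imported.
csv.field_size_limit = clamped_field_size_limit

# import relik  # left commented out here; the module-level call should now be clamped
```

Note that this lowers the limit on Linux as well, so it is best removed once a fixed version is installed.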