circlemind-ai / fast-graphrag

RAG that intelligently adapts to your use case, data, and queries
MIT License

Indexing issue #4

Closed: manishiitg closed this issue 3 weeks ago

manishiitg commented 3 weeks ago

I got this error when calling grag.insert(f.read()):

file_path ./SantaClaraData/other_city_files/Santa_Clara_University_Five_Year_Master_Plan/Santa_Clara_University_Five_Year_Master_Plan_Santa_Clara_University_Development_Plan.txt
Error during insertion: Python int too large to convert to C long
Traceback (most recent call last):
  File "/Users/mipl/ramen/backend/fast-graph-rag/main.py", line 42, in <module>
    grag.insert(f.read())
  File "/Users/mipl/ramen/backend/venv/lib/python3.12/site-packages/fast_graphrag/_graphrag.py", line 53, in insert
    return get_event_loop().run_until_complete(self.async_insert(content, metadata))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/mipl/ramen/backend/venv/lib/python3.12/site-packages/fast_graphrag/_graphrag.py", line 99, in async_insert
    raise e
  File "/Users/mipl/ramen/backend/venv/lib/python3.12/site-packages/fast_graphrag/_graphrag.py", line 80, in async_insert
    new_chunks_per_data = await self.state_manager.filter_new_chunks(chunks_per_data=chunked_documents)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mipl/ramen/backend/venv/lib/python3.12/site-packages/fast_graphrag/_services/_state_manager.py", line 58, in filter_new_chunks
    new_chunks_mask = await self.chunk_storage.mask_new(keys=[c.id for c in flattened_chunks])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mipl/ramen/backend/venv/lib/python3.12/site-packages/fast_graphrag/_storage/_ikv_pickle.py", line 66, in mask_new
    self._np_keys = np.fromiter(
                    ^^^^^^^^^^^^
OverflowError: Python int too large to convert to C long

liukidar commented 3 weeks ago

Hello, I think 32-bit and 64-bit integers were being mixed for the indices. Hopefully it's fixed now. Can you check out the fix-chunk-id branch and let me know whether it fixes the problem? Otherwise, would you mind sharing the code that causes the issue? You can also reach out to me on Discord if you don't want to share the data publicly :)
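The traceback ends in `np.fromiter` inside `mask_new`, and "Python int too large to convert to C long" is what NumPy raises when a Python int does not fit the target integer dtype. The minimal sketch below reproduces that failure mode, assuming (this is not confirmed in the thread) that chunk ids are unsigned 64-bit hash values being packed into a signed 64-bit array. `to_key_array` and `mask_new` here are hypothetical stand-ins, not fast-graphrag's actual code, and the fix on the fix-chunk-id branch may work differently.

```python
import numpy as np

# Hypothetical chunk ids: assumed to be unsigned 64-bit hashes, so some of them
# exceed the signed int64 maximum (2**63 - 1).
chunk_ids = [2**63 + 5, 42, 2**64 - 1]


def to_key_array(keys, dtype):
    """Pack integer keys into a 1-D NumPy array with np.fromiter (hypothetical helper)."""
    return np.fromiter(keys, dtype=dtype, count=len(keys))


def mask_new(existing, candidates):
    """Return a boolean mask that is True for candidate keys not in `existing`.

    Hypothetical stand-in for a storage-layer "which chunks are new" check.
    It stores keys as uint64 so every unsigned 64-bit id fits.
    """
    existing_arr = to_key_array(existing, np.uint64)
    candidate_arr = to_key_array(candidates, np.uint64)
    return ~np.isin(candidate_arr, existing_arr)


if __name__ == "__main__":
    # Signed 64-bit storage raises OverflowError on ids >= 2**63.
    try:
        to_key_array(chunk_ids, np.int64)
    except OverflowError as exc:
        print("int64 failed:", exc)

    # Unsigned 64-bit storage holds the same ids without error.
    keys = to_key_array(chunk_ids, np.uint64)
    print("uint64 ok:", keys)
    print("new-chunk mask:", mask_new([42], chunk_ids))
```

Using `np.uint64` covers the full unsigned 64-bit range, so the same ids pack cleanly; `np.isin` then marks which candidate ids are not already stored.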

liukidar commented 3 weeks ago

I have also merged this into main.