Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0

[BUG] - Error when Indexing GraphRAG: "Columns must be same length as key". #291

Open RealmX1 opened 1 week ago

RealmX1 commented 1 week ago

Description


I am using docker-lite. The only change I made after launch was adding my OpenAI key in the UI for both Resources-LLMs and Resources-Embeddings; everything else is left at the defaults.

The file I tried to process is: Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf

The UI log shows the following:

Indexing [1/1]: Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf
 => Converting Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf to text
 => Converted Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf to text
 => [Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf] Processed 124 chunks
 => Finished indexing Buehler_2024_Mach._Learn.__Sci._Technol._10.1088_2632-2153_ad7228.pdf
[GraphRAG] Creating index... This can take a long time.
Logging enabled at /app/ktem_app_data/user_data/files/graphrag/211803cc-fa98-4401-a11c-81ba194627bc/output/20240915-040148/reports/indexing-engine.log

🚀 create_base_text_units
                                  id  ... n_tokens
0   3dda6fe0eb5249ed346539845c0dc19b  ...     1200
1   c81169e1f59d4dd5d7297b52e03a9000  ...      135
2   691b6410fb3cff6a4d2f4ba9b83310ca  ...     1150
3   731546fd82749e507bef710fc0b87d01  ...       50
4   b3da00796eb6a3b6982a798f62d4c2bf  ...     1199
..                               ...  ...      ...
92  67a0e1638e015699d255c594d91293bf  ...      879
93  ccb017fe481366e47509f3b9e8daf298  ...     1200
94  b3642f7c7d6f3399039982991c9616d4  ...      193
95  ce5b79bd812e12a6b7838976844f7f36  ...     1015
96  724fa5469511c6b27e1e60211971343c  ...     1003

[97 rows x 5 columns]

🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...

🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...

❌ create_base_entity_graph
None

⠋ GraphRAG Indexer
├── Loading Input (text) - 62 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

The corresponding runtime log mentioned in the UI log is attached here: indexing-engine.log
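
For context, the text in the issue title matches the message of a pandas ValueError that is raised when a DataFrame is assigned to a list of column labels of a different length. Whether this is what fails inside the create_base_entity_graph step is an assumption; the attached indexing-engine.log would be needed to confirm it. A minimal sketch that triggers the same message, using made-up column names (x, a, b, p, q, r) that have nothing to do with GraphRAG's real tables:

```python
import pandas as pd


def shape_mismatch_message() -> str:
    """Assign a 3-column frame to 2 column labels and return the error text."""
    # Hypothetical data, for illustration only.
    frame = pd.DataFrame({"x": [1, 2]})
    # The value being assigned has three columns...
    value = pd.DataFrame({"p": [1, 2], "q": [3, 4], "r": [5, 6]})
    try:
        # ...but only two target labels are given, so pandas cannot pair them up.
        frame[["a", "b"]] = value
    except ValueError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(shape_mismatch_message())
```

If the pipeline fails for this reason, the likely place to look is wherever an intermediate result has fewer or more columns than the step expects, for example an empty entity graph.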

Reproduction steps

Launch docker-lite, add an OpenAI key in the UI for both Resources-LLMs and Resources-Embeddings (leave everything else at the defaults), then index the PDF with GraphRAG.

Screenshots

(Could not paste a screenshot here.)

Logs

Same as the UI log quoted in the Description above.

Browsers

Chrome

OS

Windows

Additional information

No response

leonlu-rialto commented 1 week ago

Does anyone know where to change the GraphRAG settings, as you would in the original repo's settings.yaml file?