LightRAG Bug Fix Report
Issue
A TypeError occurred in hybrid query mode when accessing the content of text units whose data was None. The error was raised in the _find_most_related_text_unit_from_entities function while text units were being processed for token-size truncation.
Root Cause
The issue stemmed from insufficient null checks when processing text units in the knowledge graph. Specifically:
Text unit data could be None when retrieved from text_chunks_db
The data dictionary could be missing the 'content' field
No proper filtering of invalid entries before token size truncation
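The failure mode can be reproduced in isolation. The sketch below is a minimal illustration, not the project's code: the entry shape (a dict with "data", "order", and "relation_counts" keys) is inferred from the excerpts in this report, and the chunk ids, values, and the content_of helper are hypothetical.

```python
# Hypothetical lookup table; ids and values are illustrative only.
lookup = {
    "chunk-1": {"data": {"content": "alpha beta"}, "order": 0, "relation_counts": 2},
    # Chunk missing from storage: the entry itself is not None, but its data is.
    "chunk-2": {"data": None, "order": 1, "relation_counts": 0},
    # Chunk present, but without a 'content' field.
    "chunk-3": {"data": {"tokens": 3}, "order": 2, "relation_counts": 1},
}


def content_of(entry):
    """Mimic the key function used for token-size truncation."""
    return entry["data"]["content"]


# Record which entries fail and with which exception type.
errors = {}
for chunk_id, entry in lookup.items():
    try:
        content_of(entry)
    except (TypeError, KeyError) as exc:
        errors[chunk_id] = type(exc).__name__

print(errors)  # {'chunk-2': 'TypeError', 'chunk-3': 'KeyError'}
```

Note that a check of the form "entry is not None" lets both bad entries through, because the entries themselves are dicts; only the nested data is invalid.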
The key problematic area was in LightRAG/lightrag/operate.py (lines 591-597):
if any([v is None for v in all_text_units_lookup.values()]):
    logger.warning("Text chunks are missing, maybe the storage is damaged")
all_text_units = [
    {"id": k, **v} for k, v in all_text_units_lookup.items() if v is not None
]
all_text_units = sorted(
    all_text_units, key=lambda x: (x["order"], -x["relation_counts"])
Solution
Added comprehensive null checks and data validation throughout the text unit processing pipeline:
Added null check for node data and source_id field:
LightRAG/lightrag/operate.py (lines 571-575):
    for k, v in zip(all_one_hop_nodes, all_one_hop_nodes_data)
    if v is not None
}
all_text_units_lookup = {}
for index, (this_text_units, this_edges) in enumerate(zip(text_units, edges)):
Added content validation when getting chunk data:
LightRAG/lightrag/operate.py (lines 591-597):
if any([v is None for v in all_text_units_lookup.values()]):
    logger.warning("Text chunks are missing, maybe the storage is damaged")
all_text_units = [
    {"id": k, **v} for k, v in all_text_units_lookup.items() if v is not None
]
all_text_units = sorted(
    all_text_units, key=lambda x: (x["order"], -x["relation_counts"])
Added comprehensive filtering for None values:
LightRAG/lightrag/operate.py (lines 599-604):
all_text_units = truncate_list_by_token_size(
    all_text_units,
    key=lambda x: x["data"]["content"],
    max_token_size=query_param.max_token_for_text_unit,
)
all_text_units: list[TextChunkSchema] = [t["data"] for t in all_text_units]
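As a rough illustration of the validation described in this section, the sketch below filters out invalid entries before sorting and truncating. It is a self-contained approximation under stated assumptions, not the code in operate.py: is_valid_unit, truncate_by_token_budget, and select_text_units are hypothetical names, and the stand-in truncation counts whitespace-separated words rather than real tokens.

```python
def is_valid_unit(entry):
    """Return True when the entry carries chunk data with a string 'content' field."""
    if entry is None:
        return False
    data = entry.get("data")
    return isinstance(data, dict) and isinstance(data.get("content"), str)


def truncate_by_token_budget(items, key, max_tokens):
    """Stand-in for token-size truncation: counts whitespace-separated words."""
    kept, used = [], 0
    for item in items:
        used += len(key(item).split())
        if used > max_tokens:
            break
        kept.append(item)
    return kept


def select_text_units(lookup, max_tokens):
    # Drop None entries, None data, and data without 'content' in one pass.
    units = [{"id": k, **v} for k, v in lookup.items() if is_valid_unit(v)]
    # Same ordering as the excerpts: ascending order, then descending relation count.
    units.sort(key=lambda x: (x["order"], -x["relation_counts"]))
    units = truncate_by_token_budget(
        units, key=lambda x: x["data"]["content"], max_tokens=max_tokens
    )
    return [u["data"] for u in units]


# Hypothetical sample data covering each invalid case.
sample = {
    "a": {"data": {"content": "one two three"}, "order": 1, "relation_counts": 1},
    "b": {"data": None, "order": 0, "relation_counts": 5},
    "c": {"data": {"content": "four five"}, "order": 0, "relation_counts": 2},
    "d": None,
}

print(select_text_units(sample, max_tokens=10))
# [{'content': 'four five'}, {'content': 'one two three'}]
```

Filtering before the sort and the truncation means the key functions never see an entry without usable content, so neither step needs its own guard.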
The changes are backward compatible and require no modifications to the existing API or data structures.