AmenRa / retriv

A Python Search Engine for Humans 🥸
MIT License

HybridRetriever raises KeyError: -1 if the number of docs is less than 1_000 #29

Open tshu-w opened 10 months ago

tshu-w commented 10 months ago

The cutoff of msearch for HybridRetriever is hardcoded to 1_000, which makes map_internal_ids_to_original_ids raise KeyError: -1 when the collection has fewer than 1_000 docs, since the returned doc IDs then include -1, which is not a key in id_mapping: https://github.com/AmenRa/retriv/blob/c9baa011e3071c2369f81f5b6f3a87f0d444072d/retriv/hybrid_retriever.py#L254-L255

Thus, map_internal_ids_to_original_ids should be:

def map_internal_ids_to_original_ids(self, doc_ids: Iterable) -> List[str]:
    return [self.id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]
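
For anyone who wants to see the failure outside of retriv, here is a minimal sketch. It assumes the search fills the missing result slots with -1 when the cutoff is larger than the collection; `id_mapping`, `padded_ids`, `map_unfiltered`, and `map_filtered` are made-up stand-ins for illustration, not retriv's actual internals.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for the retriever's internal-to-original ID table.
id_mapping: Dict[int, str] = {0: "doc_a", 1: "doc_b", 2: "doc_c"}

# Assumed result shape when the cutoff exceeds the collection size:
# real hits first, then -1 placeholders for the empty slots.
padded_ids: List[int] = [2, 0, 1, -1, -1]


def map_unfiltered(doc_ids: Iterable[int]) -> List[str]:
    # Looks up every ID, including the -1 placeholders.
    return [id_mapping[doc_id] for doc_id in doc_ids]


def map_filtered(doc_ids: Iterable[int]) -> List[str]:
    # Same lookup, but skips the -1 placeholders (the fix proposed above).
    return [id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]


try:
    map_unfiltered(padded_ids)
except KeyError as err:
    print(f"KeyError: {err}")

print(map_filtered(padded_ids))
```

With the filter in place only the real hits are mapped, so the result is simply shorter than the cutoff instead of raising.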

AmenRa commented 10 months ago

Thanks for reporting the bug! I'll fix it soon.