dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0
1.02k stars, 86 forks

bm25 search problem #42

Status: closed (Lapitel closed this issue 4 months ago)

Lapitel commented 4 months ago

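I'm using BM25Okapi for keyword search, but the ranking isn't what I expected: "process manual", which shares no words with the query, comes back ahead of the exact match "corperation report". Why is this happening?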

from rank_bm25 import BM25Okapi

docs = ["corperation report", "process manual"]
question = "corperation report"

def tokenize(string):
    return string.strip().split()

# Tokenize documents and query
tokenized_docs = [tokenize(doc) for doc in docs]
tokenized_query = tokenize(question)

# Initialize BM25 with tokenized documents
bm25 = BM25Okapi(corpus=tokenized_docs)

# Get top n documents for the tokenized query
result = bm25.get_top_n(query=tokenized_query, documents=docs, n=len(docs))
print("Search results:", result)
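A likely explanation, worth confirming against the installed rank_bm25 source, is the IDF term. BM25Okapi computes IDF as log((N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. With N = 2 and every term appearing in exactly one document, this gives log(1.5 / 1.5) = 0 for every term. Every document therefore scores 0, and the order comes down to tie-breaking. get_top_n appears to reverse an ascending np.argsort, which puts the later document first. The sketch below is a hypothetical pure-Python re-implementation, not the library's code. The helper names (okapi_idf, plus_idf, scores, top_n) and the defaults k1 = 1.5, b = 0.75, epsilon = 0.25, delta = 1 are assumptions. It reproduces the zero scores and compares them with a BM25+-style IDF, log((N + 1) / n), which stays positive on tiny corpora.

```python
import math
from collections import Counter

# Hypothetical sketch, not rank_bm25's code: the formulas are modeled on
# BM25Okapi and BM25+, with assumed defaults k1=1.5, b=0.75, epsilon=0.25, delta=1.
K1, B, EPSILON, DELTA = 1.5, 0.75, 0.25, 1.0


def doc_frequencies(corpus):
    """Count how many documents contain each term."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df


def okapi_idf(corpus):
    """Okapi IDF log((N - n + 0.5) / (n + 0.5)); negative values are floored at epsilon * mean IDF."""
    n_docs = len(corpus)
    idf = {term: math.log(n_docs - n + 0.5) - math.log(n + 0.5)
           for term, n in doc_frequencies(corpus).items()}
    floor = EPSILON * sum(idf.values()) / len(idf)
    return {term: (floor if value < 0 else value) for term, value in idf.items()}


def plus_idf(corpus):
    """BM25+-style IDF log((N + 1) / n), positive for every term in the corpus."""
    n_docs = len(corpus)
    return {term: math.log((n_docs + 1) / n)
            for term, n in doc_frequencies(corpus).items()}


def scores(corpus, query, idf, delta=0.0):
    """BM25 score per document; delta=0 gives Okapi BM25, delta>0 gives BM25+."""
    avgdl = sum(len(doc) for doc in corpus) / len(corpus)
    result = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalisation term k1 * (1 - b + b * |d| / avgdl).
        norm = K1 * (1 - B + B * len(doc) / avgdl)
        total = 0.0
        for q in query:
            if tf[q]:  # only query terms that occur in the document contribute
                total += idf.get(q, 0.0) * (delta + tf[q] * (K1 + 1) / (tf[q] + norm))
        result.append(total)
    return result


def top_n(documents, doc_scores, n=None):
    """Reverse an ascending sort, like np.argsort(...)[::-1]; with Python's
    stable sort, tied documents come back in reverse corpus order."""
    order = sorted(range(len(doc_scores)), key=doc_scores.__getitem__)[::-1]
    return [documents[i] for i in order[:n]]


docs = ["corperation report", "process manual"]
corpus = [doc.split() for doc in docs]
query = "corperation report".split()

# Okapi: every IDF is 0.0, so both scores are 0.0 and the tie-break
# puts "process manual" first.
okapi_scores = scores(corpus, query, okapi_idf(corpus))
# BM25+-style IDF is log(3) > 0, so the exact match scores higher.
plus_scores = scores(corpus, query, plus_idf(corpus), delta=DELTA)
print("Okapi:", okapi_scores, top_n(docs, okapi_scores))
print("BM25+:", plus_scores, top_n(docs, plus_scores))
```

If this is the cause, the problem should disappear on a realistically sized corpus, where query terms are rare relative to N. rank_bm25 also ships BM25Plus and BM25L classes whose IDF variants stay positive in this two-document case. Swapping one in for BM25Okapi is a reasonable thing to try, though the exact scores will differ from this sketch.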