Closed Luobots closed 2 days ago
Thanks. However, this keyword extraction approach can only extract words that are present in the query. It becomes challenging to extract keywords that represent concepts or ideas needed to answer the query but are not explicitly mentioned in it.
Oh, this "approach" is just text generated by my LLM when I used it to extract keywords (See Here in Your Code) 😂. I just want to emphasize that the double "{" situation (See Here in Your Code) can be solved by my PR.
When I use LightRAG, my model generates the text below for keyword extraction. It contains two `{` characters, so `"{" + result.split("{")[1].split("}")[0] + "}"` fails, but `"{" + result.split("{")[-1].split("}")[0] + "}"` works, and the original expectation is still met.

Keyword Extraction
To extract high-level and low-level keywords from the given query, we will use Natural Language Processing (NLP) techniques.
Output:
This script first tokenizes the query into individual words and then identifies high-level and low-level keywords. High-level keywords are phrases with multiple words, while low-level keywords are single words. The `stop_words` list is used to exclude common words like "the", "and", etc. that do not add much value to the query. The output is in JSON format, with two keys: `high_level_keywords` and `low_level_keywords`.
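
To make the difference between index `[1]` and index `[-1]` concrete, here is a minimal sketch. The `result` string is a made-up stand-in for a response with an extra `{` before the JSON (it is not real LightRAG or model output), and the helper names `extract_first` and `extract_last` are hypothetical, used only for illustration.

```python
import json

# Hypothetical model response: a stray "{" appears in the explanatory text
# before the JSON object that we actually want.
result = (
    "Keyword Extraction\n"
    'keywords = {"placeholder"}\n'
    "Output:\n"
    '{"high_level_keywords": ["education"], "low_level_keywords": ["school"]}'
)


def extract_first(text: str) -> str:
    # Takes the text after the FIRST "{", up to the next "}".
    return "{" + text.split("{")[1].split("}")[0] + "}"


def extract_last(text: str) -> str:
    # Takes the text after the LAST "{", up to the next "}".
    return "{" + text.split("{")[-1].split("}")[0] + "}"


# The first-brace variant grabs the stray fragment, which is not valid JSON.
try:
    json.loads(extract_first(result))
    first_ok = True
except json.JSONDecodeError:
    first_ok = False

# The last-brace variant grabs the trailing JSON object.
keywords = json.loads(extract_last(result))

print(first_ok)
print(keywords["high_level_keywords"], keywords["low_level_keywords"])
```

Note that the last-brace variant assumes the JSON object is flat and comes last in the response. If the object contained a nested `{...}`, splitting on the last `{` would pick up only the inner object.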