-
- [ ] [SWE-bench](https://www.swebench.com/lite.html)
# SWE-bench Lite
## A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers
Carlos E. Jimenez, John Yang, Jiayi Ge…
-
`FileNotFoundError: [Errno 2] No such file or directory: '\\kilt\\kilt\\configs\\retriever\\default_bm25.json'`
In the retrieval configs there is no default file for bm25. Can you guys add that? Th…
-
### Bug Description
In the BM25 retriever, the corpus is built from nodes using the default argument to `get_content()`, as shown below.
`self._corpus = [self._tokenizer(node.get_content()) for node in …
-
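As the title says: the qwen-agent blog post on RAG (https://qwenlm.github.io/zh/blog/qwen-agent-2405) describes a brute-force retrieval scheme based on chunk-by-chunk reading, shown below:
![image](https://github.com/user-attachments/assets/9da39f39-7a22-4fab-8a66-7196f0071374)
Why, when an LLM has already been used to assess relevance…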
-
Found this only because I was curious how fastembed calculates token ids for bm25/bm42, so I took a deep dive:
## Problem
The [function compute_token_id](https://github.com/qdrant/fastembed/blob/0…
-
I suggest adding a function to bind the BM25 score *(which is based on a probabilistic term weighting model)*. It is useful in some cases as it gives control over:
- Term frequency saturation
- Docume…
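
For context, here is a minimal sketch of the standard Okapi BM25 formula, showing where term frequency saturation comes from. It is not the function requested above: `bm25_scores` and its parameters are hypothetical names chosen for illustration, and the defaults `k1=1.2`, `b=0.75` are only commonly used values. In this formulation `k1` sets how quickly repeated occurrences of a term stop adding to the score, and `b` sets how strongly the score is normalized by document length.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper for illustration only. Assumes `docs` is a
    non-empty list of token lists with at least one non-empty document.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization factor; b=0 disables it entirely.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # The tf / (tf + k1 * norm) ratio saturates as tf grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

With `k1=0` the term frequency is ignored once a term is present at all, while larger `k1` values let repeated occurrences keep raising the score for longer before it levels off.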
-
I am trying to use SDM and BM25 with CEDR, by following the docs, but I think I'm missing something.
```
SDM = pt.rewrite.SDM()
BM25 = pt.BatchRetrieve(indexref, controls={"wmodel" : "BM25"}, ver…
-
This issue tracks efforts toward implementing and improving the RAG implementation.
- [x] Implement Naive RAG
- [x] Implement ability to ingest/import a mediawiki DB and associated needs for making it ef…
-
Not sure what happened but saw this in the logs:
```
se.py", line 458, in result
2024-08-05 21:48:44 | ERROR | stderr | return self.__get_result()
2024-08-05 21:48:44 | ERROR | stderr | Fi…
-
### What feature are you requesting?
I would love to see Polish language stemming.
Currently, the dependency chain disallows it due to abandoned libraries, but an alternative library could allow…