-
Use one of the smallest BEIR datasets to set up an end-to-end evaluation (see the sketch below this list).
Tools
- primeqa
Experiments
- sparse-bm25 retrieval (https://github.com/primeqa/primeqa/tree/main/notebooks/ir/sparse)
…
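A minimal end-to-end sketch of the sparse BM25 experiment, using the `beir` package directly rather than the primeqa notebook. The dataset choice (SciFact, one of the smallest BEIR datasets), the index name, and a local Elasticsearch instance on localhost are all assumptions:
```python
# Hedged sketch: BM25 retrieval + evaluation on SciFact (~5k documents).
# Assumes the `beir` package and a running local Elasticsearch server
# for the lexical index.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.lexical import BM25Search as BM25

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

retriever = EvaluateRetrieval(BM25(index_name="scifact", hostname="localhost"))
results = retriever.retrieve(corpus, queries)  # qid -> {docid: score}
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # e.g. {"NDCG@1": ..., "NDCG@10": ..., ...}
```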
-
I'm trying to run beir/examples/retrieval/evaluation/dense/evaluate_sbert_multi_gpu.py. Doing so, I end up with the error below.
Traceback (most recent call last):
File "evaluate_sbert_multi_gpu.…
-
Hi,
For the warm-up step, I see a regular dense-retrieval model being trained on the triples.small data provided by MS MARCO.
But I can't find any code that introduces a BM25 index or BM25 sampling.
I gue…
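For reference, a hedged sketch of what BM25 index building and negative sampling could look like, using the `rank_bm25` package; the corpus, whitespace tokenization, and top-k sampling policy are illustrative assumptions, not the repo's actual warm-up code:
```python
# Hedged sketch: mine hard negatives for warm-up triples by taking
# top-ranked BM25 passages that are not the gold positive.
import random
from rank_bm25 import BM25Okapi

corpus = ["first passage ...", "second passage ...", "third passage ..."]  # placeholder
bm25 = BM25Okapi([doc.split() for doc in corpus])  # build the BM25 "index"

def sample_bm25_negatives(query: str, positive_idx: int, k: int = 10, n_neg: int = 2):
    scores = bm25.get_scores(query.split())
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    candidates = [i for i in ranked[:k] if i != positive_idx]  # drop the gold passage
    return random.sample(candidates, min(n_neg, len(candidates)))
```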
-
Hi, I ran the following command to build the database for a small file (all_full.tsv, about 3.17 MB):
```
SEQDB_PATH=/home/fastMSA/all_full.tsv
MODEL_PATH=/home/Dense-Homolog-Retrieval/Dense-Homolog-…
```
-
Build a model for ranking clarifying questions given an instruction.
See [What to ask](https://www.aicrowd.com/challenges/neurips-2022-iglu-challenge/problems/neurips-2022-iglu-challenge-nlp-task#eva…
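One plausible baseline (an assumption, not the challenge's reference implementation) is to score (instruction, clarifying question) pairs with a pretrained cross-encoder and rank by score; the checkpoint and example data below are placeholders:
```python
# Hedged sketch: rank candidate clarifying questions for an instruction
# with a generic MS MARCO cross-encoder; fine-tuning on IGLU data
# would still be needed for the actual task.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

instruction = "Build a red tower three blocks tall."  # placeholder
candidates = [
    "Where should I place the tower?",
    "What color should the base be?",
    "Should I destroy the existing wall first?",
]

scores = model.predict([(instruction, q) for q in candidates])
for q, s in sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True):
    print(f"{s:.3f}  {q}")
```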
-
# Environments
- Python: 3.9
- OS: Ubuntu 20.04
- FlagEmbedding: 1.2.5
- transformers: 4.33.1
# Details
My test Python file `bge-test.py`:
```
from FlagEmbedding import BGEM3F…
```
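For context, since the file above is truncated, a typical BGE-M3 usage pattern under FlagEmbedding 1.2.x looks like the following; this is an assumption-laden reconstruction, not the reporter's actual script:
```python
# Hedged sketch of typical BGEM3FlagModel usage (FlagEmbedding 1.2.x).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
sentences = ["What is BGE M3?", "BM25 is a lexical retrieval function."]

# encode() returns a dict: dense vectors under "dense_vecs",
# sparse lexical weights under "lexical_weights".
out = model.encode(sentences, return_dense=True, return_sparse=True)
print(out["dense_vecs"].shape)
```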
-
Pre-training:
1. Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering (ACL 2022)
2. RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Langua…
-
### Conclusion
Model-selection advice:
1. Most models have a sequence length of 512 tokens. For 8192 tokens, try tao-8k; for 1024, try stella.
2. On specialized-domain data, embedding models underperform BM25, but fine-tuning can greatly improve their results (see the sketch after this list).
3. If you need fine-tuning but have little experience with model training, the bge series is recommended (complete training scripts, hard-negative mining, etc.). That said, most models are BERT-based and the training scripts are largely interchangeable, so other models can be adapted by following the same recipes.…
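To illustrate point 2, a minimal fine-tuning sketch using sentence-transformers with in-batch negatives; the checkpoint and training pairs are placeholders, and the bge repo's own scripts add hard-negative mining on top of this pattern:
```python
# Hedged sketch: fine-tune an embedding model on (query, positive) pairs
# with in-batch negatives. Checkpoint and data are placeholder assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("BAAI/bge-small-zh-v1.5")
train_examples = [
    InputExample(texts=["what is BM25?", "BM25 is a term-frequency-based scoring function."]),
    InputExample(texts=["how to fine-tune an embedding model?", "continue contrastive training on in-domain query-passage pairs."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```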
-
Following the introduction and requirements, I am fine-tuning a retrieval model based on "Luyu/co-condenser-marco", but loading the training data raises an error:
```
python -m dense.driver.train --output_dir ./ret…
```
-
When the model is called, an error is raised saying it has no encode method. I am using the Qwen 0.5B model:
```
Error while evaluating CmedqaRetrieval: 'Qwen2Model' object has no attribute 'encode'
Traceback (most recent call last):
  File "/data2/mteb/zhangchi/mteb…
```