Closed: svjack closed this issue 3 years ago
This is the same suggestion I made to EasyNMT: if you give the user many choices, you should also give some advice or point to "the best" one.
Just like pmlb does in https://github.com/EpistasisLab/pmlb/blob/master/examples/fetch_nearest_datasets.ipynb, which provides dataset suggestions based on a given dataset. I am looking for the same kind of function for NLP tasks (dataset comparison or model comparison) in this project.
I think ANLI (https://github.com/facebookresearch/anli, https://arxiv.org/pdf/1910.14599.pdf) is a framework that supports this kind of function in Natural Language Understanding: it "combine[s] SNLI+MNLI+FEVER-NLI and up-sample[s] different rounds of ANLI to train the models." I think you could try the same approach in your IR domain to get a similar function, perhaps also using the idea from Mastering the Dungeon (https://github.com/facebookresearch/ParlAI/tree/mastering_the_dungeon/projects/mastering_the_dungeon, https://arxiv.org/pdf/1711.07950.pdf).
All about dynamic benchmarks
I think this benchmark could also support choosing the best model from a list of models by comparing their performance measurements on a given dataset. This requires the datasets to share the same interface.
It could also support model combination, i.e. switching the model in use according to different semantic features of the input (sometimes using BM25, sometimes SBERT, switching by feature characteristics) to make the final results more consistent; a hypothetical sketch of this idea follows below.
This would make the project not only a benchmark but also a meta/ensemble framework that combines models to improve the final performance on a single dataset with different features.
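A purely hypothetical sketch of that switching idea, just to make it concrete; bm25_search, sbert_search, and the query-length heuristic are all placeholders I made up, not BEIR APIs:

```python
# Hypothetical sketch: pick a retriever per query based on a simple feature.
# bm25_search / sbert_search are placeholder callables, not BEIR functions.
def retrieve(query, bm25_search, sbert_search):
    # Assumed heuristic: short keyword-style queries -> lexical BM25,
    # longer natural-language questions -> dense SBERT retrieval.
    if len(query.split()) <= 3:
        return bm25_search(query)
    return sbert_search(query)
```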
Yes, our experiments with all the datasets are coming out soon in our paper, and I will add the performance scores once the paper pre-print is out. Yes, all the datasets have the same interface and can be downloaded from here: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/
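For illustration, a minimal sketch of loading one of these datasets through that shared interface, assuming the current beir package layout (the scifact dataset is just an example):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip one BEIR dataset (scifact used here as an example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# Every dataset exposes the same corpus / queries / qrels interface.
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
```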
Yes, I also plan to add suggestions on which model performs the best on a task, with more details. For now, I would suggest BM25 (lexical) and the distilroberta-base-msmarco-v2 SBERT (dense) model; both are strong models that you could use.
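For illustration, a rough sketch of evaluating such a dense SBERT model with BEIR's retrieval and evaluation classes; the model identifier is assumed to correspond to the distilroberta-base-msmarco-v2 model mentioned above, and the exact name may differ:

```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Dense retrieval with the suggested SBERT model (exact model name is assumed).
model = DRES(models.SentenceBERT("msmarco-distilroberta-base-v2"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

# corpus, queries, qrels come from GenericDataLoader as shown earlier.
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```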
I often review the code in your examples dir. I found that it also provides some "train" examples, such as using an SBERT cross-encoder to filter and save data first and then training on the filtered data. Since SBERT is also a project of the UKP Lab, I want to know about your future plans for supporting "training" with different models, such as the Google Universal Sentence Encoder and so on (I know Google has not released the training code). Do you have plans to support more "training" with user-defined models (behind the same interface)?
This would make the project not only a benchmark but also a toolkit for improving models.
Yes, we provide training code and examples for the SBERT bi-encoder for retriever training, and in the future we would like to add training code for the SBERT cross-encoder for query generation and filtration as well. In our experiments we find SBERT models outperforming DPR and USE-QA, and they are convenient since SBERT is well documented. I won't be able to add training methods for models apart from SBERT for now.
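For illustration, a minimal sketch of what SBERT bi-encoder retriever training looks like with the sentence-transformers library directly (toy data, not BEIR's own training utilities):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy (query, relevant passage) pairs stand in for real training data.
train_examples = [
    InputExample(texts=["what is bm25", "BM25 is a lexical ranking function used in search."]),
    InputExample(texts=["dense retrieval", "Dense retrievers embed queries and documents into vectors."]),
]

model = SentenceTransformer("distilroberta-base")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```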
Closing the Issue due to no recent activity!