beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Is it reasonable to add models other than CrossEncoder to construct the QFilter? And can a QGenModel help improve performance with real queries? #2

Closed · svjack closed this issue 3 years ago

svjack commented 3 years ago

For example, BM25Search or others. I see that the load_train function of TrainRetriever uses a score to filter samples; I think experimental conclusions about using different QFilters to train the retriever should be taken into consideration.

svjack commented 3 years ago

And I found a method to normalize the BM25 score so that it is comparable to a CrossEncoder score at https://stats.stackexchange.com/questions/171589/normalised-score-for-bm25, which says:

> Normalize each score by dividing it with the sum of all scores (of, say, the top 10 results). Looking at the score for the first hit, it now means: "Are there lots of other hits that also match this query?". If there are, the number will be low; otherwise it will be high.

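
A minimal sketch of that normalization, assuming plain Python lists of raw scores; `normalize_bm25` is an illustrative helper, not part of the beir API:

```python
def normalize_bm25(scores, k=10):
    """Divide each raw BM25 score by the sum of the top-k scores,
    so the result behaves more like a bounded relevance weight."""
    top_k = sorted(scores, reverse=True)[:k]
    total = sum(top_k)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]

# A dominant first hit keeps a high normalized score; a flat score
# distribution pushes every normalized score down.
print(normalize_bm25([12.3, 9.1, 8.7, 2.2, 0.5]))
```

Note this only brings BM25 scores onto a CrossEncoder-like scale; it does not calibrate them as probabilities.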
svjack commented 3 years ago

You use a trained question generator (t5-small) in https://github.com/UKPLab/beir/blob/a65531f69adb600799b7cee85f06af8cc607d956/examples/filtration/query_gen_filter_and_train.py to generate questions. In a real problem, where we have a dataset of query-document pairs, do queries generated by a generator like t5-small (filtered by a QFilter) help improve the final performance of the IR model when the query part is a mixture of handwritten and generated queries? If so, can you provide a dataset mixture function (with a suggested ratio of generated samples) built on the query filter method (I know in your case you use a score threshold) to improve the final performance of the IR model? This would make the benchmark more dynamic.
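For concreteness, one possible shape for such a mixture function — the names, the parallel score list, and the `ratio` cap are all a hypothetical sketch, not part of beir:

```python
import random

def mix_queries(handwritten, generated, scores, threshold=0.5, ratio=1.0, seed=42):
    """Combine handwritten (query, doc) pairs with generated pairs that
    pass the filter threshold, capping the number of generated pairs
    at ratio * len(handwritten)."""
    kept = [pair for pair, s in zip(generated, scores) if s >= threshold]
    random.Random(seed).shuffle(kept)
    cap = int(ratio * len(handwritten))
    return handwritten + kept[:cap]
```

A good `ratio` would presumably need tuning per dataset, which is part of what this question asks the authors to recommend.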

svjack commented 3 years ago

I think this will require the QGenModel to be more controllable. For example, the paper below uses Controlled Adversarial Text Generation to improve model robustness: https://www.aclweb.org/anthology/2020.emnlp-main.417.pdf. I think this may also work in the IR domain.

svjack commented 3 years ago

And the control attribute could be the main entities that you extract from the document.

svjack commented 3 years ago

And I think you should also release the training code for your t5-small, similar to https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45?gi=e50463b54caf

svjack commented 3 years ago

That article uses triples as input, but you seem to use the full document; this is the difference between controllable and non-controllable generation.

svjack commented 3 years ago

With the help of an entity/relation extraction toolkit, your case could be made controllable.
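As a rough illustration of that idea: extract entities from the document and prepend them as a control prefix to the generator input. The toy capitalized-word matcher below stands in for a real NER toolkit (e.g. spaCy), and the prefix format is an assumption, not anything beir defines:

```python
import re

def extract_entities(text):
    """Toy entity extractor: runs of capitalized words."""
    return [m.strip() for m in re.findall(r"(?:[A-Z][a-z]+ ?)+", text)]

def build_controlled_input(document, max_entities=3):
    """Prefix the generator input with extracted entities so generation
    can be conditioned on (controlled by) them."""
    entities = extract_entities(document)[:max_entities]
    return "entities: " + "; ".join(entities) + " document: " + document

print(build_controlled_input("Marie Curie won the Nobel Prize in Physics in 1903."))
```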

svjack commented 3 years ago

One model that can handle a mixture of handwritten and generated queries is a GAN, and a GAN may be a general solution for making the benchmark dynamic with the help of the QGenModel.

svjack commented 3 years ago

And if you can train the Dense-Encoder in a GAN-BERT manner, that would merge the QGenModel and QFilter into one GAN construction, where the QGenModel acts as the generator and the QFilter as the discriminator.

svjack commented 3 years ago

This would be straightforward if you used a CrossEncoder as the discriminator (with a label class count of 2) and t5-small as the generator, combining them into the GAN-BERT construction from https://github.com/crux82/ganbert
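A very rough, runnable sketch of that coupling, with tiny linear layers standing in for t5-small and the CrossEncoder (real models would occupy the same two roles); this is an illustration of the proposal, not code from beir or ganbert:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 8
generator = nn.Linear(dim, dim)        # stand-in for t5-small (QGenModel)
discriminator = nn.Linear(dim, 2)      # stand-in for CrossEncoder, 2 classes (QFilter)
ce = nn.CrossEntropyLoss()
g_opt = torch.optim.SGD(generator.parameters(), lr=0.1)
d_opt = torch.optim.SGD(discriminator.parameters(), lr=0.1)

real_queries = torch.randn(16, dim)    # embeddings of handwritten queries
doc_reprs = torch.randn(16, dim)       # document representations fed to the generator
real_lbl = torch.ones(16, dtype=torch.long)
fake_lbl = torch.zeros(16, dtype=torch.long)

for step in range(5):
    # Discriminator (QFilter) step: real queries -> 1, generated -> 0.
    fake = generator(doc_reprs).detach()
    d_loss = ce(discriminator(real_queries), real_lbl) + ce(discriminator(fake), fake_lbl)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator (QGenModel) step: try to make generated queries look real.
    g_loss = ce(discriminator(generator(doc_reprs)), real_lbl)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(float(d_loss), float(g_loss))
```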

thakur-nandan commented 3 years ago

> In a real problem, do queries generated by a generator like t5-small (filtered by a QFilter) help improve the final performance of the IR model when the query part is a mixture of handwritten and generated queries? If so, can you provide a dataset mixture function (with a suggested ratio of generated samples) built on the query filter method? [...]

No, we don't use a t5-small model; it was mentioned in the example only as a placeholder. Now we have our models up on Hugging Face, so you can use them (BeIR/query-gen-msmarco-t5-large and BeIR/query-gen-msmarco-t5-base).

We use nucleus sampling to generate the questions; we find this performs well and produces diverse questions. A different number of questions can be generated per paragraph (say, 1 or 3), but the optimal combination varies with each dataset. The same goes for filtration: we are still experimenting with the number of questions and an optimal threshold value. I would suggest starting by generating 1 or 3 questions using nucleus sampling and filtering with a threshold of either 0.5 or 0.9.
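That generate-then-filter recipe can be sketched as follows. The word-overlap scorer is a toy stand-in for a CrossEncoder, and the function names are illustrative; in practice the generation side would use a model such as BeIR/query-gen-msmarco-t5-base with nucleus sampling:

```python
def overlap_score(query, paragraph):
    """Toy stand-in for a cross-encoder relevance score in [0, 1]."""
    q, p = set(query.lower().split()), set(paragraph.lower().split())
    return len(q & p) / max(len(q), 1)

def filter_generated(paragraph, queries, scorer=overlap_score, threshold=0.5):
    """Keep generated queries whose score against the source paragraph
    passes the threshold; drop the rest."""
    scored = [(q, scorer(q, paragraph)) for q in queries]
    return [(q, s) for q, s in scored if s >= threshold]

para = "BEIR is a heterogeneous benchmark for information retrieval."
print(filter_generated(para, ["what is the beir benchmark", "how to bake bread"]))
```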

thakur-nandan commented 3 years ago

> And I think you should also release the training code for your t5-small, similar to https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45?gi=e50463b54caf

Yes, I plan to release it soon; it's in progress. The BEIR GitHub repo is still under development.

You can also follow the training of QGen here: https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation.

thakur-nandan commented 3 years ago

> [...] Can you provide a dataset mixture function (with a suggested ratio of generated samples) built on the query filter method to improve the final performance of the IR model? [...]
>
> One model that can handle a mixture of handwritten and generated queries is a GAN, and a GAN may be a general solution for making the benchmark dynamic with the help of the QGenModel.

I am not familiar with the GAN architecture; I will need to read up on it myself. Our motivation was to keep each module separate (Generation, Filtration, and Retrieval), hence we chose this architecture to keep it simple and easy to use. If the performance of GAN-BERT dominates existing methods and the code is easy to use and understand, I could plan to add it to the repo. Thanks for the pointer.

thakur-nandan commented 3 years ago

Closing the Issue due to no recent activity!