NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.84k stars · 900 forks

Run aNMM #654

Closed · ctrado18 closed this issue 5 years ago

ctrado18 commented 5 years ago

I am new to MatchZoo. I wonder how to run aNMM. The docs don't have usage instructions for aNMM. I believe I have to run a script to calculate the bin_sizes for aNMM, but I cannot find where this script lives.

Furthermore, my training data needs to follow the format shown here: https://github.com/NTMC-Community/MatchZoo/blob/master/matchzoo/datasets/toy/train.csv

right?

And where are the batches created? Since you have positive and negative documents for each query, each batch should contain both positive and negative samples, right?

How can I load my own data?

Thanks.

yangliuy commented 5 years ago

@ctrado18 Thanks for the questions! For the code that generates the bin sums from the interaction matrix, you can currently refer to https://github.com/NTMC-Community/MatchZoo/blob/1.0/data/WikiQA/run_data.sh and https://github.com/NTMC-Community/MatchZoo/blob/1.0/data/WikiQA/gen_binsum4anmm.py in MatchZoo V1.0. As for MatchZoo V2.0, I am preparing another tutorial that demonstrates how to run aNMM on the WikiQA data. You can check back later.
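
For readers who just want the idea behind the bin-sum preprocessing, here is a minimal numpy sketch; it is not the actual gen_binsum4anmm.py script, and the bin count and the exact binning of the similarity range are assumptions.

```python
import numpy as np

def anmm_bin_sums(query_vecs, doc_vecs, num_bins=20):
    """Sketch of aNMM-style bin sums: for each query term, cosine similarities
    to all document terms are bucketed into num_bins bins over [-1, 1], and the
    similarity values in each bin are summed. Output: (num_query_terms, num_bins)."""
    # Normalize term embeddings so dot products become cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # interaction matrix, values in [-1, 1]

    # Map each similarity to a bin index and accumulate the values per bin.
    bin_idx = np.clip(((sim + 1.0) / 2.0 * num_bins).astype(int), 0, num_bins - 1)
    bin_sums = np.zeros((q.shape[0], num_bins))
    np.add.at(bin_sums, (np.arange(q.shape[0])[:, None], bin_idx), sim)
    return bin_sums

# Toy usage with random vectors standing in for pretrained word embeddings.
rng = np.random.default_rng(0)
print(anmm_bin_sums(rng.normal(size=(3, 50)), rng.normal(size=(7, 50))).shape)  # (3, 20)
```

If I remember the aNMM paper correctly, an exact match (similarity of 1) usually gets a bin of its own; the v1 script is the authoritative reference for such details.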

ctrado18 commented 5 years ago

In ranking (input a question, output one or two answers), with aNMM, siamese networks, or any other ranking deep learning model where you use a rank loss, is it enough to have labels with value 0/1 indicating whether the answer is correct or not? How many negative examples do I need? I just want to learn relevant versus irrelevant documents.

Also, it would be nice to see how to apply pointwise, pairwise and listwise ranking to an example.

yangliuy commented 5 years ago

@ctrado18 The labels can be binary or multiple integer relevance levels, depending on your setting. For WikiQA, the negative examples in the training data come from human annotations. If you use negative sampling to create the negative training data, you can try different settings for the number of negative examples, but this is not needed for data like WikiQA/TRECQA.

You can refer to the code of the point/pair/list generators here: https://github.com/NTMC-Community/MatchZoo/tree/1.0/matchzoo/inputs

ctrado18 commented 5 years ago

Thanks. I am just confused about the relationship between the label (0,1) and a contrastive loss, which calculates the loss from a score and not from the label. So how does my ranking model (any one, e.g. aNMM in Keras) relate the label to the corresponding loss? If I understand your aNMM paper correctly, you are doing exactly that.

Up to now I have no experience using ranking losses in TensorFlow/Keras, or in general.

yangliuy commented 5 years ago

@ctrado18 Let's take pointwise ranking loss and pairwise ranking loss as examples. With a pointwise ranking loss, you treat ranking as a special case of classification/regression: basically you try to predict the ground truth label based on the features of a query/doc pair. With a pairwise ranking loss, you instead try to predict the relative order of a positive doc and a negative doc given a triple <query, pos_doc, neg_doc>. During the training stage, you can generate these triples based on document labels like (0,1). So in general a pairwise ranking loss correlates better with the final ranking metrics than a pointwise ranking loss. A listwise ranking loss tries to optimize the whole list of documents given a query, so the search space is all permutations of the list of documents. Most neural ranking models like aNMM/DRMM/MatchPyramid/MV-LSTM adopt a pairwise ranking loss.
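
To make the pairwise case concrete, here is a minimal Keras sketch (not MatchZoo's implementation; the tiny MLP scorer, the feature dimension, and the margin are placeholders):

```python
import tensorflow as tf
from tensorflow import keras

def make_scorer(feat_dim):
    """Placeholder scoring model: maps a (query, doc) feature vector to a score.
    Any ranking model (aNMM, DRMM, ...) plays this role in practice."""
    inp = keras.Input(shape=(feat_dim,))
    hidden = keras.layers.Dense(64, activation="relu")(inp)
    return keras.Model(inp, keras.layers.Dense(1)(hidden))

feat_dim = 32
scorer = make_scorer(feat_dim)

# Pairwise wrapper: score the positive and the negative doc with shared weights.
pos_in = keras.Input(shape=(feat_dim,), name="q_pos_doc_features")
neg_in = keras.Input(shape=(feat_dim,), name="q_neg_doc_features")
score_diff = keras.layers.Subtract()([scorer(pos_in), scorer(neg_in)])
ranker = keras.Model([pos_in, neg_in], score_diff)

def pairwise_hinge(margin=1.0):
    """Hinge ranking loss: penalize whenever score(d+) does not beat score(d-) by the margin."""
    def loss(_y_true, diff):
        return tf.reduce_mean(tf.maximum(0.0, margin - diff))
    return loss

# y_true is a dummy array (e.g. zeros); only the score difference enters the loss.
ranker.compile(optimizer="adam", loss=pairwise_hinge())
```

At prediction time you only need the shared scorer: score every candidate document for a query and sort by the score.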

If you want to learn these technical details systematically, I recommend the survey on neural ranking models by Prof. Jiafeng Guo (https://arxiv.org/abs/1903.06902) and the survey on LTR by Dr. Tie-yan Liu (https://dl.acm.org/citation.cfm?id=1618304) as great reading resources.

ctrado18 commented 5 years ago

Thanks! I think I am starting to understand. I will read the papers.

For pointwise ranking, could you also have just the labels 0 and 1? Or would your training data then contain only pairs of query and positive document, together with one of multiple (>2) labels? And with a pairwise loss you use the score from the final layer to compute the loss, instead of using a label, right?

Do you know of any simplified code (or a GitHub repo) in Keras/TensorFlow that uses a neural model like DSSM and shows how to build the batches and use a pairwise ranking loss?

yangliuy commented 5 years ago

@ctrado18 You can refer to the code of point/pair/list generator in MatchZoo 1.0 here https://github.com/NTMC-Community/MatchZoo/tree/1.0/matchzoo/inputs

ctrado18 commented 5 years ago

In v2, is this the same as the PairDataGenerator? I can use that in v2 for aNMM too, right? What is the difference between v1 and v2? Also, in v1 there are no examples showing the general usage?

Is there any small example in v1 or v2 showing how to run a model with a pairwise loss?

yangliuy commented 5 years ago

@ctrado18 Currently I suggest you use the aNMM model in v1. As for v2, I am preparing a tutorial on how to run aNMM; you can check back later. The input data format and model implementations of v1 and v2 are very different. I think you can follow the documentation here https://github.com/NTMC-Community/MatchZoo/tree/1.0 (which is easy to follow) if you want to run aNMM on datasets like WikiQA or your own datasets.

ctrado18 commented 5 years ago

Thanks, I will try it with my own data!

With a pointwise ranking loss, you treat ranking as a special case of classification/regression: basically you try to predict the ground truth label based on the features of a query/doc pair.

Could this also be binary labels with 0,1? Then you would also add negative examples for each query? In that case it is difficult for me to distinguish pointwise from pairwise, since a pointwise model would also learn to discriminate between the negative and positive label? Maybe I am on the wrong track? :-)

yangliuy commented 5 years ago

@ctrado18 Yes, it can be binary labels. Think of loss functions like MSE or cross entropy. You don't have to sample negative examples for each query. All you need to do is build batches of training instances, as <q, d> pairs for pointwise ranking or <q, d+, d-> triples for pairwise ranking, to train your model.
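
As a rough sketch of what that batch construction looks like (plain Python with made-up toy data, not MatchZoo's generators):

```python
import itertools
import random
from collections import defaultdict

# Toy labeled data: (query, doc, label). Binary labels are enough here.
rows = [
    ("how tall is everest", "everest is 8849 m high", 1),
    ("how tall is everest", "k2 is in pakistan", 0),
    ("how tall is everest", "everest was first climbed in 1953", 0),
    ("capital of france", "paris is the capital of france", 1),
    ("capital of france", "berlin is in germany", 0),
]

# Pointwise: each (q, d, label) row is one training instance.
pointwise_instances = [(q, d, y) for q, d, y in rows]

# Pairwise: group docs by query, then pair every positive with every negative.
by_query = defaultdict(lambda: {"pos": [], "neg": []})
for q, d, y in rows:
    by_query[q]["pos" if y > 0 else "neg"].append(d)

pairwise_triples = [
    (q, d_pos, d_neg)
    for q, docs in by_query.items()
    for d_pos, d_neg in itertools.product(docs["pos"], docs["neg"])
]

random.shuffle(pairwise_triples)
print(len(pointwise_instances), len(pairwise_triples))  # 5 pointwise instances, 3 triples
```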

ctrado18 commented 5 years ago

I assume I can also first try aNMM or DSSM with a pointwise loss? Is this possible with the hinge loss?

ctrado18 commented 5 years ago

@yangliuy I see it, you can, since there is also a classification example. Fine. One question: what is the DRMM PairGenerator? Why do I need it for aNMM classification? Does it work with the normal PairGenerator too?

For aNMM I need to run the binsum script first to get the hist_file, right? And what is the relation file?

Sorry for those trivial questions.

yangliuy commented 5 years ago

@ctrado18 You can check the config files under here https://github.com/NTMC-Community/MatchZoo/tree/1.0/examples/wikiqa/config and https://github.com/NTMC-Community/MatchZoo/tree/1.0/examples/QuoraQP/config . For classification, you can refer to the config files of QuoraQP.

Yes, you need to run the binsum script. As for what the relation file is, please see the readme file. I suggest you spend some time reading the readme file and docs, running the examples for WikiQA and Quora, and diving into the code before you ask new questions. Many answers are already in the readme file and examples.
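
For orientation only: the relation files in v1 are plain-text mappings between labels, query IDs and document IDs, roughly along the lines below; the readme is the authoritative reference for the exact column order and for the corpus files that map each ID to its preprocessed text.

```
1 Q1 D1-0
0 Q1 D1-1
0 Q1 D1-2
1 Q2 D2-3
0 Q2 D2-4
```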

datistiquo commented 5 years ago

I am almost done. But I see that you have to give the vocab size manually? That is a bit hard to find out every time. Am I right that this is the size of the word_dict?

yangliuy commented 5 years ago

@datistiquo Yes, you're right.
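
A quick way to check, assuming the preprocessing produced a word_dict.txt with one word-to-id entry per line (the file name and layout are assumptions; adjust to your setup):

```python
# Count the entries in the word dictionary written by preprocessing; the
# vocab_size in the config should match this number (plus any reserved ids
# such as padding, depending on the setup).
with open("word_dict.txt", encoding="utf-8") as f:
    vocab_size = sum(1 for line in f if line.strip())
print(vocab_size)
```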

datistiquo commented 5 years ago

@yangliuy For v2 there is no aNMM preprocessor, so I wanted to create one. But it seems there is no callback or preprocessing step for generating the bin sums?

yangliuy commented 5 years ago

@datistiquo Yes, the preprocessing for generating the bin sums in v1 hasn't been added into v2 yet. You are welcome to make contributions on this part!

datistiquo commented 5 years ago

@yangliuy Would I do this with a preprocessor, or with a callback instead, like calculating the matching histograms via a callback? Which way would be suitable for aNMM? Is that all I need to take care of for aNMM?

yangliuy commented 5 years ago

I think it should be via a callback, but I need to double-check this. Maybe @faneshion can give you better answers on this.

yangliuy commented 5 years ago

@datistiquo You can refer to the functions for histogram computation here: https://github.com/NTMC-Community/MatchZoo/blob/master/matchzoo/preprocessors/units/matching_histogram.py and https://github.com/NTMC-Community/MatchZoo/blob/master/matchzoo/data_generator/callbacks/histogram.py . This solution may introduce duplicated computation of the matching histograms for the same <q,d> pair. You can also try to add it to the preprocessors, which would pre-compute the matching histograms as a cache and be more efficient.
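
A rough sketch of the callback route, modeled on callbacks/histogram.py (the base-class import, the on_batch_unpacked hook, the field names text_left/text_right, and the output key bin_sum are assumptions to verify against the MatchZoo version you run):

```python
import numpy as np
from matchzoo.data_generator.callbacks import Callback  # same base as the histogram callback

class BinSum(Callback):
    """Hedged sketch of an aNMM bin-sum callback, modeled on the histogram callback.
    Assumes x['text_left'] / x['text_right'] hold padded word-id arrays and that
    on_batch_unpacked(x, y) is the hook the data generator calls per batch."""

    def __init__(self, embedding_matrix, bin_size=20):
        self._embedding = embedding_matrix
        self._bin_size = bin_size

    def on_batch_unpacked(self, x, y):
        sums = []
        for left_ids, right_ids in zip(x['text_left'], x['text_right']):
            # Cosine-similarity interaction matrix for this <q, d> pair.
            q = self._embedding[left_ids]
            d = self._embedding[right_ids]
            q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-9)
            d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-9)
            sim = q @ d.T
            # Bucket similarities over [-1, 1] and sum the values per bin.
            bins = np.clip(((sim + 1.0) / 2.0 * self._bin_size).astype(int),
                           0, self._bin_size - 1)
            bin_sum = np.zeros((q.shape[0], self._bin_size))
            np.add.at(bin_sum, (np.arange(q.shape[0])[:, None], bins), sim)
            sums.append(bin_sum)
        x['bin_sum'] = np.stack(sums)  # hypothetical feature name for the aNMM input
```

Moving the same computation into a preprocessor instead would compute each pair's bin sums once and cache them, at the cost of a larger preprocessed dataset.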

datistiquo commented 5 years ago

@yangliuy Isn't the matching histogram scheme the same as the one needed for aNMM? If so, you could use it as it is. But I don't know whether this will work, since the inputs of DRMM and aNMM (query plus matching matrix) are different?

yangliuy commented 5 years ago

@datistiquo You can try to run experiments to test this. The main difference is that ANMM will compute the sum of weights in each bin and aggregate them via the feedforward networks and question term attention networks.

datistiquo commented 5 years ago

@yangliuy I meant it in terms of implementing it in the MatchZoo framework. Calculating the bin sums is like calculating the histogram, so the procedure is the same as for DRMM? I would just modify the matching histogram to calculate the sums and train like DRMM, only with the aNMM model. I am just asking whether the output of this callback is suited for aNMM, since DRMM only wants the matching histogram but aNMM also wants the query plus the bin sum matrix?! 😄

yangliuy commented 5 years ago

Yes, you can modify the matching histogram to calculate the sums and train aNMM similarly to DRMM. Both aNMM and DRMM need queries as input. The differences are in the construction process of the matching histogram versus the bin sum matrix. You need to run and debug the code to verify whether the output of this callback function is suitable.
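
A schematic side-by-side of the two per-query-term features, to make the difference explicit (not MatchZoo code; the bin counts and the log scaling are illustrative):

```python
import numpy as np

def drmm_row_feature(sim_row, bins=30):
    """DRMM-style matching histogram: count the similarities per bin
    (often log-scaled, the 'LCH' mode)."""
    idx = np.clip(((sim_row + 1.0) / 2.0 * bins).astype(int), 0, bins - 1)
    return np.log1p(np.bincount(idx, minlength=bins).astype(float))

def anmm_row_feature(sim_row, bins=20):
    """aNMM-style bin sums: sum the similarity values per bin, so the
    value-shared weights can later be applied bin by bin."""
    idx = np.clip(((sim_row + 1.0) / 2.0 * bins).astype(int), 0, bins - 1)
    sums = np.zeros(bins)
    np.add.at(sums, idx, sim_row)
    return sums

# Similarities of one query term to all document terms.
row = np.array([1.0, 0.8, -0.2, 0.1, 0.1])
print(drmm_row_feature(row), anmm_row_feature(row))
```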

uduse commented 5 years ago

I hope things are working well for you now. I’ll go ahead and close this issue, but I’m happy to continue further discussion whenever needed.