NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.82k stars 898 forks

lab of documents #796

Closed czt616 closed 3 years ago

czt616 commented 4 years ago

I am using MatchZoo 2.1 to train some models, and I have a question. When I train the DRMM model for document ranking, the document labels in the tutorial are either 0 or 1. If I use the cross-entropy loss, can I set a label to some other value between 0 and 1 to represent a different degree of relevance? I added some new documents with a label of 0.75 to the WikiQA training set, and the DRMM model could still be trained. Why is this happening?

bwanglzu commented 4 years ago

Let me elaborate a little bit:

In MatchZoo, everything is configurable, which means a model can be used for classification (binary) or ranking (relevance degree), depending on the task type.

To be more concrete, let's look at the DRMM model, line 88:

x_out = self._make_output_layer()(flatten_score)

The output layer is built by the function _make_output_layer, defined in engine/base_model, line 508:

def _make_output_layer(self) -> keras.layers.Layer:
    """:return: a correctly shaped keras dense layer for model output."""
    task = self._params['task']
    if isinstance(task, tasks.Classification):
        # Softmax kernel produce binary output.
        return keras.layers.Dense(task.num_classes, activation='softmax')
    elif isinstance(task, tasks.Ranking):
        # Linear kernel produce relevance degree.
        return keras.layers.Dense(1, activation='linear')
    else:
        raise ValueError(f"{task} is not a valid task type."
                         f"Must be in `Ranking` and `Classification`.")
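To make the difference between the two output heads concrete, here is a minimal, framework-free sketch. It does not use Keras or MatchZoo; `classification_head` and `ranking_head` are hypothetical stand-ins for a softmax Dense layer and a single-unit linear Dense layer, and the weights are made-up numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(logits):
    """Stand-in for Dense(num_classes, activation='softmax'):
    returns one probability per class, summing to 1."""
    return softmax(logits)


def ranking_head(features, weights, bias=0.0):
    """Stand-in for Dense(1, activation='linear'):
    returns a single unbounded relevance score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Illustrative inputs only (hypothetical values).
probs = classification_head([2.0, 0.5])
score = ranking_head([0.2, 0.7, 0.1], [1.5, -0.5, 3.0])

print(probs)  # two class probabilities
print(score)  # one real-valued score
```

The classification head always yields a distribution over classes, while the ranking head yields one real number per document, which is why graded labels only make sense with the ranking task.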

If you define a ranking task, the model outputs a relevance degree; if you define a classification task, it outputs a binary class. So I guess what you expected is:

import matchzoo as mz

# Define a ranking task so the model outputs a relevance degree.
ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss())
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
# Initialize a DRMM model
model = mz.models.DRMM()
# ..
model.params['task'] = ranking_task
# ...

Back to your questions:

can I change the value to other value between 0 and 1 to represent different relevance of the document?

Yes. Create a ranking task and pass it as the task parameter of the DRMM model.

I added some new documents with a label of 0.75 to wikiqa train set. However, DRMM model still could be trained. Why is this happening?

The output of the model depends on your task type, not on the model itself. The model is mostly about the architecture.
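As a rough illustration of why a label such as 0.75 does not break training, here is a plain-Python sketch of a listwise softmax cross-entropy over one query's candidate documents. This is an assumption-laden simplification, not MatchZoo's `RankCrossEntropyLoss` (which also groups positive and negative samples); `listwise_cross_entropy` and the scores below are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def listwise_cross_entropy(labels, scores):
    """-sum_i labels[i] * log(softmax(scores)[i]) for one candidate list."""
    return -sum(y * lp for y, lp in zip(labels, log_softmax(scores)))


# Hypothetical model scores for three candidate documents.
scores = [2.0, 1.0, -1.0]

binary_labels = [1.0, 0.0, 0.0]   # tutorial-style 0/1 labels
graded_labels = [1.0, 0.75, 0.0]  # one document marked as partially relevant

loss_binary = listwise_cross_entropy(binary_labels, scores)
loss_graded = listwise_cross_entropy(graded_labels, scores)

print(loss_binary, loss_graded)
```

In this sketch the labels are just floating-point weights on the log-probabilities, so a graded label gives a finite, differentiable loss in the same way a 0/1 label does. Whether that loss matches what you intend by "relevance degree" is a separate modelling question.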