Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0
493 stars · 76 forks

[QUESTION] Evaluate excessively long sequence pairs #105

Closed minghao-wu closed 1 year ago

minghao-wu commented 1 year ago

❓ Questions and Help

Before asking:

  1. Search for similar issues.
  2. Search the docs.

I briefly read through the source code and didn't find the answer.

What is your question?

I attempted to apply COMET to some excessively long document pairs (more than 3,000 words in both the source and target sequences). I didn't get any errors when doing so. How does COMET process such excessively long sequences?

Code

What have you tried?

What's your environment?

  - OS: Linux
  - Packaging: pip
  - Version: 1.1.3
ricardorei commented 1 year ago

Hi @minghao-wu, COMET truncates very long sentences. The Hugging Face tokenizer already does that for you if you set truncation=True. You can check the code here.
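To make the truncation behavior concrete, here is a minimal sketch using a toy whitespace tokenizer (hypothetical; the actual COMET models use a Hugging Face subword tokenizer with `truncation=True` and `max_length=512`):

```python
def tokenize_truncated(text, max_length=512):
    """Toy stand-in for a subword tokenizer: split on whitespace
    and silently discard everything past max_length tokens."""
    tokens = text.split()
    return tokens[:max_length]

# A 3000-word input comes back capped at 512 tokens, with no error raised.
long_input = " ".join(f"w{i}" for i in range(3000))
tokens = tokenize_truncated(long_input)
assert len(tokens) == 512
```

This is why scoring very long inputs produces no error: the tail of the sequence is simply dropped before it ever reaches the encoder.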

Also, since we are using a triplet encoder architecture, the source, MT, and reference each have a capacity of 512 tokens (1536 tokens in total).

In my experience, with typical MT test sets this is enough. What exactly is your use case?

minghao-wu commented 1 year ago

Hi @ricardorei ,

Thank you for your answer.

Yes, 512 tokens is typically long enough for a single sentence. However, I am working on DocNMT and trying to use COMET as a document-level metric by concatenating all the source sentences, references, and hypotheses within the same document. For example, one of the common benchmark datasets in DocNMT is IWSLT2017 En-De, where the documents in the test set are typically longer than 1,000 words. Do you have any suggestions for this situation?
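For reference, the concatenation approach described above looks roughly like this (illustrative snippet, not COMET code; the sentences are made up):

```python
# Build one long document-level pair by joining all segments.
src_sents = ["Sentence one.", "Sentence two.", "Sentence three."]
mt_sents = ["Satz eins.", "Satz zwei.", "Satz drei."]

doc_src = " ".join(src_sents)
doc_mt = " ".join(mt_sents)

# With ~1000-word documents, doc_src and doc_mt far exceed the encoder's
# 512-token capacity, so most of the document would be silently truncated.
```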

ricardorei commented 1 year ago

You can take a look at this work.

They extended COMET and other metrics to take into account context.

You still need to score each individual segment within the document, but for each segment the score is computed using the previous sentences, which makes it more context-sensitive.

I want to integrate that into the next release of COMET but for now I think you should take a look at their work.
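The core idea can be sketched in a few lines: for each segment, prepend up to k preceding sentences as context before scoring. This is a hedged illustration of the approach described above, not the COMET API; the separator string and function name are assumptions.

```python
SEP = " </s> "  # illustrative separator; XLM-R-style, an assumption here

def with_context(sentences, k=2, sep=SEP):
    """For each sentence, prepend up to k previous sentences as context.
    Each returned string is scored as one (context + segment) input."""
    out = []
    for i, sent in enumerate(sentences):
        context = sentences[max(0, i - k):i]
        out.append(sep.join(context + [sent]))
    return out

doc = ["First sentence.", "Second sentence.", "Third sentence."]
print(with_context(doc, k=1))
# → ['First sentence.',
#    'First sentence. </s> Second sentence.',
#    'Second sentence. </s> Third sentence.']
```

Each augmented input stays well under the 512-token limit for reasonable k, which is how the metric remains usable on long documents.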

ricardorei commented 1 year ago

You can look at their paper here: Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric

minghao-wu commented 1 year ago

Thank you for your suggestion. I will have a look.

mtresearcher commented 1 year ago

@ricardorei are you planning to add document-level context to COMET sometime soon?

seeledu commented 1 year ago

same question