google-research / bleurt

BLEURT is a metric for Natural Language Generation based on transfer learning.
https://arxiv.org/abs/2004.04696
Apache License 2.0

Is text truncation to 512 tokens handled automatically for both candidate and reference texts? #30

Closed wingedRuslan closed 3 years ago

wingedRuslan commented 3 years ago

Hi,

I wanted to clarify the following information. On the checkpoints page here, you mention that

Currently, the following six BLEURT checkpoints are available, fine-tuned on WMT Metrics ratings data from 2015 to 2018. They vary on two aspects: the size of the model, and the size of the input.

Let's say I am using the following model - BLEURT-Base, 512 (max #tokens). In my case, both the generated text and the reference text are longer than 512 tokens. While computing the BLEURT score, will the library automatically truncate both texts to fit the limit and then calculate the score between the truncated versions? Or do I need to cut the generated text and reference text to length manually before calling the scoring function?

Many thanks in advance, Ruslan

tsellam commented 3 years ago

Hi Ruslan, Thanks for reaching out. It's option 1: BLEURT does the truncation for you (see encoding.py).
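For readers wondering what that truncation looks like: encoding.py follows the standard BERT convention of encoding the pair as a single sequence and trimming it down to the checkpoint's token limit. A minimal sketch of that BERT-style pair-truncation logic is below; the function name and exact details here are illustrative, not copied from BLEURT's source:

```python
def truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Illustrative BERT-style pair truncation.

    Repeatedly pops a token off the end of whichever sequence is
    currently longer, until the combined length fits the budget.
    Trimming the longer sequence first keeps both texts represented
    in the truncated input. Operates on the lists in place.
    """
    while len(tokens_a) + len(tokens_b) > max_length:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


# Example: a 400-token candidate and a 300-token reference must fit
# in (say) 509 tokens, leaving room for [CLS]/[SEP] markers.
candidate = ["tok"] * 400
reference = ["tok"] * 300
truncate_seq_pair(candidate, reference, max_length=509)
```

So from the user's side, calling the documented scorer (`scorer = score.BleurtScorer(checkpoint)` then `scorer.score(references=..., candidates=...)`) on over-length texts works without any manual preprocessing; the tail of each text beyond the budget is simply ignored when computing the score.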