GEM-benchmark / GEM-metrics

Automatic metrics for GEM tasks
https://gem-benchmark.com
MIT License

Colab Predictions() and References() instances initialization problem #99

Closed asnota closed 2 years ago

asnota commented 2 years ago

I'm trying to run the library in a Colab environment, following the instructions from the README:


list_of_predictions = ["The apple is tasty"]
list_of_references = ["The apple was fresh and tasty"]

import gem_metrics

preds = gem_metrics.texts.Predictions(list_of_predictions)
refs = gem_metrics.texts.References(list_of_references)

result = gem_metrics.compute(preds, refs, metrics_list=['bleu', 'rouge']) 

However I'm getting this instead of the metrics output:

[I 220721 10:04:11 __init__:170] Computing BLEU for None... [I 220721 10:04:11 __init__:170] Computing ROUGE for None..

These are preceded by "None" messages during initialization of the preds and refs instances: [I 220721 10:04:09 texts:54] Loading predictions for None [I 220721 10:04:09 texts:54] Loading references for None

tuetschek commented 2 years ago

Hi @asnota, sorry that it's confusing, I guess we need to change the printout... but this is normal behavior when you load the data from memory (otherwise a file name would be shown instead of None). So in theory, you should get the metric scores inside result even after these confusing printouts – is result empty, or do you get any other errors?
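To illustrate why "None" shows up in the log, here is a minimal sketch of the behavior tuetschek describes: a hypothetical `Texts` class (not the actual gem_metrics source) whose filename attribute stays None when data is passed in from memory instead of loaded from a file.

```python
# Hypothetical stand-in for gem_metrics' text containers, to show why
# the log prints "None" for in-memory data.
class Texts:
    def __init__(self, data, filename=None):
        self.filename = filename  # stays None when no file is involved
        self.data = data

preds = Texts(["The apple is tasty"])
print(f"Loading predictions for {preds.filename}")  # -> Loading predictions for None
```

The scores themselves are unaffected; the None is purely cosmetic.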

asnota commented 2 years ago

Hi @tuetschek, you are right, the result object does contain the information. I understand that "N" in the output stands for the number of sentences, but I guess a more descriptive name would be helpful.

I also tried to use multiple references, but it doesn't seem to work:

[I 220721 13:54:37 __init__:144] Computing NGramStats for None...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-e065bba51891> in <module>()
----> 1 result = gem_metrics.compute(preds, refs, metrics_list=['ngrams'])  # add list of desired metrics here

/usr/local/lib/python3.7/dist-packages/gem_metrics/__init__.py in compute(outs, refs, srcs, metrics_dict, metrics_list, cache, dataset_name)
    155         if len(refs) != len(outs):
    156             raise ValueError(
--> 157                 f'Incorrect length for data "{outs.filename}" -- outputs: {len(outs)} vs. references: {len(refs)}'
    158             )
    159         values["references_file"] = refs.filename

ValueError: Incorrect length for data "None" -- outputs: 1 vs. references: 2
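The failing check can be reproduced in plain Python. The sketch below is a simplified stand-in for the length validation in gem_metrics' compute (not the actual source): a flat list of two reference strings is counted as two separate data instances, so one prediction against two "instances" fails.

```python
# Simplified stand-in for the length check that raises this error.
def check_lengths(outs, refs, filename=None):
    if len(refs) != len(outs):
        raise ValueError(
            f'Incorrect length for data "{filename}" -- '
            f'outputs: {len(outs)} vs. references: {len(refs)}'
        )

preds = ["The apple is tasty"]
# A flat list of strings is read as 2 instances, not 2 references for 1 instance:
flat_refs = ["The apple was fresh and tasty", "The fruit is an apple"]
try:
    check_lengths(preds, flat_refs)
except ValueError as e:
    print(e)  # -> Incorrect length for data "None" -- outputs: 1 vs. references: 2
```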
tuetschek commented 2 years ago

@asnota I'm not sure I can change "N" at this point, but maybe I should add more documentation. Would you be able to share the multi-ref data you used or some case where it fails?

asnota commented 2 years ago

@tuetschek the multi-ref data is just a list of 2 sentences:

import gem_metrics

list_of_predictions = ["The apple is tasty"]
list_of_references = ["The apple was fresh and tasty", "The fruit is an apple"]

preds = gem_metrics.texts.Predictions(list_of_predictions)
refs = gem_metrics.texts.References(list_of_references)

result = gem_metrics.compute(preds, refs, metrics_list=['ngrams'])  # add list of desired metrics here
tuetschek commented 2 years ago

@asnota Thanks! And apologies for the delay. There were actually two problems.

1) If you want to supply multiple references, they need to be passed as lists of lists:

list_of_references = [["The apple was fresh and tasty", "The fruit is an apple"]]

Note the double square brackets -- there has to be just 1 data instance that has 2 references, otherwise we wouldn't be able to tell which reference belongs to which data instance.

2) There was a small bug that I just addressed in #100.

Please update & have a look if it works for you now :-) .
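The lists-of-lists shape from point 1 can be sanity-checked with a few lines of plain Python (no gem_metrics needed): the outer lists must be the same length, with one inner list of references per data instance.

```python
# One outer element per data instance; the inner list holds that
# instance's references.
predictions = ["The apple is tasty"]
references = [["The apple was fresh and tasty", "The fruit is an apple"]]

assert len(predictions) == len(references)  # 1 instance on each side
for pred, refs in zip(predictions, references):
    print(f"{pred!r} has {len(refs)} references")
```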

asnota commented 2 years ago

@tuetschek thanks, it works for me now.