GEM-benchmark / GEM-metrics

Automatic metrics for GEM tasks
https://gem-benchmark.com
MIT License

Add local recall metric #16

Closed evanmiltenburg closed 3 years ago

evanmiltenburg commented 3 years ago

This adds the local recall metric from my own work on diversity in NLG output. It's a simplified version that doesn't do POS-based filtering.

evanmiltenburg commented 3 years ago

I haven't tested the code yet (beyond really basic sanity checks), because the GEM code doesn't run on my laptop yet (I don't have a GPU).

The main thing that needs to change (other than actually hooking the metric into the GEM code) is that references should also be available in a lowercased, tokenized version with no punctuation. So refs would be represented as:

[[['item', 'one', 'reference', 'one'], ['item', 'one', 'reference', 'two']], [['item', 'two', 'reference', 'one'], ['item', 'two', 'reference', 'two']]]

The references aren't stored this way yet, but my code assumes that they are.
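To make the expected input concrete, here is a minimal sketch of how references in that nested format could be produced and consumed. It is not the code from this PR: `normalize` and `local_recall_sketch` are hypothetical names, the normalization (lowercase, strip ASCII punctuation, split on whitespace) is an assumption, and the score is a simplified reading of local recall (the share of reference word types that also appear in the prediction, averaged over items).

```python
import string

# Translation table that deletes ASCII punctuation (assumed normalization).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Hypothetical helper: lowercase, drop punctuation, split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


def local_recall_sketch(predictions, references):
    """Simplified local recall, averaged over items.

    predictions: one token list per item.
    references:  one list of token lists per item (the nested format above).
    """
    scores = []
    for pred_tokens, item_refs in zip(predictions, references):
        # All word types that any reference for this item uses.
        ref_types = {tok for ref in item_refs for tok in ref}
        if not ref_types:
            continue  # skip items without usable references
        scores.append(len(ref_types & set(pred_tokens)) / len(ref_types))
    return sum(scores) / len(scores) if scores else 0.0


refs = [[normalize("Item one, reference one."),
         normalize("Item one, reference two.")]]
preds = [normalize("Item one!")]
score = local_recall_sketch(preds, refs)
```

Here `refs` has exactly the shape shown above for the first item, and the prediction recalls two of the four reference word types.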

tuetschek commented 3 years ago

@evanmiltenburg you don't need a GPU to run the metrics. In fact, I can't run them on a GPU (or rather I can, but only for single-dataset results; otherwise it breaks, see #12). If you install all the packages, it should run just fine on a CPU. If you don't want to wait, comment out BERTScore and BLEURT in gem_metrics/__init__.py.

I've added list_tokenized_lower_nopunct for references (c856b8b440137c623a694210af69143a6873643e). Could you try testing it locally?

tuetschek commented 3 years ago

@evanmiltenburg PS: I just merged #17, which disables BLEURT and BERTScore by default, so it should now run relatively fast (use --heavy-metrics to turn them on).

tuetschek commented 3 years ago

@evanmiltenburg I made an attempt at integrating it, but your code is missing some stuff (just checked using flake8). Please merge from master before attempting any fixes.

tuetschek commented 3 years ago

@evanmiltenburg OK, nothing was technically missing, but you can't use unclassified static methods. Now it works -- see my changes in 6b4e7118a88f7789c14b8c94cbc61df73cd314a3.
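The thread doesn't show the offending code, so the following is only a guess at the kind of problem meant here: a helper defined in a class body without `self` and without the `@staticmethod` decorator breaks as soon as it is called through an instance. The class and method names are hypothetical and not taken from the PR.

```python
class Undecorated:
    # No `self` and no decorator: Python treats this as a regular method.
    def count_tokens(tokens):
        return len(tokens)

    def run(self, tokens):
        # `self` is passed implicitly as the first argument, so this call
        # supplies two arguments to a one-parameter function.
        return self.count_tokens(tokens)


class Decorated:
    @staticmethod
    def count_tokens(tokens):
        return len(tokens)

    def run(self, tokens):
        # With @staticmethod, no implicit `self` is passed.
        return self.count_tokens(tokens)


try:
    Undecorated().run(["a", "b"])
    undecorated_failed = False
except TypeError:
    undecorated_failed = True

decorated_result = Decorated().run(["a", "b"])
```

If that reading is right, the fix is either to add the decorator or to move the helper out of the class to module level.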