Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0
610 stars 77 forks

WIP: Add squad metric #172

Closed · benschreiber closed this 3 years ago

benschreiber commented 3 years ago

Partially resolves #60 (this PR adds squad v1, but v2 should be easy to add)

pep8speaks commented 3 years ago

Hello @benschreiber! Thanks for updating this PR.

Line 302:25: W503 line break before binary operator

Comment last updated at 2021-06-10 08:51:21 UTC
codecov[bot] commented 3 years ago

Codecov Report

Merging #172 (f2d7607) into master (0913b40) will decrease coverage by 1.56%. The diff coverage is 32.91%.


@@            Coverage Diff             @@
##           master     #172      +/-   ##
==========================================
- Coverage   69.18%   67.62%   -1.57%     
==========================================
  Files          70       71       +1     
  Lines        1467     1535      +68     
==========================================
+ Hits         1015     1038      +23     
- Misses        452      497      +45     
Flag        Coverage Δ
unittests   67.62% <32.91%> (-1.57%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
...lp/question_answering/datasets/squad/processing.py   8.27% <5.55%> (-0.68%) ⬇️
...sk/nlp/question_answering/datasets/squad/metric.py   30.43% <30.43%> (ø)
..._transformers/task/nlp/question_answering/model.py   50.00% <35.00%> (-21.43%) ⬇️
...g_transformers/task/nlp/question_answering/data.py   52.27% <55.55%> (+0.92%) ⬆️
...task/nlp/question_answering/datasets/squad/data.py   75.00% <66.66%> (-2.78%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0913b40...f2d7607.

SeanNaren commented 3 years ago

This is awesome @benschreiber, thanks for contributing this :) I can have a go at testing once you feel it's ready.

benschreiber commented 3 years ago

Thanks @SeanNaren. Yeah, it's ready to test.

SeanNaren commented 3 years ago

Hey @benschreiber, I made a small modification to the validation step. I'm noticing that for the test it's not collecting enough example IDs in the lookup table. I can assist in debugging this, but if you get a chance to have a look I'd appreciate it!

benschreiber commented 3 years ago

Hi @SeanNaren, I'm not sure what you're referring to. I just fixed an issue where metric state was not being reset between validation epochs, which caused the metric computation to fail the second time it ran. This manifested as a failed assert where len(predictions[0]) == 21644 instead of 10822.
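
For context, here's a minimal sketch of the kind of reset involved (the class and attribute names are illustrative, not the actual ones in this PR):

```python
import pytorch_lightning as pl
from torchmetrics import Metric  # assuming a torchmetrics-style metric with .reset()


class QuestionAnsweringModel(pl.LightningModule):  # illustrative name
    def __init__(self, metric: Metric):
        super().__init__()
        self.metric = metric

    def on_validation_epoch_start(self) -> None:
        # Clear state accumulated during the previous validation epoch;
        # without this, the second epoch computes over doubled state
        # (len(predictions[0]) == 21644 instead of 10822).
        self.metric.reset()
```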

It's also necessary to launch with trainer.num_sanity_val_steps=-1 or 0 (run the sanity check over the entire val dataset, or skip it entirely). Omitting this flag fails at the same assert, but with len(predictions[0]) == 32. Maybe we could check how many samples we have seen at the end of the val epoch and skip the metric computation if the count is wrong.
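
To illustrate the flag (a sketch; in this repo it would normally be passed as a trainer.* override on the command line rather than constructed directly):

```python
import pytorch_lightning as pl

# Run the sanity check over the entire val dataset (-1) or skip it (0);
# the default of 2 batches leaves the metric holding only a handful of
# examples, hence the assert failing with len(predictions[0]) == 32.
trainer = pl.Trainer(num_sanity_val_steps=-1)  # or num_sanity_val_steps=0
```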

Are either of these the bug that you saw?

benschreiber commented 3 years ago

Hi @SeanNaren, what's your current status on this?