Closed benschreiber closed 3 years ago
Hello @benschreiber! Thanks for updating this PR.
Line 302:25: W503 line break before binary operator
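For reference, W503 fires when a line break comes before a binary operator. A minimal sketch of the pattern and one way to satisfy the check (variable names here are illustrative, not from the PR):

```python
first_value = 1
second_value = 2

# flake8 W503: the line break comes *before* the binary operator.
total = (first_value
         + second_value)

# Breaking *after* the operator instead avoids W503:
total = (first_value +
         second_value)
```

Note that W503 and W504 are mutually exclusive styles; many projects simply disable one of them in their flake8 config rather than rewrapping lines.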
Merging #172 (f2d7607) into master (0913b40) will decrease coverage by 1.56%. The diff coverage is 32.91%.
```diff
@@            Coverage Diff             @@
##           master     #172      +/-   ##
==========================================
- Coverage   69.18%   67.62%   -1.57%
==========================================
  Files          70       71       +1
  Lines        1467     1535      +68
==========================================
+ Hits         1015     1038      +23
- Misses        452      497      +45
```
Flag | Coverage Δ |
---|---|
unittests | 67.62% <32.91%> (-1.57%) :arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ |
---|---|
...lp/question_answering/datasets/squad/processing.py | 8.27% <5.55%> (-0.68%) :arrow_down: |
...sk/nlp/question_answering/datasets/squad/metric.py | 30.43% <30.43%> (ø) |
..._transformers/task/nlp/question_answering/model.py | 50.00% <35.00%> (-21.43%) :arrow_down: |
...g_transformers/task/nlp/question_answering/data.py | 52.27% <55.55%> (+0.92%) :arrow_up: |
...task/nlp/question_answering/datasets/squad/data.py | 75.00% <66.66%> (-2.78%) :arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0913b40...f2d7607. Read the comment docs.
This is awesome @benschreiber thanks for contributing this :) I can have a go testing once you feel it's ready.
Thanks @SeanNaren. Yeah, it's ready to test.
Hey @benschreiber made a small modification to the validation step. I'm noticing for the test it's not collecting enough example ids in the lookup table. I can assist in debugging this but if you get a chance to have a look would appreciate it!
Hi @SeanNaren, I'm not sure what you are referring to. I just fixed an issue where metric state was not being reset between validation epochs. This would cause metric computation to fail the second time. This would manifest as an assert where `len(predictions[0]) == 21644` instead of 10822.
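For context, that failure mode can be reproduced with a minimal accumulator (the class and names below are a hypothetical stand-in, not the actual metric code in this PR): without a `reset()` between epochs, the second epoch sees twice the predictions, which matches the 21644 = 2 × 10822 in the assert.

```python
class PredictionMetric:
    """Minimal stand-in for a metric that accumulates predictions across steps."""

    def __init__(self):
        self.predictions = []

    def update(self, batch_preds):
        self.predictions.extend(batch_preds)

    def compute(self, expected_len):
        # The real code asserted on the accumulated length in a similar way.
        assert len(self.predictions) == expected_len, (
            f"got {len(self.predictions)} predictions, expected {expected_len}"
        )
        return len(self.predictions)

    def reset(self):
        self.predictions = []


metric = PredictionMetric()
VAL_SIZE = 10822

# Epoch 1 works fine.
metric.update(range(VAL_SIZE))
metric.compute(VAL_SIZE)

# Without a reset here, epoch 2 would accumulate on top of epoch 1 and
# the assert would fire with 21644 predictions.
metric.reset()  # the fix: clear metric state between validation epochs
metric.update(range(VAL_SIZE))
metric.compute(VAL_SIZE)
```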
It's also necessary to launch with `trainer.num_sanity_val_steps=-1` or `0` (the entire val dataset or nothing). Omitting this flag fails at the same assert, but with `len(predictions[0]) == 32`. Maybe we could check how many samples we have seen at the end of the val epoch and skip the metric computation if it's the wrong number.
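A sketch of that guard idea (the function and names are hypothetical, not code from this PR): at the end of the val epoch, compare the number of accumulated examples against the full dataset size, and skip the metric when a sanity-check pass produced only a partial epoch.

```python
def maybe_compute_metric(num_seen, expected_total, compute_fn):
    """Skip metric computation when the epoch only saw part of the dataset,
    e.g. during Lightning's sanity-check validation steps."""
    if num_seen != expected_total:
        return None  # partial epoch (e.g. sanity check): skip the metric
    return compute_fn()


# A sanity-check pass sees only a couple of batches, so the metric is skipped.
assert maybe_compute_metric(32, 10822, lambda: "score") is None

# A full validation epoch computes the metric as usual.
assert maybe_compute_metric(10822, 10822, lambda: "score") == "score"
```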
Are either of these the bug that you saw?
Hi @SeanNaren, what's your current status on this?
Partially resolves #60 (this PR adds SQuAD v1, but v2 should be easy to add)