dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.55k stars · 538 forks

[SCRIPT] update default arguments and add seed for scripts/generation/calculate_metrics.py #1514

Closed · szha closed this 3 years ago

szha commented 3 years ago

Description

Remove the num_samples argument from generation's calculate_metrics; the number of samples can be inferred from the sample file.
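As a sketch of how the count could be inferred directly from the sample file, assuming one sample per line (the function name and file format here are assumptions, not the script's actual implementation):

```python
def infer_num_samples(path):
    """Count non-blank lines in a sample file, making a --num_samples flag redundant."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())
```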

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1514/e8760c80a5a9779d42259cda048a6f17e484cc35/index.html

codecov[bot] commented 3 years ago

Codecov Report

Merging #1514 (d5ece78) into master (302865c) will decrease coverage by 0.65%. The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1514      +/-   ##
==========================================
- Coverage   86.53%   85.88%   -0.65%     
==========================================
  Files          55       55              
  Lines        7513     7396     -117     
==========================================
- Hits         6501     6352     -149     
- Misses       1012     1044      +32     
| Impacted Files | Coverage Δ |
|---|---|
| conftest.py | 76.31% <0.00%> (-9.94%) ↓ |
| src/gluonnlp/data/loading.py | 75.75% <0.00%> (-7.64%) ↓ |
| src/gluonnlp/utils/lazy_imports.py | 58.42% <0.00%> (-2.25%) ↓ |
| src/gluonnlp/utils/misc.py | 54.86% <0.00%> (-1.50%) ↓ |
| src/gluonnlp/data/tokenizers/spacy.py | 65.33% <0.00%> (-0.91%) ↓ |
| src/gluonnlp/data/tokenizers/huggingface.py | 71.34% <0.00%> (-0.49%) ↓ |
| src/gluonnlp/data/tokenizers/jieba.py | 73.13% <0.00%> (-0.40%) ↓ |
| src/gluonnlp/models/transformer_xl.py | 80.48% <0.00%> (-0.39%) ↓ |
| src/gluonnlp/models/xlmr.py | 87.87% <0.00%> (-0.36%) ↓ |
| src/gluonnlp/models/electra.py | 76.50% <0.00%> (-0.34%) ↓ |
| ... and 17 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 302865c...af455d6.

sxjscience commented 3 years ago

I think it was added because a straightforward implementation of self-BLEU has O(N²) complexity. It would be too slow to compute over a large number of samples.
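For context on where the O(N²) cost comes from: self-BLEU scores each of the N samples against the other N−1 as references. A toy sketch of that pairwise structure (using a simple unigram-precision stand-in for BLEU, not the actual metric in calculate_metrics.py):

```python
def unigram_precision(candidate, references):
    # Toy stand-in for BLEU: fraction of candidate tokens found in any reference.
    ref_tokens = {t for r in references for t in r.split()}
    cand = candidate.split()
    return sum(t in ref_tokens for t in cand) / max(len(cand), 1)

def self_bleu(samples):
    # O(N^2): each sample is scored against all N-1 others as references.
    scores = [unigram_precision(s, samples[:i] + samples[i + 1:])
              for i, s in enumerate(samples)]
    return sum(scores) / len(scores)
```

Subsampling the reference set caps the inner loop, which is presumably why num_samples existed in the first place.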

szha commented 3 years ago

@sxjscience thanks for the comment. I suspected performance was the reason. Still, randomly subsampling the reference set introduces variance in the scores across runs, which can be confusing. Should we switch to deterministic samples instead?

sxjscience commented 3 years ago

We may fix the seed then.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1514/d5ece78380c899b7935baf1172152ae6f1b5e4b5/index.html

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1514/af455d65ef0a3945305600e3596742f35e07db6d/index.html