Open jerinphilip opened 3 years ago
```
[2021-02-02 11:37:57] Error: Required option 'use-legacy-batching' has not been set
[2021-02-02 11:37:57] Error: Aborted from T marian::Options::get(const char*) const [with T = bool] in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/common/options.h:134
[CALL STACK]
[0x6ffd1e] bool marian::Options:: get
```

@XapaJIaMnu (on slack): There used to be two ways to do CBLAS_SGEMM with MKL for the attention layer: through a call to CBLAS_SGEMM_BATCHED, or through a for loop with multiple CBLAS_SGEMM calls. Now, since this project will use DNNL, the only available codepath is the multiple CBLAS_SGEMM calls. During one of the merges with master, this option got added and removed by upstream, so I assume that's where it got messed up.
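
To make the two code paths concrete, here is a rough sketch in plain Python. The function names (`sgemm`, `sgemm_looped`, `sgemm_batched`) are made up for illustration and are not the MKL, DNNL, or marian APIs; the point is only that a loop of single GEMM calls and one batched call compute the same per-entry products.

```python
# Illustrative stand-ins only; these are not the MKL/DNNL kernels.

def sgemm(a, b):
    """Multiply one (m x k) matrix by one (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def sgemm_looped(batch_a, batch_b):
    """'Legacy' path: one independent GEMM call per batch entry."""
    return [sgemm(a, b) for a, b in zip(batch_a, batch_b)]

def sgemm_batched(batch_a, batch_b):
    """'Batched' path: a single call that walks the whole batch itself."""
    results = []
    for a, b in zip(batch_a, batch_b):
        m, k, n = len(a), len(b), len(b[0])
        c = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                for p in range(k):
                    c[i][j] += a[i][p] * b[p][j]
        results.append(c)
    return results

batch_a = [[[1.0, 2.0], [3.0, 4.0]]]
batch_b = [[[5.0, 6.0], [7.0, 8.0]]]
print(sgemm_looped(batch_a, batch_b) == sgemm_batched(batch_a, batch_b))
```

In real kernels the accumulation order may differ between paths, so tiny floating-point differences are conceivable even though the math is identical.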
/var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit.sh.log
+ python3 /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/sacrebleu/sacrebleu.py newstest2018.ref
+ tee intgemm_16bit.out.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 30.5 66.3/38.6/24.4/15.8 (BP = 0.968 ratio = 0.968 hyp_len = 2748 ref_len = 2838)
+ cat intgemm_16bit.avx.expected.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 30.5 66.3/38.6/24.5/15.8 (BP = 0.967 ratio = 0.967 hyp_len = 2745 ref_len = 2838)
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh intgemm_16bit.out intgemm_16bit.avx.expected
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_16bit.out /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_16bit.avx.expected
14c14
< Ago Leis, head of the Central Criminal Police Service, said the arrests were preceded by a probe into a year-and-a-half year-and-a-half investigation.
---
> Ago Leis, head of the Central Criminal Police Service, said the arrests were preceded by a year-and-a-half probe.
28c28
< For example, the latest court rulings, eight defendants separated from the so-called Dikayev Criminal Association criminal case who were ordered to pay BGN 80,000 for the proceeds of criminal damage, or the judgment of nine individuals, in 2006 that Igor Aleynikov established a criminal association aimed at the illegal trade in cigarettes and the committing of crimes related to human trafficking in East Virginia and the South in Estonia.
---
> For example, the latest court rulings, eight defendants separated from the so-called Dikayev Criminal Association criminal case, who were ordered to pay BGN 80,000 for the proceeds of criminal damage, or the judgment of nine individuals, in 2006 that Igor Aleynikov established a criminal association aimed at the illegal trade in cigarettes and the committing of crimes related to human trafficking in East Virginia and the South in Estonia.
Why is this failing?
/var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit_sse2.sh.log
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/marian-dev/build/marian-conv -f /var/lib/jenkins/workspace/browsermt-marian-regression-tests/models/student-eten/model.npz -t intgemm_16bit_sse2.avx.bin --gemm-type intgemm16sse2
[2021-02-02 11:54:06] Error: Unknown gemm-type: intgemm16sse2
[2021-02-02 11:54:06] Error: Aborted from int main(int, char**) in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/command/marian_conv.cpp:54
[CALL STACK]
[0x57b0b2] main + 0x1762
[0x7f8d8446a840] __libc_start_main + 0xf0
[0x59e8f9] _start + 0x29
test_intgemm_16bit_sse2.sh: line 37: 27191 Aborted (core dumped) $MRT_MARIAN/marian-conv -f $MRT_MODELS/student-eten/model.npz -t $prefix.$suffix.bin --gemm-type intgemm16sse2
This is a named-parameter failure: `intgemm16sse2` is not a recognized `--gemm-type` value in this build.
/var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit.sh.log
+ python3 /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/sacrebleu/sacrebleu.py newstest2018.ref
+ tee intgemm_8bit.out.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 29.6 65.4/38.0/23.8/14.9 (BP = 0.966 ratio = 0.966 hyp_len = 2742 ref_len = 2838)
+ cat intgemm_8bit.avx.expected.bleu
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.2.12 = 29.8 65.5/38.1/24.1/15.0 (BP = 0.968 ratio = 0.969 hyp_len = 2749 ref_len = 2838)
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh intgemm_8bit.out intgemm_8bit.avx.expected
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff.sh /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_8bit.out /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/intgemm_8bit.avx.expected
Outputs are very different. 98 lines differ. Probably some gemm switch/feature to be enabled as a fix?
/var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh.log
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/marian-dev/build/marian-conv -f /var/lib/jenkins/workspace/browsermt-marian-regression-tests/models/student-eten/model.npz -t intgemm_8bit_ssse3.avx.bin --gemm-type intgemm8ssse3
[2021-02-02 11:54:15] Error: Unknown gemm-type: intgemm8ssse3
[2021-02-02 11:54:15] Error: Aborted from int main(int, char**) in /var/lib/jenkins/workspace/browsermt-marian-dev-cuda-10.2/src/command/marian_conv.cpp:54
[CALL STACK]
[0x57b0b2] main + 0x1762
[0x7f8e1417c840] __libc_start_main + 0xf0
[0x59e8f9] _start + 0x29
test_intgemm_8bit_ssse3.sh: line 37: 27310 Aborted (core dumped) $MRT_MARIAN/marian-conv -f $MRT_MODELS/student-eten/model.npz -t $prefix.$suffix.bin --gemm-type intgemm8ssse3
Another parameter failure: `intgemm8ssse3` is not a recognized `--gemm-type` value either.
/var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/test_student_small_aan_intgemm16.sh.log
+ cat optimize_aan_16.out
+ perl -pe 's/@@ //g'
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/moses-scripts/scripts/recaser/detruecase.perl
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/extract-bleu.sh
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/moses-scripts/scripts/generic/multi-bleu.perl newstest2014.ref
It is in-advisable to publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.
+ /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff-nums.py optimize_aan_16.bleu optimize_aan.bleu.expected -p 0.6 -o optimize_aan_16.bleu.diff
Command: /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tools/diff-nums.py /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan_16.bleu /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan.bleu.expected -p 0.6 -o /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/optimize_aan_16.bleu.diff
Line 1: 25.09 != 25.78
Regression tests will be incompatible with upstream: they use a toned-down feature level of intgemm (they don't pass the output layer through intgemm, we do). As such, you can't get the same numbers as the upstream tests, even if you match the architecture.
Some upstream gemm configurations are not available here. We use an architecture-agnostic binary format, whereas upstream has both architecture-dependent and architecture-agnostic formats.
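
For the `Line 1: 25.09 != 25.78` failure, the arithmetic can be checked with a small sketch. This assumes `-p 0.6` acts as an absolute tolerance on the difference between the two numbers; I have not confirmed that this is how `diff-nums.py` interprets it, and `within_tolerance` is a made-up helper.

```python
def within_tolerance(actual, expected, tolerance):
    """Return True if the two scores differ by no more than `tolerance`.

    Assumption: an absolute difference check; the real diff-nums.py
    may define its -p option differently.
    """
    return abs(actual - expected) <= tolerance

observed_bleu = 25.09   # from optimize_aan_16.bleu in the log
expected_bleu = 25.78   # from optimize_aan.bleu.expected in the log

# The gap is about 0.69, which is larger than 0.6, so the check fails.
print(within_tolerance(observed_bleu, expected_bleu, 0.6))
```

Under that assumption the test is only slightly outside the allowed band, which fits the explanation that the output layer going through intgemm shifts the score.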
@kpu told me to compile what's happening, which is what this issue is for. What is a recommended fix so we can get rid of the build failure on all browsermt/* updates while keeping them separate?
We can afford to keep separate regression tests if that's what it takes. I'm fairly certain I'm lacking enough context to get to the bottom of these test failures.
Sooo basically, you need to rerun the test sets on the different machines (sse, avx2, avx512, avx512vnni), create gold standard references for those and then replace the old reference with those
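
A rough sketch of how a test could pick the matching gold reference once per-architecture references exist. The level names follow the list above; the flag used to detect each level, the `<test>.<level>.expected` naming, and both function names are assumptions for illustration, not what the regression-test scripts currently do.

```python
# Most capable level first. The cpuinfo flag chosen per level is an assumption.
LEVELS = [
    ("avx512vnni", "avx512_vnni"),
    ("avx512", "avx512f"),
    ("avx2", "avx2"),
    ("sse", "sse2"),
]

def detect_level(cpu_flags):
    """Return the highest level whose flag appears in a space-separated flag string."""
    flags = set(cpu_flags.split())
    for level, flag in LEVELS:
        if flag in flags:
            return level
    raise ValueError("no supported vector instruction set found")

def expected_path(test_name, cpu_flags):
    """Build a hypothetical per-architecture reference file name."""
    return f"{test_name}.{detect_level(cpu_flags)}.expected"

# Flag strings in the shape reported by /proc/cpuinfo on two kinds of machine.
print(expected_path("intgemm_8bit", "avx sse sse2 sse4_1 sse4_2 ssse3"))
print(expected_path("intgemm_8bit", "avx avx2 avx512cd avx512f sse sse2 ssse3"))
```

With this layout, each machine class gets its own reference, and a run on a new class fails loudly until a reference is generated for it.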
That sounds easy for the cases with diffs between expected and actual outputs; it's something I can do alongside setting up the bergamot-translator tests.
What of the remaining command/argument failures? (1, 3, and 5)
Legacy batching needs to be merged and fixed. Can you try the branch that I have proposed? The nonexistent intgemm options can be removed.
@XapaJIaMnu I tested the change, it's working. Didn't have to change tests, so --use-legacy-batching is default on?
Technically the results between the legacy and non legacy batching should be exactly the same. Since we are using dnnl, we only have the legacy code path available
Current status on lofn:
Skipped:
- tests/decoder/align-ensemble/test_align_ensemble.sh
- tests/decoder/align-ensemble/test_align_ensemble_beam_1.sh
- tests/decoder/intgemm/test_intgemm_16bit_avx2.sh
- tests/decoder/intgemm/test_intgemm_8bit_avx2.sh
- tests/decoder/shortlist/test_shortlist_server.sh
- tests/examples/iris/test_iris.sh
- tests/examples/mnist/test_mnist_ffnn.sh
- tests/interface/input-tsv/test_tsv_server.sh
- tests/interface/input-tsv/test_tsv_server_dual_source.sh
- tests/models/wngt19/test_model_base_fbgemm_packed16.sh
- tests/models/wngt19/test_model_base_fbgemm_packed8.sh
- tests/server/test_ende.sh
- tests/server/test_ende_align.sh
- tests/server/test_ende_batch32.sh
- tests/server/test_ende_cpu.sh
- tests/server/test_ende_with_empty_lines.sh
- tests/training/features/exp-smoothing/test_expsmooth_sync.sh
- tests/training/multi-gpu/test_async_sgd_runs.sh
- tests/training/multi-gpu/test_sync_sgd.sh
- tests/training/restoring/exp-smoothing/test_expsmooth_sync.sh
- tests/training/restoring/multi-gpu/test_adam_sync.sh
- tests/training/restoring/multi-gpu/test_async.sh
- tests/training/restoring/multi-gpu/test_sync.sh
- tests/training/restoring/optimizer/test_adam_params_async.sh
- tests/training/restoring/optimizer/test_adam_params_sync.sh
Failed:
- tests/decoder/align/test_align.sh
- tests/decoder/align/test_align_beam_1.sh
- tests/decoder/align/test_align_beam_1_batched.sh
- tests/decoder/align/test_align_cpu.sh
- tests/decoder/align/test_align_nbest.sh
- tests/decoder/align/test_align_threshold.sh
- tests/decoder/align/test_soft_align.sh
- tests/decoder/align/test_soft_align_nbest.sh
- tests/decoder/intgemm/test_intgemm_16bit.sh
- tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
- tests/decoder/intgemm/test_intgemm_8bit.sh
- tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
- tests/decoder/wmt16/test_ende.sh
- tests/decoder/wmt16/test_ende_cpu.sh
- tests/decoder/wmt16/test_ende_logs.sh
- tests/decoder/wmt16/test_nbest.sh
- tests/decoder/word-scores/test_word_scores.sh
- tests/decoder/word-scores/test_word_scores_batch.sh
- tests/decoder/word-scores/test_word_scores_ensemble.sh
- tests/decoder/word-scores/test_word_scores_nbest.sh
- tests/decoder/word-scores/test_word_scores_nbest_with_align.sh
- tests/decoder/word-scores/test_word_scores_normalized.sh
- tests/examples/unit-tests/test_unit_tests.sh
- tests/interface/config/test_dump_config_with_relative_paths.sh
- tests/interface/config/test_relative_paths.sh
- tests/interface/config/test_relative_paths_apply_only_to_config_files.sh
- tests/interface/config/test_relative_paths_are_not_applied_to_cmd_options.sh
- tests/interface/config/test_relative_paths_for_each_config_file.sh
- tests/interface/config/test_relative_paths_for_input_in_config_file.sh
- tests/interface/envvars/test_interpolate_envvars.sh
- tests/interface/input/test_empty_file.sh
- tests/interface/version/test_no_version_from_old_models.sh
- tests/models/wmt16-ende/test_translation_b6n.sh
- tests/models/wmt16-ende/test_translation_b6n_batch32.sh
- tests/models/wmt16-ende/test_translation_b6n_batch64.sh
- tests/models/wnmt18/test_student_small_aan_intgemm16.sh
- tests/scorer/align/test_scorer_align.sh
- tests/scorer/align/test_scorer_align_batch_1.sh
- tests/scorer/align/test_scorer_align_nbest.sh
- tests/scorer/align/test_scorer_soft_align.sh
- tests/scorer/nbest/test_compare_parallel_and_nbest.sh
- tests/scorer/nbest/test_custom_feature_name.sh
- tests/scorer/nbest/test_score_nbest_list.sh
- tests/scorer/scores/test_compare_with_decoder_scores.sh
- tests/scorer/scores/test_scores.sh
- tests/scorer/scores/test_scores_cpu.sh
- tests/scorer/scores/test_scores_normalized.sh
- tests/scorer/scores/test_summary.sh
- tests/scorer/scores/test_summary_perplexity.sh
- tests/scorer/scores/test_word_scores.sh
- tests/scorer/scores/test_word_scores_mini_batch_1.sh
- tests/scorer/scores/test_word_scores_nbest.sh
- tests/scorer/scores/test_word_scores_normalized.sh
- tests/training/features/guided-alignment/test_guided_alignment_rnn.sh
- tests/training/features/guided-alignment/test_guided_alignment_transformer.sh
- tests/training/features/guided-alignment/test_guided_alignment_transformer_sync.sh
- tests/training/restarting/test_restarting_finished.sh
---------------------
Ran 82 tests in 00:01:0.497s, 0 passed, 25 skipped, 57 failed
Some appear due to changes in the model archives where files have gone missing.
Status
**Logs** http://vali.inf.ed.ac.uk/jenkins/job/browsermt-marian-regression-tests/7/console

```
Failed:
- tests/scorer/scores/test_scores_cpu.sh
- tests/decoder/intgemm/test_intgemm_16bit.sh
- tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
- tests/decoder/intgemm/test_intgemm_8bit.sh
- tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
- tests/models/wnmt18/test_student_small_aan_intgemm16.sh
Logs:
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/scorer/scores/test_scores_cpu.sh.log
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit.sh.log
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_16bit_sse2.sh.log
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit.sh.log
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh.log
- /var/lib/jenkins/workspace/browsermt-marian-regression-tests/tests/models/wnmt18/test_student_small_aan_intgemm16.sh.log
```
This issue will be updated as I figure out what exactly is failing.
Available Machines, vector instructions
```
ansible -m shell -a "grep -o -e 'avx[^ ]*' -e 'sse[^ ]*' -e ssse3 /proc/cpuinfo | sort | uniq | tr '\n' ' '" gpu --limit '!fulla'
dagr | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
elli | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
baldur | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
bil | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
buri | CHANGED | rc=0 >>
sse sse2 sse4_1 sse4_2 ssse3
hodor | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
frigg | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
hretha | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
gna | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
lofn | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
mani | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
mimir | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
meili | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
rindr | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
sigyn | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
startiger | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
vor | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
snotra | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
thrud | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
zisa | CHANGED | rc=0 >>
avx avx2 sse sse2 sse4_1 sse4_2 ssse3
```
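
To find which hosts can generate references for a given instruction set, the ansible output can be filtered with something like the sketch below. It assumes the usual two-line shape of `ansible -m shell` output (a `host | CHANGED | rc=0 >>` header followed by the command's stdout); `hosts_with_flag` is a made-up helper and the sample is a three-host excerpt of the listing.

```python
def hosts_with_flag(ansible_output, flag):
    """Return the hosts whose reported flag line contains `flag`."""
    hosts = []
    current = None
    for line in ansible_output.splitlines():
        if "| CHANGED |" in line:
            # Header line: remember which host the next line belongs to.
            current = line.split("|")[0].strip()
        elif current is not None and line.strip():
            if flag in line.split():
                hosts.append(current)
            current = None
    return hosts

sample = """\
bil | CHANGED | rc=0 >>
avx avx2 avx512cd avx512f sse sse2 sse4_1 sse4_2 ssse3
buri | CHANGED | rc=0 >>
sse sse2 sse4_1 sse4_2 ssse3
lofn | CHANGED | rc=0 >>
avx sse sse2 sse4_1 sse4_2 ssse3
"""

print(hosts_with_flag(sample, "avx2"))
print(hosts_with_flag(sample, "ssse3"))
```

Matching on whole tokens matters here: a substring test for `avx` would also match `avx2` and `avx512f`.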