Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

Segmentation error when trying to reproduce wmt22 results #136

Open · SefaZeng opened this issue 1 year ago

SefaZeng commented 1 year ago

🐛 Bug

To Reproduce

pip install unbabel-comet
comet-score -s ../mt-metrics-eval-v2/wmt22/sources/en-zh.txt -t ../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt -r ../mt-metrics-eval-v2/wmt22/references/en-zh.refA.txt > log.comet
Global seed set to 1
Fetching 5 files: 100%|█████████████████████████| 5/5 [00:00<00:00, 88487.43it/s]
Lightning automatically upgraded your loaded checkpoint from v1.8.3.post1 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../../../../root/.cache/huggingface/hub/models--Unbabel--wmt22-comet-da/snapshots/371e9839ca4e213dde891b066cf3080f75ec7e72/checkpoints/model.ckpt`
Encoder model frozen.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 100%|████████████████████| 128/128 [01:00<00:00,  2.10it/s]
[1]    13312 segmentation fault  comet-score -s ../mt-metrics-eval-v2/wmt22/sources/en-zh.txt -t  -r  > 
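
The crash only seems to happen after prediction has finished. Re-running the same command from Python and checking the return code confirms this; a minimal sketch (same paths as above; on Linux a return code of -11 from subprocess corresponds to the process being killed by SIGSEGV):

import subprocess

# Re-run the exact comet-score command above and capture its stdout, so the
# scores can still be inspected even though the process itself crashes at the end.
cmd = [
    "comet-score",
    "-s", "../mt-metrics-eval-v2/wmt22/sources/en-zh.txt",
    "-t", "../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt",
    "-r", "../mt-metrics-eval-v2/wmt22/references/en-zh.refA.txt",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print("return code:", result.returncode)  # -11 means the process was killed by SIGSEGV
with open("log.comet", "w", encoding="utf-8") as f:
    f.write(result.stdout)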

And the content written to log.comet looks like this:

../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 0   score: 0.8275
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 1   score: 0.8833
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 2   score: 0.7753
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 3   score: 0.9103
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 4   score: 0.8103
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 5   score: 0.9792
...
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 2033    score: 0.9494
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 2034    score: 0.9332
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 2035    score: 0.9397
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  Segment 2036    score: 0.9048
../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt  score: 0.8622 

Expected behaviour

The command should print the segment-level and system-level scores for the candidate translations and exit cleanly, without a segmentation fault.
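
For comparison, scoring the same files through the Python API produces the scores without going through the CLI's exit path. A minimal sketch, assuming the comet 2.x Python interface where predict() returns an object with scores and system_score attributes:

from comet import download_model, load_from_checkpoint

# Load the same default checkpoint the CLI resolves (Unbabel/wmt22-comet-da).
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

def read_lines(path):
    # One segment per line, matching the mt-metrics-eval text files.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

srcs = read_lines("../mt-metrics-eval-v2/wmt22/sources/en-zh.txt")
hyps = read_lines("../mt-metrics-eval-v2/wmt22/system-outputs/en-zh/HuaweiTSC.txt")
refs = read_lines("../mt-metrics-eval-v2/wmt22/references/en-zh.refA.txt")

data = [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]
output = model.predict(data, batch_size=8, gpus=1)
print(output.system_score)  # system-level score (mean of the segment scores)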

Environment

OS: not specified
Packaging: pip (Python 3.9)
Version: 2.0.1

SefaZeng commented 1 year ago

Also, the result for the wmt22 en-zh HuaweiTSC.txt system does not match the one in the released wmt22 COMET-22-refA.sys.score file: that file lists 0.47647532625047895, while comet-score reports 0.8622.
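
For reference, this is how I compared the two numbers; a minimal sketch, assuming each line of the released .sys.score file has the form "<system-name><tab><score>" (the exact location of that file inside mt-metrics-eval-v2 is an assumption):

import re

# Average the segment scores that comet-score wrote to log.comet.
seg_scores = []
with open("log.comet", encoding="utf-8") as f:
    for line in f:
        m = re.search(r"Segment\s+\d+\s+score:\s+([0-9.]+)", line)
        if m:
            seg_scores.append(float(m.group(1)))
print("local system score:", sum(seg_scores) / len(seg_scores))

# Look up HuaweiTSC in the released system-level score file (path is an assumption).
with open("../mt-metrics-eval-v2/wmt22/metric-scores/en-zh/COMET-22-refA.sys.score", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        if parts and parts[0] == "HuaweiTSC":
            print("released system score:", float(parts[1]))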