Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

[QUESTION] When I train my COMET model, it seems to get stuck just as training is about to start #152

Closed by Winsome-A 1 year ago

Winsome-A commented 1 year ago

❓ Questions and Help

When I train my COMET model, it seems to get stuck just as training is about to start.

Here is my training command:

CUDA_VISIBLE_DEVICES=0 comet-train --cfg /home/xusongcheng/COMET-master/configs/models/referenceless_model.yaml

This is the last part of the output in Xshell after running the command, and it is stuck here:

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                | Type               | Params
0 | encoder             | XLMREncoder        | 558 M
1 | layerwise_attention | LayerwiseAttention | 26
2 | train_metrics       | RegressionMetrics  | 0
3 | val_metrics         | ModuleList         | 0
4 | estimator           | FeedForward        | 10.5 M

10.5 M    Trainable params
558 M     Non-trainable params
569 M     Total params
1,138.661 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]

I tried adding "--num_workers 0" to the command, but that did not work. How should I solve this?

Best wishes,
Winsome

ricardorei commented 1 year ago

Hi @Winsome-A, I have never seen this error. Maybe you can provide some extra information? What pytorch-lightning version are you using?

Winsome-A commented 1 year ago

Oh yeah. Thanks for your email. I upgraded my CUDA and cuDNN versions, and eventually the problem went away.
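
For anyone who hits a similar hang, the version information asked for above (and the CUDA/cuDNN versions that turned out to matter here) can be gathered with a short script. This is only a minimal sketch: the helper names `collect_versions` and `cuda_details` are made up for illustration, and the package names `pytorch-lightning` and `unbabel-comet` are assumed to be the names under which the libraries were installed with pip.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "pytorch-lightning", "unbabel-comet")):
    """Return installed versions of the given packages (hypothetical helper)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Assumed pip package name may differ in your environment.
            versions[name] = "not installed"
    return versions


def cuda_details():
    """Return the CUDA/cuDNN versions PyTorch was built with (hypothetical helper)."""
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing, or its native libraries failed to load.
        return {"torch": "not importable"}
    return {
        "cuda (build)": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "cuda available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in {**collect_versions(), **cuda_details()}.items():
        print(f"{key}: {value}")
```

Pasting this output into a report makes it easier to spot a mismatch between the installed driver stack and the CUDA build of PyTorch, which is one possible reason for a hang like the one described here.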