SapienzaNLP / reducing-wsd-bias-in-nmt

No file `best.ckpt` after pretraining. #4

Closed erip closed 1 year ago

erip commented 1 year ago

After running the pretraining command in the README, I find that there's no best.ckpt file in reducing-wsd-bias-in-nmt/experiments/en-de/wmt/baseline/.

$ find experiments -name "*.ckpt"
$

It seems like it should be saved by default, but it isn't. Does this require passing --checkpoint_callback True?

Valahaar commented 1 year ago

Well, that is weird. Yes, it should be saved by default, as the callback is always included (lines 220-236 of main.py). To be fair, though, best.ckpt is just a convenience symlink to the best available checkpoint, as defined in src/utils/callbacks.py#L105, so you could probably just create the symlink manually.
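
For example, a rough sketch of that manual symlink (the checkpoint filename below is just a placeholder; use whichever .ckpt file actually exists in the experiment directory):

cd experiments/en-de/wmt/baseline/
ln -s <actual-checkpoint>.ckpt best.ckpt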

P.s.: I just noticed you don't have any checkpoints saved at all, not just best.ckpt. What command are you running?

erip commented 1 year ago

I was running exactly what was in the README:

PYTHONPATH=src python src/main.py \
  --src en \
  --tgt de \
  --experiment baseline \
  --virtual-batch-size 768 \
  --batch-size 4 \
  --gpus 1 \
  --no-pretraining \
  --lr 5e-4 \
  --corpus-name wmt \
  --max_steps 100000 \
  --warmup-steps 4000 \
  --smoothing 0.1 \
  --seed 1337 \
  --dataset-variety all \
  --encoder-type basic

It completed, and the only new files in the experiments/ dir were wandb files, but no checkpoints (i.e., neither .pt nor .ckpt).

Valahaar commented 1 year ago

Does it look like the command ran successfully? Can you send the logs from the terminal? Were there any warnings? I've run it and it worked on my end...

erip commented 1 year ago

Yes, it completed training. There were warnings related to version mismatches, but mostly about deprecations in dependencies (e.g., transformers), and nothing that suggested checkpoints wouldn't be written. I'll revert to the exact versions in requirements.txt (I can't recall exactly why they don't match, but I think it was a broken API in a dependency... edit: numpy 1.20.x isn't available for Python 3.10, which I'm using, and wheel building is failing on my cluster 😓). Unfortunately, I removed the logs, but there was nothing particularly suspicious in them when I checked.

Out of curiosity, is your fine-tuned SCR model available somewhere? I'm trying to run it on a different test set, so perhaps the pretraining and fine-tuning isn't necessary in the first place.

erip commented 1 year ago

In any case, I'll use py3.8 as recommended in the README and try again.
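
Roughly along these lines (just a sketch assuming conda is available; the environment name is arbitrary):

conda create -n wsd-nmt python=3.8
conda activate wsd-nmt
pip install -r requirements.txt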

erip commented 1 year ago

It seems to be working now, so I'll close this, but I would still be interested in playing with your trained models if they're available. :-)

Valahaar commented 1 year ago

Glad to know it's working now!

As for the trained models, we didn't release them because they were not meant as a core contribution of the paper, as opposed to the data and the code to perform SCR and the analyses reported in the paper itself.

In any case, I'll have to speak with my supervisor to release them, and he'll be busy until the ACL deadline (so until next week) :sweat_smile: