Closed: erip closed this issue 1 year ago
Well, that is weird. Yes, it should be saved by default, since the callback is always included (lines 220-236 of main.py). To be fair, though, best.ckpt is just a convenience symlink to the best available model (see src/utils/callbacks.py#L105), so you could probably just create the symlink manually.
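For reference, manually recreating the symlink might look something like this (a minimal sketch: the experiment path and the checkpoint filename below are hypothetical, so substitute whatever your run actually produced):

```python
import os

# Hypothetical layout -- substitute your actual experiment directory and
# the checkpoint you consider best (e.g. the one with the best dev score).
exp_dir = "experiments/en-de/wmt/baseline"
best_checkpoint = "epoch=9-step=100000.ckpt"  # assumed filename

os.makedirs(exp_dir, exist_ok=True)
link_path = os.path.join(exp_dir, "best.ckpt")
if os.path.islink(link_path):
    os.remove(link_path)  # drop a stale link before relinking
# Relative target, so the experiment directory stays relocatable.
os.symlink(best_checkpoint, link_path)
print(os.readlink(link_path))
```

Using a relative target keeps the link valid if you move or copy the whole experiment directory.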
P.S.: I just noticed you don't have any checkpoints saved at all, not just best.ckpt. What command are you running?
I was running exactly what was in the README:
```shell
PYTHONPATH=src python src/main.py \
  --src en \
  --tgt de \
  --experiment baseline \
  --virtual-batch-size 768 \
  --batch-size 4 \
  --gpus 1 \
  --no-pretraining \
  --lr 5e-4 \
  --corpus-name wmt \
  --max_steps 100000 \
  --warmup-steps 4000 \
  --smoothing 0.1 \
  --seed 1337 \
  --dataset-variety all \
  --encoder-type basic
```
It completed, and the only new files in the experiment/ dir were wandb files, but no checkpoints (i.e., neither `.pt` nor `.ckpt`).
Does it look like the command ran successfully? Can you send the logs from the terminal? Were there any warnings? I've run it and it worked on my end...
On Thu, Jan 12, 2023, 18:19 Elijah Rippeth wrote:
Yes, it completed training. There were warnings related to version mismatches, but mostly about deprecations in dependencies (e.g., transformers), and nothing that would suggest checkpoints wouldn't be written. I'll revert to the exact versions in requirements.txt.
(I can't recall exactly why they don't match, but I think it was due to a broken API in a dependency... edit: numpy 1.20.x isn't available for Python 3.10, which I'm using, and wheel building fails on my cluster 😓) Unfortunately I removed the logs, but there was nothing particularly suspicious in them when I checked.
Out of curiosity, is your fine-tuned SCR model available somewhere? I'm trying to run it on a different test set, so perhaps the pretraining and fine-tuning aren't necessary in the first place.
In any case, I'll use py3.8 as recommended in the README and try again.
It seems to be working now, so I'll close this, but I would still be interested in playing with your trained models if they're available. :-)
Glad to know it's working now!
As for the trained models, we didn't release them because they were not meant as a core contribution of the paper; the core contributions are the data and the code to perform SCR and the analyses reported in the paper itself.
In any case, I'll have to speak with my supervisor about releasing them, and he'll be busy until the ACL deadline (so until next week) :sweat_smile:
After running the pretraining command in the README, I find that there's no `best.ckpt` file in `reducing-wsd-bias-in-nmt/experiments/en-de/wmt/baseline/`. It seems like it should be saved by default, but it doesn't seem to be. Does this require `--checkpoint_callback True`?
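For context, the bookkeeping a "save best checkpoint + symlink" callback typically performs can be sketched with the stdlib alone. This is an illustrative approximation, not the project's actual callback from `src/utils/callbacks.py`: the class name, method name, file names, and the fake "checkpoint" files are all made up for the sketch.

```python
import os

class BestCheckpointTracker:
    """Toy stand-in for a 'save best model + best.ckpt symlink' callback."""

    def __init__(self, ckpt_dir: str, minimize: bool = True):
        self.ckpt_dir = ckpt_dir
        self.minimize = minimize  # True if a lower metric (e.g. loss) is better
        self.best_score = None
        os.makedirs(ckpt_dir, exist_ok=True)

    def on_validation_end(self, step: int, score: float) -> None:
        improved = self.best_score is None or (
            score < self.best_score if self.minimize else score > self.best_score
        )
        if not improved:
            return
        self.best_score = score
        ckpt_name = f"step={step}.ckpt"
        # A real callback would serialize model weights here; we just touch a file.
        open(os.path.join(self.ckpt_dir, ckpt_name), "w").close()
        link = os.path.join(self.ckpt_dir, "best.ckpt")
        if os.path.islink(link):
            os.remove(link)  # repoint the convenience link at the new best
        os.symlink(ckpt_name, link)

tracker = BestCheckpointTracker("toy-ckpts")
for step, loss in [(1000, 4.2), (2000, 3.1), (3000, 3.5)]:
    tracker.on_validation_end(step, loss)
print(os.readlink("toy-ckpts/best.ckpt"))  # step=2000.ckpt (the lowest loss)
```

The point of the symlink design is that downstream scripts can always load `best.ckpt` without knowing which validation step happened to win.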