BramVanroy opened 2 years ago
Adding to my worry, I indeed find that these hyperparameters do not lead to successful training. Below you find the current training loss and dev smatch, respectively.
While the config aims to train for 30 epochs, training clearly starts to deteriorate after less than one epoch. If you could share the exact hyperparameters that were used for finetuning text2amr, I would be grateful.
Hi, I am not sure what is happening here. We have retrained SPRING several times and we have always reproduced the same results, up to some variation due to the random seed. We have used the exact same config you are using. Would you mind giving us some additional context? Your run seems to be diverging; would you mind pasting a plot of the training/dev loss? What's the training data? Setup? I have seen runs diverging irrecoverably, but tbh it was very rare, and usually restoring a checkpoint or setting a different seed fixed everything.
The beam size of 5 was used only outside of the training run, for efficiency (beam search is very slow). This was hinted at in Table 1 of the Appendix, but I can see how it can be confusing. Warmup does not matter here because the scheduler is constant; in my experience warmup is not crucial with pre-trained BART. If there are other discrepancies, the config is the final source of truth.
(@rexhinab comments?)
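To make the warmup point concrete: with a constant scheduler, warmup_steps: 1 effectively means the full learning rate is applied from the first optimizer step. A minimal sketch, assuming the standard transformers optimization helpers (this is an illustration, not a claim about the exact code in this repo):

```python
# Minimal illustration (assumes the standard `transformers` optimization helpers,
# not necessarily the exact scheduler construction used in this repo).
# With a constant schedule the LR never decays, so warmup_steps=1 just means
# "full learning rate from the first optimizer step onwards".
import torch
from transformers import get_constant_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=5e-5)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=1)

for step in range(5):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())  # stays at 5e-05 after the single warmup step
```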
The training was exactly as described in this repository with the AMR3.0 dataset. I have put train loss graphs and dev smatch scores in the comment above. Perhaps it was a bad seed. I will try again if I find the time/compute and keep you posted.
For more context, here's what the training curves look like for us.
@mbevila I tried again with a fresh install of Python and Spring on our cluster and unfortunately I still cannot reproduce your results. Again, after some 10k steps, training loss already starts to increase. It would seem that the learning rate is too high - or that a constant LR does not work well.
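As a next diagnostic I am considering swapping the constant schedule for a linearly decaying one. This is my own idea, not something from the SPRING paper or config; a sketch of what I mean, using the standard transformers helper:

```python
# My own diagnostic idea, not part of the SPRING config: use a linearly decaying
# schedule instead of a constant one, to check whether the divergence after ~10k
# steps is caused by keeping the full LR for the whole run.
import torch
from transformers import get_linear_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=5e-5, weight_decay=0.004)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1000,      # assumed value for this experiment, not from the config
    num_training_steps=250000,  # matches training_steps in the config below
)
```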
Config (same as the one in this repo, except for log_wandb: True and the paths):
name: baseline+smart_init
model: facebook/bart-large
# <--------------
# Linearizations
# Comment DFS and uncomment the relevant block if you want to use a different linearization scheme
# DFS
penman_linearization: True
use_pointer_tokens: True
raw_graph: False
# BFS
# penman_linearization: False
# use_pointer_tokens: True
# raw_graph: False
# PENMAN
# penman_linearization: True
# use_pointer_tokens: False
# raw_graph: False
# BART baseline
# penman_linearization: True
# use_pointer_tokens: False
# raw_graph: True
remove_wiki: False
dereify: False
collapse_name_ops: False
# Hparams
batch_size: 500
beam_size: 1
dropout: 0.25
attention_dropout: 0.0
smart_init: True
accum_steps: 10
warmup_steps: 1
training_steps: 250000
weight_decay: 0.004
grad_norm: 2.5
scheduler: constant
learning_rate: 0.00005
max_epochs: 30
save_checkpoints: True
log_wandb: True
warm_start: True
use_recategorization: False
best_loss: False
remove_longer_than: 1024
# <------------------
# Data: replace DATA below with the root of your AMR 2/3 release folder
train: multilingual-text-to-amr/data/amr_annotation_3.0/data/amrs/split/training/*.txt
dev: multilingual-text-to-amr/data/amr_annotation_3.0/data/amrs/split/dev/*.txt
test: multilingual-text-to-amr/data/amr_annotation_3.0/data/amrs/split/test/*.txt
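For completeness, this is how I understand accum_steps, grad_norm and learning_rate to interact in a standard gradient-accumulation loop; a generic sketch under my own assumptions, not SPRING's actual training code:

```python
# Generic gradient-accumulation sketch using the hyperparameters from the config
# above (accum_steps=10, grad_norm=2.5, learning_rate=5e-5, weight_decay=0.004).
# This is NOT SPRING's actual training loop, only an illustration of how the
# values are typically combined.
import torch

def train(model, loader, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.004)
    accum_steps, max_grad_norm = 10, 2.5

    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader, start=1):
        # Assumes a HuggingFace-style model output with a .loss attribute.
        out = model(**{k: v.to(device) for k, v in batch.items()})
        # Scale the loss so the gradients of the accumulated micro-batches average out.
        (out.loss / accum_steps).backward()
        if step % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()
            optimizer.zero_grad()
```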
Environment (pip freeze):
Bottleneck @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/Bottleneck/Bottleneck-1.3.2
cached-property==1.5.2
certifi==2022.9.24
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
deap @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/deap/deap-1.3.1
docker-pycreds==0.4.0
expecttest @ file:///tmp/vsc40003/easybuild/expecttest/0.1.3/GCCcore-10.3.0/expecttest-0.1.3
filelock==3.8.0
gitdb==4.0.9
GitPython==3.1.29
grpcio==1.50.0
grpcio-tools==1.48.1
idna==3.4
joblib==1.2.0
lxml==4.9.1
mpi4py @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/mpi4py/mpi4py-3.0.3
mpmath @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/mpmath/mpmath-1.2.1
networkx==2.8.7
numexpr @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/numexpr/numexpr-2.7.3
numpy @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/numpy/numpy-1.20.3
packaging==21.3
pandas @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/pandas/pandas-1.2.4
pathtools==0.1.2
Penman==1.2.2
Pillow @ file:///tmp/vsc40003/easybuild/Pillow/8.2.0/GCCcore-10.3.0/Pillow-8.2.0
portalocker==2.6.0
promise==2.3
protobuf @ file:///tmp/vsc40003/easybuild/protobufpython/3.17.3/GCCcore-10.3.0/protobuf-3.17.3
psutil==5.9.3
pybind11 @ file:///tmp/vsc40003/easybuild/pybind11/2.6.2/GCCcore-10.3.0/pybind11-2.6.2
pyparsing==3.0.9
pytorch-ignite==0.4.10
PyYAML @ file:///tmp/vsc40003/easybuild/PyYAML/5.4.1/GCCcore-10.3.0/PyYAML-5.4.1
regex==2022.9.13
requests==2.28.1
sacrebleu==2.3.1
sacremoses==0.0.53
scipy @ file:///tmp/vsc40003/easybuild/SciPybundle/2021.05/foss-2021a/scipy/scipy-1.6.3
sentencepiece==0.1.97
sentry-sdk==1.9.10
setproctitle==1.3.2
shortuuid==1.0.9
six==1.16.0
smatch==1.0.4
smmap==5.0.0
-e git+https://github.com/BramVanroy/spring.git@386780d73b6e7033e55204b65c161b187165bdd0#egg=spring_amr
tabulate==0.9.0
tokenizers @ file:///dodrio/scratch/projects/starting_2022_051/spring/tokenizers-0.7.0-cp39-cp39-linux_x86_64.whl
torch==1.10.0
tqdm==4.64.1
transformers==2.11.0
typing-extensions @ file:///tmp/vsc40003/easybuild/typingextensions/3.10.0.0/GCCcore-10.3.0/typing_extensions-3.10.0.0
urllib3==1.26.12
wandb==0.13.2
Environment:
- `transformers` version: 2.11.0
- Platform: Linux-4.18.0-305.40.2.el8_4.x86_64-x86_64-with-glibc2.28
- Python version: 3.9.5
- PyTorch version (GPU?): 1.10.0 (True)
Thank you for open-sourcing your repo! I am trying to reproduce your results but am having difficulty reaching the same scores. I then found that the hyperparameters in the config are not the same as those discussed in the paper's appendix. Specifically, you mention a beam size of 5 in the paper, but the config has beam_size: 1. Could you please clarify which of these is correct? I also find that there is a warmup_steps of 1, which seems out of place and a very uncommon value. Can you confirm that this is indeed correct?
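To make the beam-size question concrete, this is roughly the difference between the two settings, sketched with the generic Hugging Face generate API (shown for a recent transformers version, not the pinned 2.11.0, and not the exact SPRING prediction script):

```python
# Sketch of the beam_size difference being asked about (generic Hugging Face API,
# not the SPRING prediction script; written for a recent `transformers` version).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

inputs = tokenizer(["The boy wants to go."], return_tensors="pt")

# beam_size: 1 (config) -> greedy decoding, fast.
greedy_ids = model.generate(**inputs, num_beams=1, max_length=512)

# beam size 5 (paper) -> beam search, slower but usually slightly more accurate.
beam_ids = model.generate(**inputs, num_beams=5, max_length=512)
```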