Closed: 14H034160212 closed this issue 2 years ago
Your data needs to have ::snt <sentence> in the AMR metadata header for each graph. The 10_Collect_AMR_Data.py script simply copies this, and all the other data, from the original English AMR-3 files to the new file. If you look at that script, the only things it does are collating multiple training files into one file (with some ASCII character filtering) and creating a version with the :wiki edges stripped. The :wiki edges are stripped because the model is not trained to produce them. If needed, they are added to the predicted graphs in a post-processing step.
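For illustration, the stripping amounts to deleting the :wiki relation wherever it appears in the graph text. A minimal regex sketch of the idea (this is not the project's actual code, and the exact edge pattern is my assumption about standard AMR formatting, where :wiki takes either a quoted article title or '-'):

```python
import re

def strip_wiki(graph_text: str) -> str:
    """Remove :wiki edges (a quoted article title or '-') from AMR graph text."""
    return re.sub(r'\s*:wiki\s+("(?:[^"\\]|\\.)*"|-)', "", graph_text)

amr = '(c / city :wiki "New_York_City" :name (n / name :op1 "New" :op2 "York"))'
print(strip_wiki(amr))  # the :wiki edge is removed, the rest is unchanged
```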
Be sure you are using the config file model_parse_t5.json to train the parse (sentence to graph) model. The model_generate_t5.json config is used to train a generate (graph to sentence) model.
Hi Brad, thanks a lot for your reply. Merry Christmas! The dataset I am using is AMR-3 (LDC2020T02). Here is a screenshot of the dataset. What I am trying to do is replicate the T5 AMR parser training from that link using the AMR-3 dataset.
What the current AMR metadata header has is something like # ::tok Establishing Models in Industrial Innovation. Do you mean I need to replace the ::tok with ::snt?
In the released corpus, the data you're showing (with 'alignments' in the filename) is in amr_annotation_3.0/data/alignments/split. The training data that is typically used is in amr_annotation_3.0/data/amrs/split/ and has filenames like amr-release-3.0-amrs-training-bolt.txt (no 'alignment' in the filename). The standard files have ':snt' in the metadata.
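For comparison, an entry in the standard files starts roughly like this (graph body abbreviated and purely illustrative; the sentence is the one from your ::tok example):

```
# ::snt Establishing Models in Industrial Innovation
(e / establish-01
   :ARG1 (m / model ...))
```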
The data above is annotated with token surface alignments (the ~e.32 after the node names), which is why it has 'tok' instead of 'snt'. I'm not sure what would happen if you tried to train with this data by just renaming the tok field. I think the surface alignments would be stripped during the training linearization process, but when you test with smatch most of your nodes will fail to match, so you'll get very low scores. I'd recommend using the amrs directory data. If you don't have access to that, then I'd recommend pre-stripping the surface alignments with the penman library.
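If you go the penman route, the library exposes the alignments through the graph's surface epidata. An equivalent plain-text sketch of the same clean-up (the ~e.N,N pattern matches what the alignment files use; the regex itself is my assumption, not code from the project):

```python
import re

def strip_alignments(graph_text: str) -> str:
    """Drop token surface alignments like '~e.32' or '~e.1,2' after node/attribute names."""
    return re.sub(r'~e\.\d+(?:,\d+)*', '', graph_text)

print(strip_alignments('(e / establish-01~e.0 :ARG1 (m / model~e.1,2))'))
```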
Thank you so much. When I use the correct version of the amrs directory, the program works! If I want to replicate the result for that T5 parser, are there any other things I need to change? Can I use the default hyperparameters from model_parse_t5.json? I am currently using the default hyperparameters from model_parse_t5.json and one RTX8000 GPU with 48 GB of memory.
Here is the current progress. I saw you got an 82 SMATCH score with LDC2020T02. What does the 82 mean? I got a series of SMATCH numbers for the current stage: SMATCH -> P: 0.829, R: 0.793, F: 0.811.
If you use the scripts and config exactly as they appear in the project, you will get a 0.831 on the dev set during training and 0.819 on the test set (beams=4) afterwards. This is the no-wiki corpus. If you add the wiki tags with the post-processing BLINK scripts in the training directory you'll get 0.818 smatch (I generally don't bother to add the wiki tags because it's a pain to set up and doesn't change the overall score much).
If you're interested in scores, I'd recommend trying t5-large. That model is too big to train on 12GB GPUs, but with 48GB you won't have any issues. All you have to do is change the config file's 'model_name_or_path' from t5-base to t5-large.
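That is, the one-field edit in model_parse_t5.json would look like this (fragment only; every other setting stays as-is):

```json
{
    "model_name_or_path": "t5-large"
}
```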
Thanks a lot! What are the wiki tags? Is there any difference with the no-wiki corpus?
Wiki tags (any edge with :wiki) are links from AMR (named) entities to Wikipedia articles. To add them you need to do an article search for the named entities on Wikipedia. This is very different from parsing, so it's common to strip them from the corpus and ignore them.
BTW... I notice your training is fairly slow for that GPU. If you happen to train this again with that 48GB GPU, you should be able to change your "per_gpu_batch_size" to 16 and then drop the "gradient_accumulation_step" to 1. In theory this trains exactly the same as the params in the config (batch size 4 x grad accum 4 ==> an effective batch of 16 per optimizer step). You can probably cut training time in half.
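In config terms, the change would be the following fragment (field names as quoted above; the effective batch of 16 samples per optimizer step is unchanged):

```json
{
    "per_gpu_batch_size": 16,
    "gradient_accumulation_step": 1
}
```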
Thank you so much! I have finished the training using the config (batch size 4, grad accum 4). Here is the result. I have one more question: how can I save the checkpoint with the best dev accuracy? From the saved checkpoints, I cannot tell which one has the highest dev accuracy. Also, does the 82 SMATCH score from that link mean the F1 score?
Here are the saved checkpoints.
The smatch score is printed out in the training log, right above where it saves the checkpoint. I can only see epochs 14-16 on the screen, but it looks like epoch 15 is slightly higher with a score of 0.830. That's checkpoint-51720. Assuming that's the highest score, you can just delete the rest of the checkpoints.
Yes, the smatch score is the F1 score (precision and recall are typically ignored). However, note that during training you're scoring on the dev set with a beam size of 1. For whatever reason the test set gets a slightly lower score, even with a beam of 4. You should get a 0.819 (or maybe 0.818) if you run 22_Test_Model.py. This is the score typically reported.
Hi, thanks for the feedback. For the training log, I found two different log files, namely train_model_parse_t5.log under amrlib/logs/ and trainer_state.json under checkpoint-51720/. But it seems neither of them records the smatch score.
I found that in trainer_state.json there are best_metric and best_model_checkpoint fields, but both of them are null. Perhaps we can set some parameters in the config file to make those track the best dev-accuracy checkpoint?
{
"best_metric": null,
"best_model_checkpoint": null,
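For what it's worth, those fields only get populated when the underlying HuggingFace Trainer is told to track a metric, with training arguments along these lines (these are the standard HuggingFace argument names; whether amrlib's config forwards them, and whether its smatch evaluation is wired into the Trainer's metric reporting, are assumptions on my part):

```json
{
    "evaluation_strategy": "epoch",
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_loss",
    "save_total_limit": 2
}
```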
Also, I found run_tensorboard.sh under 31_Model_Parse_T5/, and it seems to expect a runs folder after training. But I did not find one. Do you know how I can get the runs folder? Do I need to set some parameters in model_parse_t5.json for that?
I got a 0.821 test F1 score using the checkpoint from model_parse_t5/checkpoint-51720/.
Glad it's working for you. Good luck.
Hi,
I got the KeyError: 'snt' when I run 20_Train_Model.py under scripts/31_Model_Parse_T5/. I have run 10_Collect_AMR_Data.py to get the whole training, dev, and test datasets. Also, does anyone know the difference between train.txt and train.txt.nowiki? They seem quite similar, but in model_generate_t5.json I only see train.txt used.
Here is scripts/31_Model_Parse_T5/. What is the meaning of the number in each file name? For example, why does 10_Collect_AMR_Data.py have 10? My understanding is that it indicates the order in which to run the scripts, i.e. run 10_Collect_AMR_Data.py first, then 20_Train_Model.py, and so on. Am I correct?
Here is the processed dataset after I run 10_Collect_AMR_Data.py.
I use the default hyperparameters from model_generate_t5.json.
Here is the detailed error when I run 10_Collect_AMR_Data.py.