bjascob / amrlib

A Python library that makes AMR parsing, generation and visualization simple.
MIT License

Add remove_unused_columns to hf_args #65

Closed w013nad closed 5 months ago

w013nad commented 5 months ago

With recent versions of transformers, the Trainer deletes all unused input columns. This causes issues in the training process: the `target_ids` and `decoder_attention_mask` variables were being deleted even though they are used by this code. Works with transformers 4.36.2.
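For reference, this is roughly what the flag controls at the HF level (a sketch using generic `Seq2SeqTrainingArguments`; amrlib actually reads these values from the `hf_args` section of its training config, and the other argument values here are placeholders):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir and the epoch/batch values are illustrative.
training_args = Seq2SeqTrainingArguments(
    output_dir="model_out",
    num_train_epochs=8,
    per_device_train_batch_size=8,
    # Without this, the Trainer silently drops any dataset columns that
    # don't match the model's forward() signature, which is how
    # target_ids and decoder_attention_mask were disappearing.
    remove_unused_columns=False,
)
```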

bjascob commented 5 months ago

Can you explain what "issues" this causes? In general, amrlib doesn't directly use any of these, but it does attempt to pass the proper tokenized encodings to the HF trainer for its use. It may be that some of these are no longer needed and should be removed instead.

w013nad commented 5 months ago

The primary issue arises from the renaming of certain variables in the Hugging Face trainer, which amrlib relies on for its functionality. For instance, `target_ids` has been renamed to `labels`. However, I couldn't find the new name for `decoder_attention_mask`.

These variables are still essential for the training process and are used within this codebase. The trainer, by design, discards any non-standard variables, which leads to errors: amrlib expects these variables to be present, but with the default setting they are no longer available in the `batch` variable used by amrlib's training code.

In essence, the issue isn't redundancy but a mismatch in variable names between amrlib and the updated Hugging Face trainer. It might be beneficial to update these variable names in amrlib to align with the Hugging Face changes, or we can take the simple route and just tell the trainer not to delete them. From what I can tell, the code works either way. With this change, I was able to obtain a BLEU of 53.66 with T5wtense (LDC2020).
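If the rename route were taken instead, the change in amrlib's batch construction would look roughly like this (a sketch; `build_training_batch` and the encoding variables are illustrative names, not amrlib's actual internals):

```python
def build_training_batch(input_enc, target_enc):
    # input_enc / target_enc are assumed to be tokenizer outputs
    # (dicts with "input_ids" and "attention_mask" tensors).
    return {
        "input_ids": input_enc["input_ids"],
        "attention_mask": input_enc["attention_mask"],
        # Current HF trainers expect the targets under "labels",
        # not the older "target_ids" key.
        "labels": target_enc["input_ids"],
        # No renamed equivalent was found for decoder_attention_mask;
        # with remove_unused_columns=False it is passed through as-is.
        "decoder_attention_mask": target_enc["attention_mask"],
    }
```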

bjascob commented 5 months ago

The code in these two modules was fairly dated, so I modernized it considerably and combined them into a single module called generate_xfm. Like the parsing module, it now works with any sequence-to-sequence model on Hugging Face (e.g., BART).

Other than renaming the top-level module, the classes and methods remain practically the same. Older models (including anything you've trained) will still work with the new code, though depending on how you're using them you may need to import from the new module name.
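For anyone checking their setup against the new module, the high-level API still works the same way; something like this should run unchanged (the example graph is illustrative, and `load_gtos_model()` uses the default installed generation model unless pointed at your own directory):

```python
import amrlib

# Loads the sentence-generation model; under the new layout this is
# served by the combined generate_xfm module.
gtos = amrlib.load_gtos_model()

graphs = ['(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))']
sents, _ = gtos.generate(graphs)
print(sents[0])
```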