goodbai-nlp / AMRBART

Code for our paper "Graph Pre-training for AMR Parsing and Generation" (ACL 2022)
MIT License

Inference Needs train, test and val datasets to run #12

Closed SreehariSankar closed 1 year ago

SreehariSankar commented 1 year ago

To run inference (say, AMR-to-text), train, test, and validation sets are required. Please provide a way to run the model on a pre-processed text file without needing all of this data.

goodbai-nlp commented 1 year ago

Hi, we will update the code later. As an interim workaround, you can use the files in the example directory as train/validation data.

SreehariSankar commented 1 year ago

Thanks! Just one last question: could you let me know how to tokenize a pre-processed AMR graph so that I can feed it directly to the BART model? You can assume I already have everything else ready (the BART model, the tokenizer, etc.). Should I just take the AMR graph as a string, pass it to the tokenizer, and then pass the result to BART? Thanks! Much appreciated.

goodbai-nlp commented 1 year ago

Hi, we implemented an AMRBartTokenizer for tokenizing pre-processed AMRs. To use it, follow these steps:

  1. Create a tokenizer instance:
    tokenizer = AMRBartTokenizer.from_pretrained(
        "facebook/bart-large",
    )
  2. Tokenize the pre-processed AMR string and convert it into ids, wrapping it in the AMR BOS/EOS tokens and truncating to the maximum source length (a runnable sketch of both steps follows below):
    amr_ids = (
        [tokenizer.amr_bos_token_id]
        + tokenizer.tokenize_amr(amr_string.split())[:max_src_length - 2]
        + [tokenizer.amr_eos_token_id]
    )

    If you use AMRBART, please follow the instructions here to add the special tokens.
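Putting the two steps together, a minimal AMR-to-text sketch might look like the following. The import path for AMRBartTokenizer, the max_src_length value, and the linearized AMR string are illustrative assumptions, and the base facebook/bart-large weights are used only to show the mechanics; a fine-tuned AMRBART checkpoint would be needed for meaningful output:

    import torch
    from transformers import BartForConditionalGeneration
    # Assumption: AMRBartTokenizer lives in the repo's tokenization module;
    # adjust this import to match where the class sits in your checkout.
    from model_interface.tokenization_bart import AMRBartTokenizer

    max_src_length = 512  # illustrative cap on the source length

    tokenizer = AMRBartTokenizer.from_pretrained("facebook/bart-large")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
    # The AMR special tokens extend the vocabulary, so the embedding matrix
    # must be resized to cover their ids (see the note on special tokens above).
    model.resize_token_embeddings(len(tokenizer))
    model.eval()

    # Illustrative linearized AMR; a real input must follow the repo's
    # pre-processing format exactly.
    amr_string = "( <pointer:0> want-01 :ARG0 ( <pointer:1> boy ) )"

    # Step 2 from above: tokenize, truncate, and wrap in AMR BOS/EOS ids.
    amr_ids = (
        [tokenizer.amr_bos_token_id]
        + tokenizer.tokenize_amr(amr_string.split())[:max_src_length - 2]
        + [tokenizer.amr_eos_token_id]
    )

    input_ids = torch.tensor([amr_ids])
    attention_mask = torch.ones_like(input_ids)

    # Generate text from the graph with beam search.
    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_length=128,
            num_beams=5,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With a fine-tuned AMR-to-text checkpoint in place of the base weights, the decoded string is the generated sentence.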

SreehariSankar commented 1 year ago

Thanks! All the best with your ACL 2022!

goodbai-nlp commented 1 year ago

This bug has been fixed.