goodbai-nlp / AMRBART

Code for our paper "Graph Pre-training for AMR Parsing and Generation" (ACL 2022)
MIT License

Tokenizer for AMRBART-large-finetuned-AMR3.0-AMRParsing #13

Closed HenryCai11 closed 1 year ago

HenryCai11 commented 1 year ago

I noticed that no tokenizers are offered on the Hugging Face Hub for the fine-tuned AMRBART models, whereas the v2 models come with tokenizers that have a different vocab size (v1: 53844 vs. v2: 53228). My questions are:

  1. Where can I get the tokenizers for those finetuned models?
  2. Is there a description of the tokens used in the v2 models (I found that the newly added tokens in the v2 models differ from the tokens illustrated in the paper)?
  3. Is it OK for me to use BartTokenizer to load the pretrained AMR tokenizers? (A quick check of the vocab-size gap is sketched below.)
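
To make the vocab-size gap behind these questions concrete, here is a quick check (a sketch only; it uses just the model names that already appear in this thread and loads a plain BartTokenizer for comparison):

from transformers import AutoConfig, BartTokenizer

# Config of the fine-tuned v1 parsing model vs. the stock BART-large tokenizer.
config = AutoConfig.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

print(config.vocab_size)  # 53844, the v1 size mentioned above
print(len(tokenizer))     # 50265 for plain BART-large, i.e. the AMR tokens are missing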

Thank you!

goodbai-nlp commented 1 year ago

Hi,

Thanks for your interest!

I hope these comments help.

HenryCai11 commented 1 year ago

Thank you so much!

HenryCai11 commented 1 year ago

@goodbai-nlp Hi, sorry to bother you again. I am still not sure how I should initialize the tokenizer with AMRBartTokenizer.

from transformers import BartForConditionalGeneration, BartTokenizer, AutoConfig
from spring_amr.tokenization_bart import AMRBartTokenizer

config = AutoConfig.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing")
model = BartForConditionalGeneration.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing")
tokenizer = AMRBartTokenizer.from_pretrained("facebook/bart-large", config=config)

I tried initializing it this way. However, the length of the tokenizer did not match the vocab_size in the config. Did I miss something in the initialization? Looking forward to your reply. Thank you!

goodbai-nlp commented 1 year ago

Hi,

I assume you are trying to initialize the tokenizer for the v1 models. You can follow the code here. Additionally, there is no need to pass the config parameter when initializing our tokenizer.
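
For reference, a minimal sketch of what that advice might look like in code (an assumption on my part, not the repository's exact recipe: it presumes the v1 AMRBartTokenizer from spring_amr adds the AMR-specific tokens itself when loaded from the base BART vocabulary):

from transformers import BartForConditionalGeneration
from spring_amr.tokenization_bart import AMRBartTokenizer

model = BartForConditionalGeneration.from_pretrained(
    "xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing"
)
# No config= argument: the tokenizer builds its own enlarged AMR vocabulary.
tokenizer = AMRBartTokenizer.from_pretrained("facebook/bart-large")

# Sanity check: compare the tokenizer length with the model's vocab size.
print(len(tokenizer), model.config.vocab_size)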

HenryCai11 commented 1 year ago

Thank you!