❓ Questions and Help
What is your question?
Is the token dictionary for the data2vec 2.0 text model available anywhere? The 'data' field in the 'task' dictionary points to '/fsx-wav2vec/abaevski/data/nlp/bookwiki_aml-full-mmap2-bin', and I'm unable to load the checkpoint without the dict.txt file. Is the dictionary identical to RoBERTa's 50k BPE vocabulary, or is yours different because the model was trained only on BooksCorpus and English Wikipedia? Any help here is appreciated, thanks! @alexeib
Code
import fairseq

# load_model_ensemble_and_task returns a list of models plus the config and task
models, args, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    [path_to_d2v2_text_cp], strict=False
)
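For context, a fairseq dict.txt is a plain text file with one "symbol count" pair per line; the four specials <s>, <pad>, </s>, <unk> are prepended automatically when the dictionary is loaded (and RoBERTa's released dict uses raw GPT-2 BPE ids as symbols). A minimal sketch of that file format, with made-up symbols and counts rather than the real vocabulary:

```python
# Sketch of the fairseq dict.txt format: one "<symbol> <count>" per line.
# fairseq's Dictionary.load() implicitly places the specials
# <s>, <pad>, </s>, <unk> at indices 0-3 before reading these lines.
# The symbols/counts below are made-up placeholders, not the real vocab.

def parse_dict_line(line: str):
    """Split a fairseq dict.txt line into (symbol, count)."""
    symbol, count = line.rstrip("\n").rsplit(" ", 1)
    return symbol, int(count)

def load_fairseq_dict(lines):
    """Return a symbol -> index mapping, mimicking fairseq's implicit specials."""
    specials = ["<s>", "<pad>", "</s>", "<unk>"]
    mapping = {sym: i for i, sym in enumerate(specials)}
    for line in lines:
        symbol, _count = parse_dict_line(line)
        mapping[symbol] = len(mapping)
    return mapping

example_lines = [
    "13 850314647",   # RoBERTa-style: the symbol is a GPT-2 BPE id (count is made up)
    "262 800385005",
    "11 800251374",
]
vocab = load_fairseq_dict(example_lines)
# the first file entry lands at index 4, right after the four specials
```

So if the d2v2 vocabulary does match RoBERTa's, the missing dict.txt could in principle be the same file shipped with the fairseq RoBERTa release.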
What have you tried?
I read the README and searched through the checkpoint itself for an embedded dictionary, without success.
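One way to narrow this down without the dict file is to compare the checkpoint's token-embedding row count against RoBERTa's vocabulary size (50265 in the released models, i.e. the 50257 GPT-2 BPE types plus specials and padding symbols). The sketch below works on a name -> shape mapping; in practice one would build it from `torch.load(path)["model"]`. The parameter name containing `embed_tokens` is an assumption, and the fake shapes are placeholders, not values from the real checkpoint:

```python
# Sketch: infer whether a checkpoint's vocabulary matches RoBERTa's by its
# embedding shape. With a real checkpoint one would first do something like:
#   state = torch.load(checkpoint_path, map_location="cpu")["model"]
#   shapes = {name: tuple(t.shape) for name, t in state.items()}
# (key names vary between models, so inspect state.keys() first).

ROBERTA_VOCAB_SIZE = 50265  # released RoBERTa vocab size (BPE + specials + padding symbols)

def looks_like_roberta_vocab(shapes):
    """Return True if any embed_tokens weight has RoBERTa's row count."""
    for name, shape in shapes.items():
        if "embed_tokens" in name and shape and shape[0] == ROBERTA_VOCAB_SIZE:
            return True
    return False

# Made-up shapes standing in for a real state dict:
fake_shapes = {
    "encoder.embed_tokens.weight": (50265, 768),
    "encoder.layers.0.fc1.weight": (3072, 768),
}
```

A matching row count wouldn't prove the dictionaries are identical, but a mismatch would rule out reusing RoBERTa's dict.txt directly.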
What's your environment?
How you installed fairseq (pip, source): from source (pip install ./ after cloning the fairseq repo)