Denis2054 / Transformers-for-NLP-2nd-Edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more.
https://denis2054.github.io/Transformers-for-NLP-2nd-Edition/
MIT License

Special Tokens not Provided #5

Closed mediadepp closed 1 year ago

mediadepp commented 1 year ago

Hi, I think the following code in the notebook provided on GitHub has a problem.

tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])

The special tokens are not provided. You can find it in Chapter 4, Section 3.
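For context, a minimal self-contained version of that training call might look like the sketch below, using the Hugging Face `tokenizers` library. The `corpus.txt` file here is a hypothetical stand-in for the book's actual training data, and note the closing parenthesis that the snippet above is missing:

```python
from tokenizers import ByteLevelBPETokenizer

# Hypothetical tiny corpus for illustration only
with open("corpus.txt", "w", encoding="utf-8") as f:
    f.write("The quick brown fox jumps over the lazy dog.\n" * 100)

paths = ["corpus.txt"]

tokenizer = ByteLevelBPETokenizer()

# Train a byte-level BPE tokenizer, registering the special tokens
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])
```

After training, the special tokens appear in the tokenizer's vocabulary (e.g. via `tokenizer.get_vocab()`).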

lxt3 commented 1 year ago

I believe this was solved in the April 2023 update of the notebook, by running `pip install --upgrade accelerate` and installing the latest transformers module.
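For reference, the upgrade steps described above would be, as a sketch (exact versions depend on the notebook's environment):

```shell
pip install --upgrade accelerate
pip install --upgrade transformers
```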

Denis2054 commented 1 year ago

Yes. Hugging Face now requires the accelerate library; the notebook was updated in April 2023. Thank you for explaining this.