Denis2054 / Transformers-for-NLP-2nd-Edition

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section with ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more.
https://denis2054.github.io/Transformers-for-NLP-2nd-Edition/
MIT License

About Tokenizer.ipynb of Chapter 9 in the second edition #12

Open ElongHu opened 1 month ago

ElongHu commented 1 month ago

In Tokenizer.ipynb of Chapter 9 in the second edition, all of the similarity calculations are inconsistent with those in the book, and in some cases the conclusions drawn from the similarity results are the complete opposite.

Denis2054 commented 1 month ago

Thank you for this feedback. Here is the explanation and the solution:

  1. Explanation (see the sketch after this reply):
     a) The libraries, modules, and packages continually change, which can affect the code.
     b) AI NLP algorithms are stochastic, meaning there is randomness in the outputs.

  2. Solution: Run the more recent version, Tokenizers.ipynb, in the Transformers for NLP and CV, 3rd Edition repository: https://colab.research.google.com/github/Denis2054/Transformers-for-NLP-and-Computer-Vision-3rd-Edition/blob/main/Chapter10/Tokenizers.ipynb

It is open source, and thus free, and it is also self-contained.
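
To illustrate point 1.b, here is a minimal sketch of why similarity scores can drift from run to run. It assumes the notebook trains a gensim Word2Vec model on a small corpus and compares word pairs with cosine similarity; the corpus, word pair, and `run_seed` parameter below are hypothetical and only serve to show the effect, not to reproduce the book's exact numbers.

```python
# Minimal sketch: similarity scores depend on stochastic training.
# Assumption: the notebook builds word vectors with gensim Word2Vec
# and compares words with cosine similarity (model.wv.similarity).
from gensim.models import Word2Vec

# Tiny illustrative corpus (hypothetical, not the book's dataset).
sentences = [
    ["the", "judge", "read", "the", "statute"],
    ["the", "court", "applied", "the", "law"],
    ["the", "lawyer", "cited", "the", "statute"],
]

def statute_law_similarity(run_seed: int) -> float:
    # workers=1 plus a fixed seed makes a single run reproducible;
    # changing the seed, the corpus, or the library version changes
    # the learned vectors and therefore the similarity score.
    model = Word2Vec(
        sentences,
        vector_size=50,
        window=2,
        min_count=1,
        workers=1,
        seed=run_seed,
        epochs=50,
    )
    return model.wv.similarity("statute", "law")

print(statute_law_similarity(1))  # one value...
print(statute_law_similarity(2))  # ...a different value: the training is stochastic
```

Because the conclusions in the book were drawn from one particular run with one particular set of library versions, a reader re-running the notebook later can legitimately obtain different, sometimes opposite, similarity rankings.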