aiegoo / tyeng-whisper-book

Natural Language Processing with Transformers: Building Language Applications with Hugging Face, a book by O'Reilly
Apache License 2.0

Read #1

Open aiegoo opened 1 year ago

aiegoo commented 1 year ago

Useful links found in the book

  1. Transformers library

  2. RNN: house_generate, house_read

  3. Voice and speech recognition: article

  4. Korean dataset hub

  5. AIHub for test and train data

  6. Zeroth-Korean GitHub

Excerpts from the article

Speech recognition is the ability of a device to recognize individual words or phrases from human speech. These words can be used to command the operation of a system -- computer menus, industrial controls, or direct input of speech into an application, as is the case with dictation software. Speech recognition systems can be speaker independent, typically with a limited vocabulary, or speaker dependent. The former is used when a limited vocabulary is expected within a known context. The latter allows for a greater vocabulary size, but at the cost of "training" the system for each specific user. This training typically consists of the user uttering a specific series of words and phrases so the system can learn that user's pronunciation and speech patterns; it then creates a template specifically for each user.
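As a rough illustration of a speaker-independent system, a pretrained recognizer can be loaded in a few lines with the Hugging Face pipeline API. This is a minimal sketch, not code from the book; the model name and audio path are illustrative placeholders:

from transformers import pipeline

# Load a pretrained, speaker-independent recognizer.
# "openai/whisper-small" is one illustrative model choice.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "sample.wav" is a placeholder path; decoding local audio requires ffmpeg.
result = asr("sample.wav")
print(result["text"])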

aiegoo commented 1 year ago

References

  1. Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron (O'Reilly)
  2. Deep Learning for Coders with fastai and PyTorch, by Jeremy Howard and Sylvain Gugger (O'Reilly)
  3. GitHub repo
  4. myKaggle-onofflink
  5. Hardware requirements: NVIDIA Tesla P100 GPUs, which have 16 GB of memory
aiegoo commented 1 year ago

Workflow

  1. Move from the Trainer API to the Accelerate library for full control of the training loop and for large-scale transformer training; see the sketch below.
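A minimal sketch of what that Accelerate-style loop looks like, with a toy model and dataset standing in for a real transformer and tokenized corpus (all names and hyperparameters here are illustrative, not from the book):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Toy stand-ins; a real run would use a transformer model and a tokenized dataset.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

# prepare() handles device placement and, when launched with `accelerate launch`,
# wraps everything for distributed training.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    accelerator.backward(loss)  # replaces the usual loss.backward()
    optimizer.step()

The key design point is that the loop itself stays plain PyTorch; Accelerate only takes over device placement and gradient handling, which is what gives the full control the Trainer API hides.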
aiegoo commented 1 year ago

GitHub code run

Transformers Notebooks

This repository contains the example code from our O'Reilly book Natural Language Processing with Transformers.


Getting started

You can run these notebooks on cloud platforms like Google Colab or your local machine. Note that most chapters require a GPU to run in a reasonable amount of time, so we recommend one of the cloud platforms as they come pre-installed with CUDA.

Running on a cloud platform

To run these notebooks on a cloud platform, just click on one of the badges in the table below:

| Chapter | Colab | Kaggle | Gradient | Studio Lab |
| --- | --- | --- | --- | --- |
| Introduction | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Text Classification | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Transformer Anatomy | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Multilingual Named Entity Recognition | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Text Generation | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Summarization | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Question Answering | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Making Transformers Efficient in Production | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Dealing with Few to No Labels | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Training Transformers from Scratch | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |
| Future Directions | Open In Colab | Kaggle | Gradient | Open In SageMaker Studio Lab |

Nowadays, the GPUs on Colab tend to be K80s (which have limited memory), so we recommend using Kaggle, Gradient, or SageMaker Studio Lab. These platforms tend to provide more performant GPUs like P100s, all for free!

Note: some cloud platforms like Kaggle require you to restart the notebook after installing new packages.

Running on your machine

To run the notebooks on your own machine, first clone the repository and navigate to it:

$ git clone https://github.com/nlp-with-transformers/notebooks.git
$ cd notebooks

Next, run the following command to create a conda virtual environment that contains all the libraries needed to run the notebooks:

$ conda env create -f environment.yml

Note: You'll need a GPU that supports NVIDIA's CUDA Toolkit to build the environment. Currently, this means you cannot build locally on Apple silicon 😢.

Chapter 7 (Question Answering) has a special set of dependencies, so to run that chapter you'll need a separate environment:

$ conda env create -f environment-chapter7.yml

Once you've installed the dependencies, you can activate the conda environment and spin up the notebooks as follows:

$ conda activate book # or conda activate book-chapter7
$ jupyter notebook
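
Once the environment is active, a quick sanity check confirms that a GPU is visible; this assumes the environment installs PyTorch with CUDA support, which the notebooks rely on:

import torch

# Reports the detected CUDA device, if any.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; the notebooks will be slow on CPU.")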

FAQ

When trying to clone the notebooks on Kaggle I get a message that I am unable to access the book's GitHub repository. How can I solve this issue?

This is most likely because internet access is disabled by default in Kaggle notebooks. When running your first notebook on Kaggle, you need to enable internet access in the settings menu on the right side.

How do you select a GPU on Kaggle?

You can enable GPU usage by selecting GPU as the Accelerator in the settings menu on the right side.

Citations

If you'd like to cite this book, you can use the following BibTeX entry:

@book{tunstall2022natural,
  title={Natural Language Processing with Transformers: Building Language Applications with Hugging Face},
  author={Tunstall, Lewis and von Werra, Leandro and Wolf, Thomas},
  isbn={1098103246},
  url={https://books.google.ch/books?id=7hhyzgEACAAJ},
  year={2022},
  publisher={O'Reilly Media, Incorporated}
}