Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. These are the source files for the tutorials on https://developer.habana.ai/
This PR has updates to a few of the notebook tutorials based on issues that I ran into when running the notebooks on Intel Tiber Developer Cloud.
Quickstart notebook:
Added an exit() at the end of the notebook. Other notebooks have the exit, but this one did not.
New Profiler and Optimization notebook:
Fixed typo and formatting
RAG Application notebook:
Corrected the name of the docker image. The old name was incorrect, so copying and pasting the command caused an error.
Single card tutorials:
Spelling fix
TGI Gaudi notebook:
Fixed typos and added a link to the Llama 3 model being used, so that it's easier to find the model and request access.
Formatted the environment variables with default values and descriptions into a table to make it easier to read.
Added docker stop <container name> and docker rm <container name> commands for the tgi-gaudi and tgi-gaudi-perf containers. If the tgi-gaudi container is not stopped, the tgi-gaudi-perf container will get an error about not being able to acquire the device. The perf container should also be stopped when it's done being used, so that the user can move on to run scripts using the Gaudi devices.
Added a note about waiting for the TGI server to be ready in the performance example section. If you run the next cell too quickly, the curl command will fail.
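For reviewers who want to see what "waiting for the TGI server to be ready" could look like in code, here is a minimal polling sketch. It is not what the notebook does (the notebook only adds a note). The helper names `server_is_up` and `wait_until_ready`, the timeout values, and the example URL in the comment are all assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: server is not ready yet.
        return False


def wait_until_ready(check, max_wait=600.0, interval=5.0, sleep=time.sleep):
    """Call `check()` until it returns True or `max_wait` seconds have passed."""
    deadline = time.monotonic() + max_wait
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage; the host, port, and path are assumptions and must
# match however the container was actually started:
# ready = wait_until_ready(lambda: server_is_up("http://127.0.0.1:8080/health"))
```

Passing the check as a callable keeps the retry loop independent of how readiness is detected, so the same loop works whether readiness is detected with an HTTP request or by some other means.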
Training a Classifier CIFAR10 notebook:
Link formatting
The notebook says that it was adapted from a PyTorch CIFAR10 tutorial, which uses a batch size of 4. This Gaudi tutorial uses a batch size of 64 instead, but a couple of hardcoded 4s remained, so the number of labels displayed did not match the number of images displayed. I fixed this by changing those 4s to batch_size.
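The mismatch can be illustrated with a small, self-contained sketch. It uses plain Python lists as a stand-in for the tensors in the notebook. The `label_names` helper and the synthetic `labels` list are hypothetical and only show why the loop bound has to follow `batch_size`.

```python
# CIFAR10 class names, as used in the PyTorch CIFAR10 tutorial.
classes = ("plane", "car", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck")

batch_size = 64

# Stand-in for one batch of integer labels from a data loader (assumption:
# the real notebook gets these from a DataLoader, not from this expression).
labels = [i % len(classes) for i in range(batch_size)]


def label_names(labels, classes, count):
    """Return the class names for the first `count` labels in the batch."""
    return [classes[labels[j]] for j in range(count)]


# Hardcoded 4: only 4 names are listed although 64 images are shown.
hardcoded = label_names(labels, classes, 4)

# Using batch_size: one name per displayed image.
fixed = label_names(labels, classes, batch_size)

print(len(hardcoded), len(fixed))
```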