A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Linked the attention docs into the HTML docs and fixed the errors reported by Sphinx (cyclical imports, wrong indentation, wrong section names).
Additionally tried to make the attention docs render nicer in HTML (it turns out nbsphinx does not support many of the constructs used in that notebook, such as `<br>` inside a table).
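For context, the cyclical-import errors Sphinx reports are usually broken by deferring one of the imports to call time. A minimal sketch of the pattern (the module names `alpha`/`beta` are hypothetical, not modules from this repo):

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Two hypothetical modules that reference each other. With top-level
# imports on both sides this cycle breaks module loading; moving one
# import inside the function that needs it resolves it at call time.
pkgdir = Path(tempfile.mkdtemp())
(pkgdir / "alpha.py").write_text(textwrap.dedent("""
    def greet():
        # Deferred import: resolved when greet() is called, after both
        # modules have finished loading, so the cycle is harmless.
        import beta
        return "alpha->" + beta.name()
"""))
(pkgdir / "beta.py").write_text(textwrap.dedent("""
    # Top-level import is now safe: alpha no longer imports beta
    # at module load time.
    import alpha

    def name():
        return "beta"
"""))
sys.path.insert(0, str(pkgdir))

import alpha
print(alpha.greet())  # alpha->beta
```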
Type of change
[x] Documentation change (change only to the documentation, either a fix or new content)
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Infra/Build change
[ ] Code refactor
Changes
Please list the changes introduced in this PR:
Added the attention notebook to docs/index.rst.
Adjusted some tables in the attention notebook to render better in HTML (although still not great).
Fixed the errors reported by Sphinx.
Reverted the Sphinx version on GitHub to be consistent with the one in the internal CI - the newer Sphinx version introduces multiple rendering issues. Will try to fix them in a follow-up.
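For reference, hooking a notebook into `docs/index.rst` with nbsphinx amounts to a `toctree` entry pointing at the `.ipynb` file; a sketch of the kind of entry added (the caption and path here are illustrative, not the exact ones in this PR):

```rst
.. toctree::
   :hidden:
   :caption: Examples

   examples/attention/attention.ipynb
```

nbsphinx picks the file up by its `.ipynb` extension and converts it during the Sphinx build, so no separate conversion step is needed.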