-
In the last few days I've been playing around to see how fast I can get a 19M-parameter model to train on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
-
The training script relies on FSDP's `MixedPrecisionPolicy` to take care of dtypes.
But when data parallelism is not used (for example, when running on a single node with TP 8), this does not ha…
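For reference, a minimal sketch of what this looks like when FSDP does handle it, assuming a recent PyTorch where FSDP2's `fully_shard` and `MixedPrecisionPolicy` are exposed under `torch.distributed.fsdp` (the model and process-group setup are placeholders):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

def shard_with_mixed_precision(model: torch.nn.Module) -> torch.nn.Module:
    """Let FSDP2 cast params/compute to bf16 while keeping grad reductions in fp32.
    Assumes the process-group environment was set up by torchrun."""
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16,   # parameters and compute in bf16
        reduce_dtype=torch.float32,   # gradient reductions stay in fp32
    )
    fully_shard(model, mp_policy=policy)
    return model

# Without an FSDP wrapper (e.g. single-node TP only), the cast has to be done
# explicitly instead, e.g. model = model.to(dtype=torch.bfloat16).
```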
-
Hi Matt, Dan,
thanks for this wonderful library.
While training some augmented models, I noticed that some steps in the process could benefit a lot from parallelization.
There a…
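Purely as an illustration of the kind of change meant here (the library's actual steps aren't shown in this excerpt, and `augment` is a hypothetical stand-in), independent per-sample work could be fanned out across worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

def augment(sample):
    # hypothetical stand-in for one expensive, independent per-sample step
    return sample * 2

def augment_all(samples, max_workers=8):
    # spread the independent samples across worker processes
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(augment, samples))

if __name__ == "__main__":
    print(augment_all(list(range(10))))
```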
-
### Feature request
The current Dataloader implementation in this repository underperforms because it lacks efficient parallelization. This often results in the CPU handling data preproc…
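A minimal sketch of the kind of parallelization being asked for, assuming the data can be wrapped in a standard PyTorch `Dataset` (the dataset and `preprocess` function below are placeholders, not this repository's code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

def preprocess(x):
    # stand-in for the expensive CPU-side preprocessing
    return torch.tensor([float(x)]) * 2

class ToyDataset(Dataset):
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, idx):
        return preprocess(self.items[idx])

loader = DataLoader(
    ToyDataset(list(range(1024))),
    batch_size=32,
    num_workers=8,            # preprocessing runs in 8 worker processes
    pin_memory=True,          # faster host-to-GPU copies
    prefetch_factor=4,        # each worker keeps 4 batches queued
    persistent_workers=True,  # keep workers alive across epochs
)
```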
-
**Describe the bug**
This may not be a bug; it's more a request for help with debugging, or for clearer warning messages.
I'm running training via the `ns-train nerfacto` command, and I keep seeing the…
-
I had a chance to reflect after PTC / CUDA-MODE and wanted to share some thoughts on future plans for sparsity in torchao.
## **Current State**
There are two components to sparsity: accuracy and…
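As a concrete example of the acceleration side, here is a sketch of running a pruned linear layer on 2:4 semi-structured kernels; it uses core PyTorch's `to_sparse_semi_structured` rather than torchao's own APIs and assumes an Ampere-or-newer GPU with fp16 weights:

```python
import torch
from torch.sparse import to_sparse_semi_structured

# A toy fp16 linear layer on GPU (2:4 kernels need CUDA and half/bf16/int8 weights).
linear = torch.nn.Linear(2048, 2048, bias=False).half().cuda()

# Magnitude-prune to a 2:4 pattern: keep the 2 largest of every 4 consecutive weights.
w = linear.weight.detach().view(-1, 4)
mask = torch.zeros_like(w, dtype=torch.bool)
mask.scatter_(1, w.abs().topk(2, dim=1).indices, True)
pruned = (w * mask).view_as(linear.weight)

# Swap in the compressed representation; matmuls now hit the sparse tensor cores.
linear.weight = torch.nn.Parameter(to_sparse_semi_structured(pruned))

with torch.inference_mode():
    x = torch.randn(64, 2048, dtype=torch.float16, device="cuda")
    y = linear(x)
```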
-
Inspired by a recent back-and-forth with @gau-nernst, we should add some quantized training recipes to AO for small models (600M param range).
Character.ai recently shared that they're working on qua…
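To make the idea concrete, here is a generic sketch of int8 weight fake-quantization with a straight-through estimator; it is only an illustration of what a quantized-training recipe could do, not AO's (or Character.ai's) actual method:

```python
import torch

class FakeQuantLinear(torch.nn.Linear):
    """Linear layer whose forward pass uses int8-fake-quantized weights."""

    def forward(self, x):
        w = self.weight
        # per-output-channel symmetric int8 scale
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        w_q = torch.clamp(torch.round(w / scale), -128, 127) * scale
        # straight-through estimator: forward sees quantized weights,
        # backward passes gradients through to the fp master weights
        w_ste = w + (w_q - w).detach()
        return torch.nn.functional.linear(x, w_ste, self.bias)

# quick smoke test
layer = FakeQuantLinear(16, 8)
layer(torch.randn(4, 16)).sum().backward()
```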
-
I am using AutoModelForSequenceClassification for classification with a large model. Can I use this library for that, and if so, how should I use it?
Additionally, if my output is only one token and I do batch inference, w…
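Since the excerpt doesn't show which library is meant, here is a plain `transformers` sketch of batched classification; the checkpoint name is a placeholder, and batching is handled by padding plus the attention mask:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

texts = ["the movie was great", "the service was slow and the food was cold"]
# padding=True makes one rectangular batch; the attention mask marks the padding
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, num_labels)
preds = logits.argmax(dim=-1)       # one class id per input sequence
```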
-
Load EVAL samples from data/KITTI/object/training
Done: total EVAL samples 24
Detecting objects: 0%| …
-
Does DeepSpeed support PyTorch code with [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)? If not, do you think it might be helpful to DeepSpeed users for further speedups?
…
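Independent of whether DeepSpeed wires this up for you, the capture/replay pattern from the linked blog post looks roughly like the following in plain PyTorch (the toy model and sizes are placeholders):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.randn(32, 1024, device="cuda")

# warm up on a side stream so capture starts from a clean state
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# capture one forward pass into a CUDA graph
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)

# replay: copy fresh data into the static input buffer, then replay the graph
static_in.copy_(torch.randn(32, 1024, device="cuda"))
g.replay()
print(static_out.sum().item())
```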