-
In the last few days I've been playing around to see how fast I can get a 19M-parameter model to train on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
-
The training script relies on FSDP's `MixedPrecisionPolicy` to take care of dtypes.
But when data parallelism is not used (for example, when running on a single node with TP 8), this does not ha…
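For reference, a minimal sketch of what this looks like when FSDP does handle it, assuming a recent PyTorch where FSDP2's `fully_shard` and `MixedPrecisionPolicy` are exposed under `torch.distributed.fsdp` (the model and process-group setup are placeholders):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

def shard_with_mixed_precision(model: torch.nn.Module) -> torch.nn.Module:
    """Let FSDP2 cast params/compute to bf16 while keeping grad reductions in fp32.
    Assumes the process-group environment was set up by torchrun."""
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    policy = MixedPrecisionPolicy(
        param_dtype=torch.bfloat16,   # parameters and compute in bf16
        reduce_dtype=torch.float32,   # gradient reductions stay in fp32
    )
    fully_shard(model, mp_policy=policy)
    return model

# Without an FSDP wrapper (e.g. single-node TP only), the cast has to be done
# explicitly instead, e.g. model = model.to(dtype=torch.bfloat16).
```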
-
Hi Matt, Dan,
thanks for this wonderful library.
While training some augmented models, I noticed that some steps in the process could benefit a lot from parallelization.
There a…
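Purely as an illustration of the kind of change meant here (the library's actual steps aren't shown in this excerpt, and `augment` is a hypothetical stand-in), independent per-sample work could be fanned out across worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

def augment(sample):
    # hypothetical stand-in for one expensive, independent per-sample step
    return sample * 2

def augment_all(samples, max_workers=8):
    # spread the independent samples across worker processes
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(augment, samples))

if __name__ == "__main__":
    print(augment_all(list(range(10))))
```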
-
### Feature request
The current Dataloader implementation in this repository underperforms because it lacks efficient parallelization. This often results in the CPU handling data preproc…
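A minimal sketch of the kind of parallelization being asked for, assuming the data can be wrapped in a standard PyTorch `Dataset` (the dataset and `preprocess` function below are placeholders, not this repository's code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

def preprocess(x):
    # stand-in for the expensive CPU-side preprocessing
    return torch.tensor([float(x)]) * 2

class ToyDataset(Dataset):
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, idx):
        return preprocess(self.items[idx])

loader = DataLoader(
    ToyDataset(list(range(1024))),
    batch_size=32,
    num_workers=8,            # preprocessing runs in 8 worker processes
    pin_memory=True,          # faster host-to-GPU copies
    prefetch_factor=4,        # each worker keeps 4 batches queued
    persistent_workers=True,  # keep workers alive across epochs
)
```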
-
**Describe the bug**
This may not be a bug; it's more a request for help with debugging, or for clearer warning messages.
I'm running training via the `ns-train nerfacto` command, and I keep seeing the…
-
I had a chance to reflect after PTC / CUDA-MODE and wanted to share some thoughts on future plans for sparsity in torchao.
## **Current State**
There are two components to sparsity: accuracy and…
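As a concrete example of the acceleration side, here is a sketch of running a pruned linear layer on 2:4 semi-structured kernels; it uses core PyTorch's `to_sparse_semi_structured` rather than torchao's own APIs and assumes an Ampere-or-newer GPU with fp16 weights:

```python
import torch
from torch.sparse import to_sparse_semi_structured

# A toy fp16 linear layer on GPU (2:4 kernels need CUDA and half/bf16/int8 weights).
linear = torch.nn.Linear(2048, 2048, bias=False).half().cuda()

# Magnitude-prune to a 2:4 pattern: keep the 2 largest of every 4 consecutive weights.
w = linear.weight.detach().view(-1, 4)
mask = torch.zeros_like(w, dtype=torch.bool)
mask.scatter_(1, w.abs().topk(2, dim=1).indices, True)
pruned = (w * mask).view_as(linear.weight)

# Swap in the compressed representation; matmuls now hit the sparse tensor cores.
linear.weight = torch.nn.Parameter(to_sparse_semi_structured(pruned))

with torch.inference_mode():
    x = torch.randn(64, 2048, dtype=torch.float16, device="cuda")
    y = linear(x)
```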
-
Inspired by a recent back-and-forth with @gau-nernst, we should add some quantized training recipes to AO for small models (600M param range).
Character.ai recently shared that they're working on qua…
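To make the idea concrete, here is a generic sketch of int8 weight fake-quantization with a straight-through estimator; it is only an illustration of what a quantized-training recipe could do, not AO's (or Character.ai's) actual method:

```python
import torch

class FakeQuantLinear(torch.nn.Linear):
    """Linear layer whose forward pass uses int8-fake-quantized weights."""

    def forward(self, x):
        w = self.weight
        # per-output-channel symmetric int8 scale
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        w_q = torch.clamp(torch.round(w / scale), -128, 127) * scale
        # straight-through estimator: forward sees quantized weights,
        # backward passes gradients through to the fp master weights
        w_ste = w + (w_q - w).detach()
        return torch.nn.functional.linear(x, w_ste, self.bias)

# quick smoke test
layer = FakeQuantLinear(16, 8)
layer(torch.randn(4, 16)).sum().backward()
```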
-
I am using AutoModelForSequenceClassification for classification with a large model. Can I use this library for that, and if so, how should I use it?
Additionally, if my output is only one token and I do batch inference, w…
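Since the excerpt doesn't show which library is meant, here is a plain `transformers` sketch of batched classification; the checkpoint name is a placeholder, and batching is handled by padding plus the attention mask:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

texts = ["the movie was great", "the service was slow and the food was cold"]
# padding=True makes one rectangular batch; the attention mask marks the padding
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, num_labels)
preds = logits.argmax(dim=-1)       # one class id per input sequence
```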
-
Load EVAL samples from data/KITTI/object/training
Done: total EVAL samples 24
Detecting objects: 0%| …
-
Does DeepSpeed support PyTorch code with [CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)? If not, do you think it might be helpful to DeepSpeed users for further speedups?
…
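Independent of whether DeepSpeed wires this up for you, the capture/replay pattern from the linked blog post looks roughly like the following in plain PyTorch (the toy model and sizes are placeholders):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.randn(32, 1024, device="cuda")

# warm up on a side stream so capture starts from a clean state
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# capture one forward pass into a CUDA graph
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)

# replay: copy fresh data into the static input buffer, then replay the graph
static_in.copy_(torch.randn(32, 1024, device="cuda"))
g.replay()
print(static_out.sum().item())
```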