-
### 🚀 The feature, motivation and pitch
Libraries like Transformers, vllm and diffusers use large quantized LLMs for inference and fine-tuning. When running large models on several low-memory GPU…
-
Using both `flash_attn_varlen_qkvpacked_func` and `CheckpointImpl.NO_REENTRANT` together raises the RuntimeError below:
```python
Traceback (most recent call last):
> File "/opt/tiger/antelope/train.py", line …
-
### Bug description
I find that when using the FSDP strategy, the model parameters and gradients are not logged by WandB. However, everything works well if I switch from FSDP to the native DDP strategy.
Since …
-
Thanks for the great work and promising performance in model training. Are you considering applying and simplifying burst-attention for model inference? What gaps are there compared to ring attention with FS…
-
How can I run inference on a batch of encoded tensors (shape = (B, T)) across 4 GPUs and get 3~4x the tokens/s throughput of a single GPU? (It's for a small model that fits into a single GPU's memory.)
I've tri…
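Independent of the truncated attempt above, one common data-parallel pattern is to give each rank a contiguous shard of the batch, run the model locally, and gather results afterwards. A minimal sketch of the sharding step (the helper name `shard_batch` is mine, not from the issue; the gather step, e.g. via `torch.distributed.all_gather_object`, is omitted):

```python
def shard_batch(batch, world_size, rank):
    """Split a batch (a list of encoded sequences) into near-equal
    contiguous shards, one per rank. Trailing ranks may get fewer
    (or zero) items when len(batch) is not divisible by world_size."""
    per_rank = -(-len(batch) // world_size)  # ceil division
    return batch[rank * per_rank : (rank + 1) * per_rank]

# Example: B = 10 sequences split across 4 ranks -> shard sizes 3, 3, 3, 1.
batch = [f"seq{i}" for i in range(10)]
shard_sizes = [len(shard_batch(batch, 4, r)) for r in range(4)]
print(shard_sizes)  # [3, 3, 3, 1]
```

Each rank would then run a forward pass on its own shard only, so throughput scales with the number of GPUs minus the gather overhead.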
-
Run DDP with a shared buffer (different TorchDynamo `Source`):
Repro Script
```
"""
torchrun --standalone --nproc_per_node=1 test/dup_repro.py
TORCH_LOGS=aot,dynamo torchrun --standalone --…
awgu updated 6 months ago
-
I'm trying to fine-tune Llama2 70B on an NVIDIA A100 with 80 GB, but even with batch_size = 1 I'm getting an OOM error.
I'm using LoRA with quantization this way: `plugins = BitsandbytesPrecision('nf4…
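For context, a back-of-envelope estimate of where the memory goes with nf4 base weights and bf16 LoRA adapters; every ratio below is an assumption of mine, not a measurement from the issue:

```python
# Rough memory estimate for a 70B-parameter model quantized to nf4
# (4 bits = 0.5 byte per weight) with a small set of LoRA adapters.
PARAMS = 70e9
GIB = 1024**3

base_weights = PARAMS * 0.5 / GIB            # ~32.6 GiB of frozen 4-bit weights
lora_params = 0.002 * PARAMS                 # assume ~0.2% of params are trainable
lora_weights = lora_params * 2 / GIB         # bf16 adapters: 2 bytes/param
# Adam keeps two fp32 moment buffers plus fp32 grads, but only for
# the trainable LoRA params.
optimizer_state = lora_params * (4 + 4 + 4) / GIB

total = base_weights + lora_weights + optimizer_state
print(round(total, 1))  # ~34.4 GiB before activations
```

Since the static footprint is well under 80 GB, the OOM likely comes from activation memory (sequence length x batch), so gradient checkpointing or shorter sequences are the usual levers.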
-
I set the environment variables as follows in train_dist.sh in the gpt_hf folder:
```
export NUM_NODES=1
export NUM_GPUS_PER_NODE=8
export MASTER_ADDR=localhost
export MASTER_PORT=2222
export NODE_RA…
-
With llama factory I can fine-tune the llama3-8B model via SFT using deepspeed zero2, but in this framework, even with the batch size set to 1, deepspeed zero2 still reports OOM.
Training with zero3 becomes very slow, and this message appears:
2 pytorch allocator cache flushes since last step. this happens when there is hi…
-
### 🚀 The feature, motivation and pitch
Skip decorators such as `skip_if_lt_x_gpu(2)` are currently not properly handled; this was discovered as part of https://github.com/pytorc…
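For readers unfamiliar with the decorator being discussed, a minimal sketch of what a `skip_if_lt_x_gpu`-style decorator does; this is my illustration, not PyTorch's actual implementation (the real one checks `torch.cuda.device_count()` — here the count is injectable so the sketch runs without GPUs):

```python
import functools
import unittest


def skip_if_lt_x_gpu(x, device_count=0):
    """Sketch: skip the decorated test when fewer than x GPUs are visible.
    `device_count` is a hypothetical parameter for illustration only."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if device_count < x:
                raise unittest.SkipTest(f"needs at least {x} GPUs")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@skip_if_lt_x_gpu(2, device_count=0)
def test_distributed():
    pass  # body never runs with 0 visible GPUs


try:
    test_distributed()
except unittest.SkipTest as e:
    print("skipped:", e)  # skipped: needs at least 2 GPUs
```

The handling problem described in the issue is about test infrastructure recognizing the `SkipTest` raised by such wrappers rather than treating it as a failure.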