fsdp Search Results - Githubissues

1000+ results for fsdp

NVIDIA/nccl #1473

Why tree algorithms are specifically targeted at All-Reduce?

I'm running nccl-test `all-reduce` between two nodes, and I've found that the tree algorithm performs much better than the ring algorithm. However, through reading the NCCL source code, I noticed tha…

jxh314 updated 2 days ago
1
huggingface/trl #2022

Negative Entropy in TRL PPOv2Trainer TLDR Example

### System Info - `transformers` version: 4.44.0 - Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31 - Python version: 3.11.9 - Huggingface_hub version: 0.23.4 - Safetensors version: 0.4.…

RylanSchaeffer updated 1 month ago
3
meta-llama/llama-recipes #699

Inference with "FULL_STATE_DICT" checkpoint from FSDP fine t…

### 🚀 The feature, motivation and pitch I can get a single checkpoint after using FSDP fine tune the model. ![image](https://github.com/user-attachments/assets/8c3019e3-458c-49e6-9cdd-3f868692df46) …

mathmax12 updated 4 days ago
2
huggingface/trl #1980

`PPOTrainer` OOM Error Because of Forced Upcast to `torch.fl…

### System Info - `transformers` version: 4.44.0 - Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31 - Python version: 3.11.9 - Huggingface_hub version: 0.23.4 - Safetensors version: 0.4.…

RylanSchaeffer updated 1 month ago
1
LambdaLabsML/distributed-training-guide #37

Add tensor parallelism to 405b chapter or advanced topics?

See https://pytorch.org/docs/stable/distributed.tensor.parallel.html llama 405b paper discusses using FSDP, pipeline parallelism, context parallelism, and tensor parallelism It'd be relatively s…

corey-lambda updated 2 days ago
1
UKPLab/sentence-transformers #2931

Error in Fully Sharded Data Parallelism (FSDP) set up

Trying to finetune a model whose max seq length is 8k, _BAAI/bge-m3_. I'm trying to finetune on some retrieval task. Here's my trainer set up ```python model = SentenceTransformer(model_id, de…

MohammedAlhajji updated 4 weeks ago
4
tianyi-lab/Cherry_LLM #24

The training bash script for FastChat is what?

Thank you very much for the work you have brought, which is very helpful for those of us with fewer training resources. I am a newcomer to the field of NLP and am not very familiar with training frame…

daidaiershidi updated 3 days ago
2
axolotl-ai-cloud/axolotl #1838

ORPO results in `Cannot flatten integer dtype tensors`

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports. ### Exp…

maziyarpanahi updated 1 month ago
3
OpenFabrics/fsdp_docs #135

Need to update fsdp_setup scripts

When adding the NVMe drives, we changed what cards are installed in nodes 01 and 02, and also removed the bifurcating PCI-e card from the Mellanox cards in nodes 09 and 10. We need to update the mach…

dledford updated 1 week ago
4
NVIDIA/TransformerEngine #1135

fp8_model_init doesn't work with DDP

When I'm trying to use `fp8_model_init` feature, it doesn't seem compatible with DDP. It throws an error: `RuntimeError: Modules with uninitialized parameters can't be used with "DistributedDataParal…

MaciejBalaNV updated 1 month ago
3
