fsdp Search Results - Githubissues

1000+ results
for fsdp

pytorch/pytorch #102731

[FSDP] When amp is enabled, there is a noticeable differenc…

### 🐛 Describe the bug When enabling amp training with PyTorch native autocast, I noticed there seems to be an obvious difference between the DDP-based model and the FSDP-based model. Here is a minimum example …

HAOCHENYE updated 5 months ago
2
ml-explore/mlx-examples #714

[Feature Request] Support for QDoRA: Efficient quantized fin…

> Today we’re releasing the next step: QDoRA. This is just as memory efficient and scalable as FSDP/QLoRA, and critically is also as accurate for continued pre-training as full weight training. We thi…

s-smits updated 2 months ago
2
pytorch/benchmark #1545

Enhance TorchBench coverage for large distributed workloads …

The proposed work tasks are as below: - [ ] Enable CI support for IBM Cloud to enhance the testing infrastructure for FSDP - [ ] Benchmark new model(s) for FSDP training - e.g. add new hf_T5 with 3B…

spzala updated 1 year ago
1
unslothai/unsloth #456

ValueError: Can't find a valid checkpoint at checkpoints

I'm trying to continue training from a checkpoint, but I get an error. Can you help with example code for it? Model: `unsloth/tinyllama-bnb-4bit` ``` from trl import SFTTrainer trainer = SFTTrainer( …

thewebscraping updated 5 months ago
2
apoorvkh/torchrunx #25

README and documentation

(for later)

apoorvkh updated 2 weeks ago
3
axolotl-ai-cloud/axolotl #1522

(OOM) FSDP+QLora 2*RTX3090 (24G per card) finetuning on 70b …

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports. …

yaohwang updated 5 months ago
4
OpenMOSS/AnyGPT #31

Train_loss = 0 and Eval_loss = NaN in stage2_sft

Hello! Thank you for your work on MLLM. I hit a fine-tuning bug that I couldn't fix: when I ran the `stage2_sft.sh` script and trained with speech_conv_datasets only, the logger showed that the trai…

xuxiaoang updated 2 months ago
3
facebookresearch/dinov2 #266

Failure when not using FSDP mixed precision

When training without providing the `mixed_precision` argument to FSDP, there is an error related to dtype mismatch in `dinov2/layers/block.py`. Is this expected? Full stacktrace: ```txt File "/.…

schmidt-ai updated 1 year ago
1
pytorch/pytorch #113188

Fix docstring errors in _common_utils.py, _optim_utils.py, _…

Please fix the following issues. First, make sure to install the required tools: ``` pip3 install pydocstyle ``` ``` pip3 install ruff ``` Then complete the following steps: 1. Run `pydocst…

svekars updated 11 months ago
3
artidoro/qlora #198

Training on 2x40GB A100s with FSDP: ValueError

Hello, Currently I am trying to run qlora.py script with the 65B model on 2 A100 40GB GPUs with the script ```accelerate launch qlora.py --args``` with ```--args``` the ones given in the rep…

ffohturk updated 1 year ago
5