fsdp Search Results - Githubissues

1000+ results
for fsdp

Adamliu1/SNLP_GCW #124

[CODE OPTIMISATION] Unlearning a 8B model without offloading…

The issue with offloading is the very inefficient use of the GPU (more than 90% of the unlearning time is spent on offloading and loading memory). Instead, we could try to parallelize the unlearning …

TheRootOf3 updated 4 weeks ago
2
Fannovel16/comfyui_controlnet_aux #250

Error in importing dwpose.py

Hi, I'm encountering a problem when trying to import dwpose.py. It looks like the problem is the CUDA version or the torch version. Are there specific requirements for the CUDA or PyTorch version?…

amnesicloud updated 7 months ago
1
NVIDIA/TransformerEngine #1116

How to debug CUDNN_STATUS_EXECUTION_FAILED?

I'm running my code with: `env CUDNN_LOGERR_DBG=1 CUDNN_LOGDEST_DBG=stderr torchrun --standalone --nproc_per_node=8 -m extra_scripts.model_playground_train` and getting errors like: …

vedantroy updated 2 weeks ago
7
evo-design/evo #11

Finetune script

Could you provide a script/notebook demo to show how to finetune this model?

JinyuanSun updated 1 day ago
12
axolotl-ai-cloud/axolotl #953

Mixtral LoRA error

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports. …

generalsvr updated 7 months ago
6
pytorch/PiPPy #1026

(FSDP or DDP) + PP support

- [x] FSDP + PP
- [ ] DDP + PP
- [ ] DCP path

wconstab updated 2 weeks ago
6
axolotl-ai-cloud/axolotl #1750

RuntimeError: Cannot re-initialize CUDA in forked subprocess…

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports. ### Exp…

RishabhMaheshwary updated 2 months ago
7
vllm-project/llm-compressor #189

Qwen1.5-MoE-A2.7B-Chat w4a16 Quantization Failed

**Describe the bug** I tried to quantize Qwen1.5-MoE-A2.7B-Chat with w4a16 for vllm PR: https://github.com/vllm-project/vllm/pull/7766 raise error TypeError: forward() got multiple values for argume…

donpromax updated 2 weeks ago
2
meta-llama/llama-recipes #641

discrepancy of the FSDP training step with the **alpaca_data…

### 🚀 The feature, motivation and pitch I’m new to working with FSDP and am testing FSDP fine-tuning with the **alpaca_dataset** dataset. I have a single node with 8 GPUs and have set the batch size t…

mathmax12 updated 1 month ago
3
deep-diver/llamaduo-spinoff #1

Roadmap

We have discussed the following so far. - Decide which domain - Math([GSM8k](https://huggingface.co/datasets/gsm8k)), Code([Stack 2](https://huggingface.co/datasets/bigcode/the-stack-v2)), Gene…

deep-diver updated 5 months ago
6
