-
Proposal:
- At the end of every data segment, save the training checkpoint file/directory name to a file, informing the scheduler (e.g., Slurm or LSF) where to find the checkpoint (see the sketch after this list).
- Save the model …
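A minimal sketch of the pointer-file idea, assuming a hypothetical helper name and file layout (neither is specified in the proposal); the atomic rename ensures a requeued job never reads a half-written path:

```python
import os

def record_latest_checkpoint(ckpt_path: str, pointer_file: str = "latest_checkpoint.txt") -> None:
    """Record where the newest checkpoint lives so a requeued job can resume from it.

    Both the function name and the pointer-file name are hypothetical; the point is
    that the scheduler-side resubmission script only has to read one small file.
    """
    tmp = pointer_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(ckpt_path + "\n")
    os.replace(tmp, pointer_file)  # atomic rename: readers never see a partial write

# Called at the end of every data segment, e.g.:
# record_latest_checkpoint("checkpoints/segment_0042/")
```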
-
## 🚀 Feature Request
Convert OPT checkpoint to Megatron-LM or FasterTransformer
### Motivation
I am currently trying to use OPT in a production environment.
However, because the 175B model is …
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
LLaMA-Factory 0.8.3
### Reproduction
I used the example here https://github.com/hiyouga/LLaMA-Factory…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
I am trying to run
`CUDA_VISIBLE_DEVICES=0 python finetune_lora_sft.py` and get ValueError: 130004 is not i…
-
```shell
[rank2]: Traceback (most recent call last):
[rank2]: File "/mnt/Lumina-mGPT/lumina_mgpt/finetune_solver.py", line 113, in
[rank2]: solver = Solver(args)
[rank2]: File "/mnt/Lumin…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…
-
Hello,
I am trying to use the paxml to maxtext ckpt conversion [script](https://github.com/google/maxtext/blob/main/MaxText/convert_gpt3_ckpt_from_paxml.py) but don't seem to have permissions to downlo…
-
### Bug description
Hello everyone,
I am training a model using FSDP with Fabric.
When saving the model to an S3 bucket by calling the following function:
```
fabric.save(
"s3://m…
-
Currently, we don't apply QLoRA to either the output projection or the token embeddings. There's no great reason not to apply quantization to output projections; we simply don't do this due to limitations…
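To make the distinction concrete, here is a framework-agnostic sketch of a LoRA adapter wrapped around a frozen output projection. The class name is made up, and the NF4 quantization of the base weight is deliberately elided, since that is exactly the part subject to the limitations above:

```python
import torch
import torch.nn as nn

class LoRAOutputProj(nn.Module):
    """Hypothetical sketch: LoRA adapter over a frozen output projection.

    Under QLoRA the frozen base weight would additionally be stored in NF4;
    only the plain LoRA structure is shown here.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the base projection
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero-init B: layer starts identical to base
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```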
-
- [ ] Statically enforced frozen dataclasses: https://github.com/pytorch/pytorch/pull/120238#discussion_r1507016873
- [ ] Can we get rid of `NormReduction` and directly pass the partial placement to …