-
Proposal:
- At the end of every data segment, save the training checkpoint file/directory name to a file, informing the scheduler (e.g., Slurm or LSF) where to find the checkpoint (see the sketch after this list).
- Save the model …
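A minimal sketch of the pointer-file idea, assuming a hypothetical helper name and file layout (neither is specified in the proposal); the atomic rename ensures a requeued job never reads a half-written path:

```python
import os

def record_latest_checkpoint(ckpt_path: str, pointer_file: str = "latest_checkpoint.txt") -> None:
    """Record where the newest checkpoint lives so a requeued job can resume from it.

    Both the function name and the pointer-file name are hypothetical; the point is
    that the scheduler-side resubmission script only has to read one small file.
    """
    tmp = pointer_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(ckpt_path + "\n")
    os.replace(tmp, pointer_file)  # atomic rename: readers never see a partial write

# Called at the end of every data segment, e.g.:
# record_latest_checkpoint("checkpoints/segment_0042/")
```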
-
## 🚀 Feature Request
Convert OPT checkpoint to Megatron-LM or FasterTransformer
### Motivation
I am currently trying to use OPT in a production environment.
However, because the 175B model is …
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
LLaMA-Factory 0.8.3
### Reproduction
I used the example here https://github.com/hiyouga/LLaMA-Factory…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
I am trying to run
`CUDA_VISIBLE_DEVICES=0 python finetune_lora_sft.py` and get ValueError: 130004 is not i…
-
```shell
[rank2]: Traceback (most recent call last):
[rank2]: File "/mnt/Lumina-mGPT/lumina_mgpt/finetune_solver.py", line 113, in
[rank2]: solver = Solver(args)
[rank2]: File "/mnt/Lumin…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
###…
-
Hello,
I am trying to use the paxml to maxtext ckpt conversion [script](https://github.com/google/maxtext/blob/main/MaxText/convert_gpt3_ckpt_from_paxml.py) but don't seem to have permissions to downlo…
-
### Bug description
Hello everyone,
I am training a model using FSDP with Fabric.
When saving the model to an S3 bucket by calling the following function:
```
fabric.save(
"s3://m…
-
Currently, we don't apply QLoRA to either the output projection or the token embeddings. There's no great reason not to apply quantization to output projections; we simply don't do this due to limitations…
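To make the distinction concrete, here is a framework-agnostic sketch of a LoRA adapter wrapped around a frozen output projection. The class name is made up, and the NF4 quantization of the base weight is deliberately elided, since that is exactly the part subject to the limitations above:

```python
import torch
import torch.nn as nn

class LoRAOutputProj(nn.Module):
    """Hypothetical sketch: LoRA adapter over a frozen output projection.

    Under QLoRA the frozen base weight would additionally be stored in NF4;
    only the plain LoRA structure is shown here.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the base projection
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # zero-init B: layer starts identical to base
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```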
-
- [ ] Statically enforced frozen dataclasses: https://github.com/pytorch/pytorch/pull/120238#discussion_r1507016873
- [ ] Can we get rid of `NormReduction` and directly pass the partial placement to …