-
### Reference code
- Llama-recipes code
[https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c67345d897c0eb6529eba076e8b8](https://github.com/meta-llama/llama-recipes/tree/b7fd81c71239c67345d897c0eb6529eba076e8b8)
-
- [ ] [WisdomShell/kieval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models](https://github.com/WisdomShell/kieval)
-
*Concise Description:*
I'd like to use JAX for distributed training of LLMs. The new release of Keras also supports JAX as a backend, in addition to TensorFlow.
*Describe the solution you'd like*
…
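As a concrete starting point, here is a minimal sketch (assuming Keras 3 with JAX installed) of selecting the JAX backend before building a model; the tiny model and random data are placeholders, not an LLM training setup:

```python
# Minimal sketch: selecting the JAX backend in Keras 3.
# The backend must be set before keras is imported.
import os
os.environ["KERAS_BACKEND"] = "jax"

import keras
import numpy as np

# Tiny toy model; a real LLM would be far larger and would use the
# keras.distribution APIs for multi-device sharding.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

x = np.random.rand(32, 16).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model.fit(x, y, epochs=1, verbose=0)  # runs on the JAX backend
```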
-
Thanks for your awesome work!
I was wondering whether you have any results on vision models such as ViT or Stable Diffusion?
-
@mukel, thank you for creating this project! I would like to discuss the following topics:
1. Please enable the Discussions tab for posts like this, which are not real "issues"
2. Do you plan on rele…
-
Here is the Google Colab link I used for fine-tuning:
[https://colab.research.google.com/drive/1kiALBR1UarPobiftZmiHfwFyk7hTCDnV?usp=sharing](https://colab.research.google.com/drive/1kiALBR1UarPobiftZmiHfwFyk7hTCDnV?usp=sharing)
When I fine-tune the LLM-embed for tool retriev…
-
Hi, I am trying to fine-tune LLaMA on commonsense_170k. However, I find that once the loss reaches about 0.6, it almost stops decreasing. Is this normal?
` WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=1,2,3,4 …
-
The main limitation of LLMs is their huge model size; moreover, during training, the VRAM/RAM needed to store the model plus the backpropagation state (gradients and optimizer parameters) is much higher than during inference.
A…
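To make the gap concrete, here is a rough back-of-the-envelope sketch, assuming a 7B-parameter model, fp16 weights, and Adam with fp32 master weights and moment estimates (the common mixed-precision recipe); activations and framework overhead are ignored, so real usage is higher:

```python
# Rough estimate of training vs. inference memory for a 7B-parameter model.
params = 7e9

weights_fp16 = params * 2        # 2 bytes per parameter
grads_fp16   = params * 2        # gradients in fp16
adam_master  = params * 4        # fp32 master copy of the weights
adam_moments = params * 4 * 2    # fp32 first and second moments

inference_gb = weights_fp16 / 1e9
training_gb  = (weights_fp16 + grads_fp16 + adam_master + adam_moments) / 1e9

print(f"inference (weights only): ~{inference_gb:.0f} GB")            # ~14 GB
print(f"training (weights + grads + optimizer): ~{training_gb:.0f} GB")  # ~112 GB
```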
-
As the title says, the command is as follows:
python pretrain.py --pretrained_model_path models/llama-7b.bin --dataset_path datasets/ceshi --spm_model_path /u01/wangcheng/llm/llama/tokenizer.model --config_path models/llama/7b_config.js…
-
## DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction
### Summary by Copilot
- **DIN-SQL** stands for **Decomposed In-Context Learning of Text-to-SQL with Self-Correctio…
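
For reference, a hedged Python sketch of how such a decomposed pipeline could be wired together; the four stages follow the paper's high-level description, while `call_llm` and the prompt strings are placeholders, not the paper's actual prompts:

```python
# Hedged sketch of a DIN-SQL-style decomposed text-to-SQL pipeline.
def call_llm(prompt: str) -> str:
    # Hypothetical helper standing in for whatever LLM API is used.
    raise NotImplementedError("plug in your LLM client here")

def din_sql(question: str, schema: str) -> str:
    # 1. Schema linking: identify tables/columns relevant to the question.
    links = call_llm(f"Schema:\n{schema}\n\nQuestion: {question}\n"
                     "List the tables and columns needed to answer it.")

    # 2. Classification & decomposition: estimate query complexity
    #    (easy / non-nested / nested) and break the question into steps.
    plan = call_llm(f"Question: {question}\nSchema links: {links}\n"
                    "Classify the query complexity and outline sub-questions.")

    # 3. SQL generation conditioned on the plan and schema links.
    sql = call_llm(f"Schema:\n{schema}\nLinks: {links}\nPlan: {plan}\n"
                   f"Write a SQL query answering: {question}")

    # 4. Self-correction: ask the model to review and fix its own SQL.
    return call_llm(f"Schema:\n{schema}\nQuestion: {question}\n"
                    f"Candidate SQL:\n{sql}\n"
                    "Check the query for errors and return a corrected version.")
```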