llm-training Search Results

1000+ results
for llm-training


AkihikoWatanabe/paper_notes #1225

MM-LLMs: Recent Advances in MultiModal Large Language Models…

# URL - https://arxiv.org/abs/2401.13601 # Affiliations - Duzhen Zhang, N/A - Yahan Yu, N/A - Chenxing Li, N/A - Jiahua Dong, N/A - Dan Su, N/A - Chenhui Chu, N/A - Dong Yu, N/A # Abstra…

AkihikoWatanabe updated 1 month ago
1
NVIDIA/TensorRT-LLM #902

Support for Falcon 7B: HF to TRT weight Conversion fails

### System Info CPU:X_86_64 GPU: A10 OS: Ubuntu 22.04 ### Who can help? @Tracin @byshiue please help. ### Information - [X] The official example scripts - [ ] My own modified script…

amir1m updated 2 days ago
6
huggingface/optimum-neuron #735

AttributeError: can't set attribute 'deepspeed_plugin'

### System Info ```shell accelerate 1.1.1 neuronx-cc 2.14.227.0+2d4f85be neuronx-distributed 0.8.0 neuronx-distributed-training 1.0.0 optimum …

anushka0415 updated 3 days ago
3
AMDResearch/hpcfund #30

Get help for distributed model training on MI250

Hi, I would like to test a program for distributed LLM model training on mi2508x and I want to do model parallel to distribute parameters across GPUs. Is there any framework that I should use to ac…

OswaldHe updated 2 weeks ago
6
TsinghuaC3I/CoGenesis #1

when loading the fine-tuned smaller model , an error happens…

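Hello! I am a student at Harbin Institute of Technology, and I have recently been trying to reproduce your work on CoGenesis. I have run into some tricky problems and hope to get your help. The issue is as follows: under the "sketch-based method", loading the fine-tuned small model raises the following error: **ValueError: Trying to set a tensor of shape torch.Size([311164928]) in "weight" (which has shape t…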

FireCaramelPudding updated 3 weeks ago
2
linkedin/Liger-Kernel #319

Training LLaVA with the Liger kernel results in degraded per…

### 🐛 Describe the bug I attempted to train LLaVA (base LLM = LLaMA 3) using the Liger kernel. The loss graph was similar to when I trained LLaVA without the Liger kernel. However, the model traine…

y-rok updated 1 week ago
2
GopherML/bag #23

Add more practical examples

Awesome project, I thank you for expanding the Go ecosystem in this area and as per our Reddit discussion I'm putting this down here. It would be amazing to have real-world/practical training sets/…

iberflow updated 1 month ago
2
stanfordnlp/dspy #1717

Teleprompter removing classes

I've been trying to use DSPy in different contexts where I see fit, but I've been unsuccessful in obtaining any good results. I have a very long prompt for a classification task that needs to describe…

paulacanva updated 3 weeks ago
9
mlcommons/training_results_v4.0 #6

cuDNN Error: Tensor 'sdpa_fp8::Amax_O' strides not set

have a try to reproduce Nvidia's results on using slurm + enroot + pyxis 1. downgrade the transformers and huggingface_hub libs (huggingface_hub==0.23.2 transformers==4.40.2) because the versions…

zhenghuanbo updated 2 months ago
1
wandb/wandb #8781

[Feature]: Allow cache to be bypassed for artifact uploads

### Description Hi, I was recently guilty of filling up the disk space on our cluster because my wandb artifact cache had grown to 1.6TB over the last month alone while I was experimenting with my LL…

collinmccarthy updated 2 weeks ago
2
