-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ?
-
As the title says, the command is as follows:
python pretrain.py --pretrained_model_path models/llama-7b.bin --dataset_path datasets/ceshi --spm_model_path /u01/wangcheng/llm/llama/tokenizer.model --config_path models/llama/7b_config.js…
-
Hi,
I'm interested in contributing an implementation of the BPE tokenizer.
Since we're using gpt-2 encoding (as shown in the preprocessors), I think we can use the original implementation of `tiktoke…
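If the preprocessors really do use the stock GPT-2 byte-level BPE, a minimal round-trip with `tiktoken`'s built-in encoding might look like this (purely illustrative; the string and integration point are made up):
```python
import tiktoken

# tiktoken ships the original GPT-2 byte-level BPE as a named encoding.
enc = tiktoken.get_encoding("gpt2")

ids = enc.encode("Hello, BPE world!")
assert enc.decode(ids) == "Hello, BPE world!"  # lossless round-trip
print(ids)
```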
-
# URL
- https://arxiv.org/abs/2306.16410
# Affiliations
- William Berrios, N/A
- Gautam Mittal, N/A
- Tristan Thrush, N/A
- Douwe Kiela, N/A
- Amanpreet Singh, N/A
# Abstract
- We propose …
-
Here's the call I'm using to run the script:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file examples/hf-alignment-handbook/configs/accelerate_configs/deepspeed_zero3.yaml --num_proces…
```
-
First, congratulations on winning first place in the NeurIPS 2023 LLM Efficiency Challenge! 🥳🥳
I'm opening this issue because a question came up while reading the repo's README, which introduces Birb…
-
How can we use RWKV-LM to implement "infinite" context lengths?
From https://github.com/BlinkDL/RWKV-LM:
> RWKV is an RNN with transformer-level LLM performance. It can be directly trained like…
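For context on the "infinite" part: in RNN mode, inference carries a fixed-size recurrent state, so memory does not grow with the number of tokens consumed. Here is a toy sketch of that loop; `ToyRecurrentLM` is a made-up stand-in, not RWKV-LM's actual API:
```python
from typing import List, Optional, Tuple

class ToyRecurrentLM:
    """Stand-in for an RWKV-style model: fixed-size state, one token per step."""

    def __init__(self, vocab_size: int = 256, state_dim: int = 8):
        self.vocab_size = vocab_size
        self.state_dim = state_dim

    def forward(self, token: int, state: Optional[List[float]]) -> Tuple[List[float], List[float]]:
        if state is None:
            state = [0.0] * self.state_dim
        # Toy update: decay the state and mix the token in. Real RWKV layers do
        # time-mix / channel-mix updates, but the loop has the same shape.
        state = [0.9 * s + 0.1 * (token % self.vocab_size) for s in state]
        logits = [sum(state)] * self.vocab_size
        return logits, state

model = ToyRecurrentLM()
state = None
for token in range(10_000):  # arbitrarily long stream; memory stays constant
    logits, state = model.forward(token, state)
```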
-
### Problem Description
Llama3 8B FP8 OOMs at the same batch size as BF16; I have to decrease the batch size to `2` to avoid the OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…
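For reference, a minimal sketch of what a Transformer Engine FP8 forward pass typically looks like (the shapes and recipe here are made up, not taken from this report); FP8 execution adds per-tensor scaling state and amax history on top of the weights, which is one plausible contributor to the higher memory use:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: E4M3 forward / E5M2 backward (HYBRID format).
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=False).cuda()
x = torch.randn(32, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```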
-
# LLMs from an industry perspective
The Road to AGI: Essentials of Large Language Model (LLM) Technology
# What large models are out there
https://zhuanlan.zhihu.com/p/611403556
# Model architecture
Why do today's LLMs all use decoder-only architectures?
The low-rank perspective (see the numerical sketch below)
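A quick way to see the low-rank argument is the degenerate case where all attention logits are equal: the bidirectional softmax matrix collapses to rank 1, while the causal (lower-triangular) matrix keeps a strictly positive diagonal and is therefore always full rank. A minimal NumPy illustration:
```python
import numpy as np

n = 8
scores = np.zeros((n, n))  # degenerate case: all attention logits equal

# Bidirectional softmax: every row is uniform, so the matrix collapses to rank 1.
bi = np.exp(scores)
bi /= bi.sum(axis=1, keepdims=True)

# Causal softmax: future positions are masked before normalizing, leaving a
# lower-triangular matrix with a strictly positive diagonal -> always full rank.
mask = np.tril(np.ones((n, n), dtype=bool))
causal = np.where(mask, np.exp(scores), 0.0)
causal /= causal.sum(axis=1, keepdims=True)

print(np.linalg.matrix_rank(bi))      # 1
print(np.linalg.matrix_rank(causal))  # 8
```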
# How to train
[Ladder Side-Tuning: a "ladder over the wall" for pretrained models](https://kexue.f…
-
# Modifying parameters of FSDP-wrapped module by hand without summon_full_params context
## Issue description
I am training a large language model using FSDP.
I want to store EMA weights wh…
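One direction that may work, sketched under assumptions: outside `summon_full_params`, each rank only holds its own parameter shards, so an elementwise EMA can be applied shard-by-shard with no all-gather, provided the EMA buffers were cloned from those same local shards. `update_ema_sharded` below is an illustrative helper, not a PyTorch API:
```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

@torch.no_grad()
def update_ema_sharded(ema_shards, model, decay: float = 0.999):
    # Outside summon_full_params, model.parameters() yields each rank's
    # local shards; an elementwise EMA needs no communication as long as
    # ema_shards was cloned from these same shards (shapes match per rank).
    for ema, p in zip(ema_shards, model.parameters()):
        ema.mul_(decay).add_(p.detach(), alpha=1 - decay)

# The full-parameter alternative the title wants to avoid would wrap the
# update in `with FSDP.summon_full_params(model): ...` instead.
```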