-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ?
-
As the title says, the command is as follows:
python pretrain.py --pretrained_model_path models/llama-7b.bin --dataset_path datasets/ceshi --spm_model_path /u01/wangcheng/llm/llama/tokenizer.model --config_path models/llama/7b_config.js…
-
Hi,
I'm interested in contributing an implementation of the BPE tokenizer.
Since we're using gpt-2 encoding (as shown in the preprocessors), I think we can use the original implementation of `tiktoke…
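If the preprocessors really do use the stock GPT-2 byte-level BPE, a minimal round-trip with `tiktoken`'s built-in encoding might look like this (purely illustrative; the string and integration point are made up):
```python
import tiktoken

# tiktoken ships the original GPT-2 byte-level BPE as a named encoding.
enc = tiktoken.get_encoding("gpt2")

ids = enc.encode("Hello, BPE world!")
assert enc.decode(ids) == "Hello, BPE world!"  # lossless round-trip
print(ids)
```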
-
# URL
- https://arxiv.org/abs/2306.16410
# Affiliations
- William Berrios, N/A
- Gautam Mittal, N/A
- Tristan Thrush, N/A
- Douwe Kiela, N/A
- Amanpreet Singh, N/A
# Abstract
- We propose …
-
Here's the call I'm using to run the script:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file examples/hf-alignment-handbook/configs/accelerate_configs/deepspeed_zero3.yaml --num_proces…
```
-
First, congratulations on winning first place in the NeurIPS 2023 LLM Efficiency Challenge! 🥳🥳
I'm opening this issue because a question came up while reading the repo's README, which introduces Birb…
-
How can we use RWKV-LM to implement "infinite" context lengths?
From https://github.com/BlinkDL/RWKV-LM:
> RWKV is an RNN with transformer-level LLM performance. It can be directly trained like…
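For context on the "infinite" part: in RNN mode, inference carries a fixed-size recurrent state, so memory does not grow with the number of tokens consumed. Here is a toy sketch of that loop; `ToyRecurrentLM` is a made-up stand-in, not RWKV-LM's actual API:
```python
from typing import List, Optional, Tuple

class ToyRecurrentLM:
    """Stand-in for an RWKV-style model: fixed-size state, one token per step."""

    def __init__(self, vocab_size: int = 256, state_dim: int = 8):
        self.vocab_size = vocab_size
        self.state_dim = state_dim

    def forward(self, token: int, state: Optional[List[float]]) -> Tuple[List[float], List[float]]:
        if state is None:
            state = [0.0] * self.state_dim
        # Toy update: decay the state and mix the token in. Real RWKV layers do
        # time-mix / channel-mix updates, but the loop has the same shape.
        state = [0.9 * s + 0.1 * (token % self.vocab_size) for s in state]
        logits = [sum(state)] * self.vocab_size
        return logits, state

model = ToyRecurrentLM()
state = None
for token in range(10_000):  # arbitrarily long stream; memory stays constant
    logits, state = model.forward(token, state)
```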
-
### Problem Description
Llama3 8B FP8 OOMs at the same batch size as BF16; I have to decrease the batch size to `2` to avoid the OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…
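For reference, a minimal sketch of what a Transformer Engine FP8 forward pass typically looks like (the shapes and recipe here are made up, not taken from this report); FP8 execution adds per-tensor scaling state and amax history on top of the weights, which is one plausible contributor to the higher memory use:
```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: E4M3 forward / E5M2 backward (HYBRID format).
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=False).cuda()
x = torch.randn(32, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```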
-
# LLMs from an industry perspective
The Road to AGI: Essentials of Large Language Model (LLM) Technology
# What large models are out there
https://zhuanlan.zhihu.com/p/611403556
# Model architecture
Why do today's LLMs all use decoder-only architectures?
The low-rank perspective (see the numerical sketch below)
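A quick way to see the low-rank argument is the degenerate case where all attention logits are equal: the bidirectional softmax matrix collapses to rank 1, while the causal (lower-triangular) matrix keeps a strictly positive diagonal and is therefore always full rank. A minimal NumPy illustration:
```python
import numpy as np

n = 8
scores = np.zeros((n, n))  # degenerate case: all attention logits equal

# Bidirectional softmax: every row is uniform, so the matrix collapses to rank 1.
bi = np.exp(scores)
bi /= bi.sum(axis=1, keepdims=True)

# Causal softmax: future positions are masked before normalizing, leaving a
# lower-triangular matrix with a strictly positive diagonal -> always full rank.
mask = np.tril(np.ones((n, n), dtype=bool))
causal = np.where(mask, np.exp(scores), 0.0)
causal /= causal.sum(axis=1, keepdims=True)

print(np.linalg.matrix_rank(bi))      # 1
print(np.linalg.matrix_rank(causal))  # 8
```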
# How to train
[Ladder Side-Tuning: a "ladder over the wall" for pretrained models](https://kexue.f…
-
# Modifying parameters of FSDP-wrapped module by hand without summon_full_params context
## Issue description
I am training a large language model using FSDP.
I want to store EMA weights wh…
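One direction that may work, sketched under assumptions: outside `summon_full_params`, each rank only holds its own parameter shards, so an elementwise EMA can be applied shard-by-shard with no all-gather, provided the EMA buffers were cloned from those same local shards. `update_ema_sharded` below is an illustrative helper, not a PyTorch API:
```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

@torch.no_grad()
def update_ema_sharded(ema_shards, model, decay: float = 0.999):
    # Outside summon_full_params, model.parameters() yields each rank's
    # local shards; an elementwise EMA needs no communication as long as
    # ema_shards was cloned from these same shards (shapes match per rank).
    for ema, p in zip(ema_shards, model.parameters()):
        ema.mul_(decay).add_(p.detach(), alpha=1 - decay)

# The full-parameter alternative the title wants to avoid would wrap the
# update in `with FSDP.summon_full_params(model): ...` instead.
```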