THUDM / SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
https://THUDM.github.io/SwissArmyTransformer
Apache License 2.0 · 871 stars · 84 forks
Issues
#179 · TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output' · deep-practice · opened 15 hours ago · 0 comments
#178 · How should resuming training from a checkpoint be configured? · elesun2018 · opened 3 months ago · 6 comments
#177 · transfer_param.py reports an error when converting the Vicuna HF model to a SAT model · Lunatic-Solar · opened 3 months ago · 17 comments
#176 · How to install a model to the right path? · link89 · closed 3 months ago · 1 comment
#175 · No cogagent? · Mac0q · opened 4 months ago · 2 comments
#174 · ModuleNotFoundError: No module named 'localAttention' · BlueSkyyyyyy · opened 4 months ago · 0 comments
#173 · "No backend type associated with device type cpu" when running cli_demo_sat.py · yileld · opened 4 months ago · 5 comments
#172 · To finetune without DeepSpeed, can I just call model.step() directly during training? · cocoshe · opened 5 months ago · 1 comment
#171 · Using CogVLM - KeyError (MODEL_URLS) - Google Colab · Baggiorobertozoba · closed 5 months ago · 1 comment
#170 · In MixtralMlpMixin(), the MoE only computes the experts' logits; I don't see any dispatch logic · AlenjandroWang · opened 5 months ago · 1 comment
#169 · Can AutoModel.from_pretrained() not load HF-format weights? · AlenjandroWang · opened 5 months ago · 1 comment
#168 · Can AutoModel.from_pretrained() not load HF weights? · AlenjandroWang · closed 5 months ago · 0 comments
#167 · How to resume finetuning from a checkpoint? · zoumaguanxin · opened 5 months ago · 1 comment
#166 · MoE support · 1049451037 · closed 5 months ago · 0 comments
#165 · fix rotary bug when q seqlen > cos seqlen · leizhao1234 · closed 5 months ago · 0 comments
#164 · support chatglm rotary in triton · leizhao1234 · closed 6 months ago · 0 comments
#163 · How to balance samples for a dataset with an imbalanced number of samples per class? · lln556 · opened 6 months ago · 1 comment
#162 · Questions about your LoRA codes · miznchimaki · closed 6 months ago · 7 comments
#161 · DeepSpeed distributed training: loss NaN or inf · JohnTang93 · opened 6 months ago · 1 comment
#160 · Does sat support saving checkpoints in fp16 or bf16? · xxxwuwq · opened 6 months ago · 5 comments
#159 · add accumulate ema and fix fp32 weight bug · leizhao1234 · closed 6 months ago · 0 comments
#158 · Excessive memory usage during single-node multi-GPU training · zodiacg · closed 6 months ago · 2 comments
#157 · Can SwissArmyTransformer read .bin weight files? The visualglm-6b project ships only .bin files, no .pt, which makes finetuning difficult · qq577288254 · closed 5 months ago · 5 comments
#156 · fix zero3 check · Sleepychord · closed 6 months ago · 0 comments
#155 · fix model parallel inconsistent init · Sleepychord · closed 6 months ago · 0 comments
#154 · update ema · leizhao1234 · closed 7 months ago · 0 comments
#153 · support MoE & Mixtral-8x7b · 1049451037 · closed 5 months ago · 0 comments
#152 · fix profiling · leizhao1234 · closed 7 months ago · 0 comments
#151 · merge main to glu · 1049451037 · closed 7 months ago · 0 comments
#150 · add profiling · leizhao1234 · closed 7 months ago · 0 comments
#149 · DeepSpeed distributed training raises a sat ValueError: inconsistent · elesun2018 · opened 7 months ago · 1 comment
#148 · How to embed a video encoder module from pytorch? · zyhzyh88 · opened 7 months ago · 3 comments
#147 · mqa cross & stream chat · 1049451037 · closed 8 months ago · 0 comments
#146 · Can you confirm whether the chatglm3 model is the same as GPT or originates from the GLM architecture? · tiendung · closed 8 months ago · 3 comments
#145 · How to load the icetk_glm_130B tokenizer and the GLM130B model with HF? · Ajay-Wong · closed 8 months ago · 6 comments
#144 · FileLock - out of date? · taziksh · closed 8 months ago · 1 comment
#143 · How to load and initialize llama2 models downloaded from Huggingface · microhu · closed 6 months ago · 2 comments
#142 · ore.exceptions.ResponseStreamingError · AnnaYang2020 · opened 8 months ago · 1 comment
#141 · Cannot use torch.compile with SAT · lijing1996 · opened 9 months ago · 0 comments
#140 · Rotary embedding · leizhao1234 · closed 10 months ago · 0 comments
#139 · Rotary embedding · leizhao1234 · closed 10 months ago · 0 comments
#138 · Streaming datasets are not supported · af-74413592 · closed 6 months ago · 2 comments
#137 · Failure to load random states from saved checkpoints · minkowski0125 · opened 10 months ago · 2 comments
#136 · Fix params dtype bug · Jintao-Huang · closed 10 months ago · 1 comment
#135 · fix lost bias when quantizing from pre-trained model parameters · jimmieliu · closed 8 months ago · 3 comments
#134 · fix lost bias when quantizing from pre-trained model parameters · jimmieliu · closed 10 months ago · 1 comment
#133 · ModuleNotFoundError: No module named 'SwissArmyTransformer' · B-1368 · opened 10 months ago · 6 comments
#132 · How to handle insufficient memory during finetuning when the dataset is too large? · Syno8 · closed 11 months ago · 1 comment
#131 · How should the loss be written when using mp_size=2? · kunden0612 · opened 11 months ago · 1 comment
#130 · How to configure LoRA-style finetuning with model parallelism? · kunden0612 · opened 11 months ago · 5 comments