THUDM / SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
https://THUDM.github.io/SwissArmyTransformer
Apache License 2.0 · 871 stars · 84 forks
Issues
#179 · TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output' · deep-practice · opened 15 hours ago · 0 comments
#178 · How should resuming training from a checkpoint be configured? · elesun2018 · opened 3 months ago · 6 comments
#177 · transfer_param.py reports an error when converting the Vicuna HF model to a SAT model · Lunatic-Solar · opened 3 months ago · 17 comments
#176 · How to install a model to the right path? · link89 · closed 3 months ago · 1 comment
#175 · No cogagent? · Mac0q · opened 4 months ago · 2 comments
#174 · ModuleNotFoundError: No module named 'localAttention' · BlueSkyyyyyy · opened 4 months ago · 0 comments
#173 · "No backend type associated with device type cpu" when running cli_demo_sat.py · yileld · opened 4 months ago · 5 comments
#172 · To finetune without DeepSpeed, can I just call model.step() directly during training? · cocoshe · opened 5 months ago · 1 comment
#171 · Using CogVLM - KeyError (MODEL_URLS) - Google Colab · Baggiorobertozoba · closed 5 months ago · 1 comment
#170 · In MixtralMlpMixin(), the MoE only computes the experts' logits; I don't see any dispatch logic · AlenjandroWang · opened 5 months ago · 1 comment
#169 · Can AutoModel.from_pretrained() not load HF-format weights? · AlenjandroWang · opened 5 months ago · 1 comment
#168 · Can AutoModel.from_pretrained() not load HF weights? · AlenjandroWang · closed 5 months ago · 0 comments
#167 · How to resume finetuning from a checkpoint? · zoumaguanxin · opened 5 months ago · 1 comment
#166 · MoE support · 1049451037 · closed 5 months ago · 0 comments
#165 · fix rotary bug when q seqlen > cos seqlen · leizhao1234 · closed 5 months ago · 0 comments
#164 · support chatglm rotary in triton · leizhao1234 · closed 6 months ago · 0 comments
#163 · How to balance samples for a dataset with an imbalanced number of samples per class? · lln556 · opened 6 months ago · 1 comment
#162 · Questions about your LoRA codes · miznchimaki · closed 6 months ago · 7 comments
#161 · DeepSpeed distributed training: loss NaN or inf · JohnTang93 · opened 6 months ago · 1 comment
#160 · Does sat support saving checkpoints in fp16 or bf16? · xxxwuwq · opened 6 months ago · 5 comments
#159 · add accumulate ema and fix fp32 weight bug · leizhao1234 · closed 6 months ago · 0 comments
#158 · Excessive memory usage during single-node multi-GPU training · zodiacg · closed 6 months ago · 2 comments
#157 · Can SwissArmyTransformer read .bin weight files? The visualglm-6b project ships only .bin files, no .pt, which makes finetuning difficult · qq577288254 · closed 5 months ago · 5 comments
#156 · fix zero3 check · Sleepychord · closed 6 months ago · 0 comments
#155 · fix model parallel inconsistent init · Sleepychord · closed 6 months ago · 0 comments
#154 · update ema · leizhao1234 · closed 7 months ago · 0 comments
#153 · support MoE & Mixtral-8x7b · 1049451037 · closed 5 months ago · 0 comments
#152 · fix profiling · leizhao1234 · closed 7 months ago · 0 comments
#151 · merge main to glu · 1049451037 · closed 7 months ago · 0 comments
#150 · add profiling · leizhao1234 · closed 7 months ago · 0 comments
#149 · DeepSpeed distributed training raises a sat ValueError: inconsistent · elesun2018 · opened 7 months ago · 1 comment
#148 · How to embed a video encoder module from pytorch? · zyhzyh88 · opened 7 months ago · 3 comments
#147 · mqa cross & stream chat · 1049451037 · closed 8 months ago · 0 comments
#146 · Can you confirm whether the chatglm3 model is the same as GPT or originates from the GLM architecture? · tiendung · closed 8 months ago · 3 comments
#145 · How to load the icetk_glm_130B tokenizer and the GLM130B model with HF? · Ajay-Wong · closed 8 months ago · 6 comments
#144 · FileLock - out of date? · taziksh · closed 8 months ago · 1 comment
#143 · How to load and initialize llama2 models downloaded from Huggingface · microhu · closed 6 months ago · 2 comments
#142 · ore.exceptions.ResponseStreamingError · AnnaYang2020 · opened 8 months ago · 1 comment
#141 · Cannot use torch.compile with SAT · lijing1996 · opened 9 months ago · 0 comments
#140 · Rotary embedding · leizhao1234 · closed 10 months ago · 0 comments
#139 · Rotary embedding · leizhao1234 · closed 10 months ago · 0 comments
#138 · Streaming datasets are not supported · af-74413592 · closed 6 months ago · 2 comments
#137 · Failure to load random states from saved checkpoints · minkowski0125 · opened 10 months ago · 2 comments
#136 · Fix params dtype bug · Jintao-Huang · closed 10 months ago · 1 comment
#135 · fix lost bias when quantizing from pre-trained model parameters · jimmieliu · closed 8 months ago · 3 comments
#134 · fix lost bias when quantizing from pre-trained model parameters · jimmieliu · closed 10 months ago · 1 comment
#133 · ModuleNotFoundError: No module named 'SwissArmyTransformer' · B-1368 · opened 10 months ago · 6 comments
#132 · How to handle insufficient memory during finetuning when the dataset is too large? · Syno8 · closed 11 months ago · 1 comment
#131 · How should the loss be written when using mp_size=2? · kunden0612 · opened 11 months ago · 1 comment
#130 · How to configure LoRA-style finetuning with model parallelism? · kunden0612 · opened 11 months ago · 5 comments