-
In this [link](https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side), the docs say that decoder-only architectures should use left padding. In the code repository, you do right padding at …
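For reference, a minimal sketch of setting left padding on a Hugging Face tokenizer for decoder-only generation; the checkpoint name here is a placeholder, not the one from this repository:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model used in this repository.
checkpoint = "gpt2"

# Decoder-only models generate from the rightmost tokens, so padding must
# go on the left, otherwise the model continues from pad tokens.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer(["Hello", "A longer prompt here"], padding=True, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```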
-
![image](https://github.com/user-attachments/assets/6ce92a4f-c546-4003-845b-3eba928a93c3)
The checkpoint you are trying to load is of type GOT, but Transformers does not recognize this architecture.
I don't know…
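A common workaround for this error, sketched below under the assumption that the checkpoint ships its own modeling code on the Hub (the repo id is a placeholder):

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder repo id; substitute the actual GOT checkpoint you are loading.
repo_id = "some-org/GOT-checkpoint"

# trust_remote_code lets Transformers run the custom modeling code bundled
# with the checkpoint instead of looking the architecture up in its registry.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```

If that does not help, upgrading transformers may be required, since an architecture is only recognized from the library version that added it.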
-
Why are the item embeddings loaded with pre-trained parameters?
In translation_condition2.ipynb, the function:

```python
def load_pretrained(self):
    path_dict = {
        'toy': 'saved/SASRec…
```
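For context, a minimal sketch of what such a loader typically does, copying item-embedding weights from a saved SASRec checkpoint into the current model; the path and attribute names are illustrative, not taken from the notebook:

```python
import torch

def load_pretrained(model, dataset_name):
    # Illustrative path; the notebook's path_dict is truncated above.
    path_dict = {'toy': 'saved/SASRec-toy.pth'}

    # Load the saved SASRec state dict and copy only the item-embedding
    # table, so the new model starts from learned item representations
    # instead of a random initialization.
    state = torch.load(path_dict[dataset_name], map_location='cpu')
    model.item_embedding.weight.data.copy_(state['item_embedding.weight'])
```

Warm-starting from pre-trained item embeddings is a common trick in sequential recommendation: it keeps item representations already learned on the same interaction data while the rest of the model is trained fresh.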
-
Hello, it is an honor to read your work. I would like to ask about training time: with the source code's batch size = 32 setting, 2000 iterations take 20 minutes. To speed things up I set batch size = 192, but the time did not improve. Is the transformer itself slow to train, or am I missing some key point?
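One way to narrow this down is to time the data loader and the forward/backward pass separately; if data loading dominates, a larger batch size will not help. A minimal sketch, assuming a standard PyTorch loop where the model returns its loss (names are placeholders):

```python
import time
import torch

def profile_steps(model, loader, optimizer, device, n_steps=50):
    data_time, compute_time = 0.0, 0.0
    it = iter(loader)  # assumes the loader yields at least n_steps batches
    for _ in range(n_steps):
        t0 = time.perf_counter()
        batch, labels = next(it)                     # data loading
        batch, labels = batch.to(device), labels.to(device)
        if device.type == 'cuda':
            torch.cuda.synchronize()
        t1 = time.perf_counter()

        optimizer.zero_grad()
        loss = model(batch, labels)                  # forward + loss (assumed API)
        loss.backward()
        optimizer.step()
        if device.type == 'cuda':
            torch.cuda.synchronize()
        t2 = time.perf_counter()

        data_time += t1 - t0
        compute_time += t2 - t1
    print(f"data: {data_time/n_steps:.4f}s/step, "
          f"compute: {compute_time/n_steps:.4f}s/step")
```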
-
vLLM 0.6.2 was released just a few hours ago, and the release notes say it does not support multi-image inference with Qwen2-VL.
I tried it, but it requires the newest transformers and installs it automatically.
When I start it u…
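For reference, once a vLLM version with multi-image support for Qwen2-VL is available, the offline API looks roughly like the sketch below; the prompt format and version availability are assumptions, so check the vLLM release notes:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# limit_mm_per_prompt raises the per-prompt image cap above the default of 1.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct",
          limit_mm_per_prompt={"image": 2})

images = [Image.open("a.jpg"), Image.open("b.jpg")]
# Assumed Qwen2-VL chat format with one vision placeholder per image.
prompt = ("<|im_start|>user\n"
          "<|vision_start|><|image_pad|><|vision_end|>"
          "<|vision_start|><|image_pad|><|vision_end|>"
          "Compare the two images.<|im_end|>\n<|im_start|>assistant\n")

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```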
-
We want Calibrators and Transformers to be trivially configurable and thus swappable. To do this, we need to make ``Transformer.Config.parameters`` optional such that only some config fields are popula…
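One way to express this, sketched with a Python dataclass; the class and field names are illustrative, not the project's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransformerConfig:
    # Required field: every Transformer must name its input column.
    input_column: str
    # Optional fields: left as None unless the caller overrides them,
    # so a config can populate only the parameters it cares about.
    parameters: Optional[dict] = None
    output_column: Optional[str] = None

    def resolved_parameters(self) -> dict:
        # Fall back to an empty parameter set when none were provided.
        return self.parameters if self.parameters is not None else {}

# Only some fields populated; the rest take their defaults.
cfg = TransformerConfig(input_column="text")
print(cfg.resolved_parameters())  # {}
```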
-
I found the work really interesting, and I'm trying to replicate it for non-transformer architectures like CNNs. Any suggestions on how to proceed would be appreciated. Thanks in advance.
-
Loading a Vicuna-13B with 4-bit quantization from the transformers library is possible ([load_in_4bit](https://huggingface.co/docs/transformers/main_classes/quantization)). How difficult could it be for Fas…
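For reference, a minimal sketch of the 4-bit load path from that docs page; the checkpoint id is an assumption, so substitute the Vicuna weights you actually have:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Requires the bitsandbytes package and a CUDA GPU.
model_id = "lmsys/vicuna-13b-v1.5"  # assumed checkpoint id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store in 4-bit, compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs
)
```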
-
### Your current environment
vllm version: 0.4.0
HF transformers version: 4.39.2
model: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
GPU: V100 32G*4
### 🐛 Describe the bug
Hi, I am…