-
Greetings Wenjie,
I was very much impressed by your work "SAITS". I am trying to create an attention-based model of my own as part of my Bachelor's project, and I have a few questions to ask:
I wa…
-
### System Info
- `transformers` version: 4.41.2
- Platform: Linux-6.5.0-27-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
Why does this error occur?
How do I solve this?
$ python zero_shot.py
/home/cr/miniconda3/envs/backdoor_Medclip/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The par…
-
### System Info
OS version: WSL 2, Ubuntu 22.04
Model: llama3-8B-Instruct
Hardware: no GPU
There is no GPU, but I installed the nvcc library in WSL using this command: `sudo apt install nvidia…
-
In the paper, the ablation study on the attention emb and gen variants is interesting.
Are these all different models, each using a different attention mechanism?
Can I select causal attention for both cases when using G…
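For context on the question above, causal attention is just ordinary scaled dot-product attention with future positions masked out before the softmax, so the same mechanism can in principle be plugged into either attention block. A minimal NumPy sketch (illustrative only, not code from the paper):

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head causal attention over a (seq_len, dim) sequence."""
    t = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones((t, t), dtype=bool), k=1)  # True above the diagonal
    scores[mask] = -np.inf                            # hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
out = causal_attention(q, k, v)
print(out.shape)  # (4, 8); row 0 attends only to itself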
-
Following the PagedAttention [paper](https://arxiv.org/pdf/2309.06180), add CUDA kernels for the Llama model. CUDA kernels for the Llama architecture have been widely implemented in the open source c…
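The core idea of PagedAttention is that the KV cache is stored in fixed-size physical blocks, and a per-sequence block table maps logical token positions to scattered blocks, which the kernel gathers before the attention dot products. A NumPy sketch of that indexing scheme (names, block size, and shapes are illustrative assumptions, not vLLM's API):

```python
import numpy as np

BLOCK = 4      # tokens per physical KV block (assumed)
HEAD_DIM = 8   # head dimension (assumed)

def gather_kv(kv_pool, block_table, seq_len):
    """Reassemble one sequence's K (or V) rows from the paged pool."""
    rows = []
    for pos in range(seq_len):
        blk = block_table[pos // BLOCK]   # physical block index
        off = pos % BLOCK                 # offset inside the block
        rows.append(kv_pool[blk, off])
    return np.stack(rows)

def paged_attention(q, k_pool, v_pool, block_table, seq_len):
    k = gather_kv(k_pool, block_table, seq_len)
    v = gather_kv(v_pool, block_table, seq_len)
    scores = (k @ q) / np.sqrt(HEAD_DIM)  # one query against the paged cache
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v

rng = np.random.default_rng(0)
pool_k = rng.normal(size=(16, BLOCK, HEAD_DIM))  # 16 physical blocks
pool_v = rng.normal(size=(16, BLOCK, HEAD_DIM))
table = [7, 2, 11]                               # non-contiguous blocks
q = rng.normal(size=HEAD_DIM)
out = paged_attention(q, pool_k, pool_v, table, seq_len=10)
print(out.shape)  # (8,)
```

A real CUDA kernel fuses the gather with the dot products instead of materializing the contiguous K/V, which is where the memory savings come from.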
-
### System Info
- `transformers` version: 4.41.1
- Platform: Linux-5.15.0-1055-aws-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.3
…
-
Hi, I want to use examples/pytorch/language-modeling/run_clm.py to train my model, but I find that the only way to use flash_attention is to modify the code in run_clm.py like:
```python
…
-
On Ubuntu, I tried to torch.save (PyTorch 1.1.0) a model using Linear Attention (fast-transformers 0.4.0) and got the following serialization error:
`PicklingError: Can't pickle : attribute lookup on fast_transformers.feature_maps…
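This kind of PicklingError usually means the model holds an attribute created by a factory or lambda (feature maps are often built this way), and locally defined functions can't be pickled because they aren't importable by name. A standalone sketch of the failure mode and the usual workaround of serializing only plain state (analogous to `torch.save(model.state_dict())`), with illustrative names:

```python
import pickle

def make_feature_map(dim):
    # Factory returning a locally defined function: roughly the pattern
    # behind the error, since inner functions can't be pickled by name.
    def feature_map(x):
        return x * dim
    return feature_map

class Model:
    def __init__(self):
        self.feature_map = make_feature_map(4)  # closure attribute
        self.weights = [1.0, 2.0]               # plain data pickles fine

model = Model()
try:
    pickle.dumps(model)  # fails: the inner function isn't importable
    failed = False
except (pickle.PicklingError, AttributeError) as exc:
    failed = True
    print(type(exc).__name__)

# Workaround: serialize only plain data (like a state_dict) and rebuild
# the function-valued attributes after loading.
restored = pickle.loads(pickle.dumps({"weights": model.weights}))
print(failed, restored["weights"])
```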
-
Let's look at the code directly:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2").to('cpu').eval()
model.encode(['test'], convert…