-
**Describe the bug**
When using the Electra model for POS tagging on sequences longer than a certain number of tokens (which varies with the language), the forward method of the Electra model throws an incompatible …
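A minimal sketch of the kind of call that can trigger this, assuming a Hugging Face `ElectraForTokenClassification` checkpoint and that the failure appears once the tokenized input exceeds the model's `max_position_embeddings` (512 for the base checkpoints); the checkpoint name and label count below are placeholders, not taken from the original report:

```python
import torch
from transformers import ElectraTokenizerFast, ElectraForTokenClassification

# Placeholder checkpoint; the original report uses a language-specific POS model.
model_name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForTokenClassification.from_pretrained(model_name, num_labels=17)

# Deliberately longer than the 512 positions the checkpoint was trained with.
text = "word " * 600
inputs = tokenizer(text, return_tensors="pt")  # no truncation, to reproduce the issue

with torch.no_grad():
    outputs = model(**inputs)  # the forward pass is where the size mismatch is raised
```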
-
Running predict on a model containing an Attention layer causes a RuntimeError due to a dimension issue.
- Keras 3.6.0 (issue occurs with 3.5.0 too)
- Backend is Torch with GPU support (2.5.1+cu12…
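A minimal sketch of a model of this shape (layer sizes are assumptions, not taken from the report); with the Torch backend, the `predict` call below is where the reported RuntimeError would surface:

```python
import numpy as np
import keras
from keras import layers

# Tiny functional model containing an Attention layer; shapes are illustrative only.
query_in = keras.Input(shape=(8, 16))
value_in = keras.Input(shape=(8, 16))
attended = layers.Attention()([query_in, value_in])  # query/value pair
output = layers.Dense(1)(attended)
model = keras.Model(inputs=[query_in, value_in], outputs=output)

query = np.random.rand(4, 8, 16).astype("float32")
value = np.random.rand(4, 8, 16).astype("float32")
preds = model.predict([query, value])  # RuntimeError reportedly raised here on torch + GPU
```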
-
### 🐛 Describe the bug
```python
import torch
from torch import nn, Tensor
from torch.export import export_for_inference, Dim
from torch.nn.attention.flex_attention import flex_attention
class…
-
Thank you for developing this!
## Context
Due to lengthy computation time, and in order to speed things up, I thought about using `flash_attention_2` and a smaller floating-point type, `torch.float16`…
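For reference, this is how the two options are typically combined when loading a transformers model; the checkpoint name is a placeholder, and the `flash-attn` package plus a supported GPU are required for `flash_attention_2` to be available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/checkpoint"  # placeholder, not the model from this report
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # smaller floating-point type
    attn_implementation="flash_attention_2",  # needs flash-attn installed
    device_map="auto",
)
```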
-
### 🐛 Describe the bug
error: https://gist.github.com/xmfan/7374fab55bdf73ba2501de15dd9de709
```
ValueError: The following `model_kwargs` are not used by the model: ['bos_token_id', 'pad_token_id…
-
Development machine: Ubuntu 20.04, MNN 3.0.0
Model (Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8
## Exporting the ONNX model
$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I need to extend the context length of the gemma2-9b model, along with other mo…
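As a point of reference (not from the issue itself), the serving-side knob in vLLM is `max_model_len`; extending the context past gemma-2's native window additionally requires RoPE scaling, which is a separate, model-dependent configuration step. A minimal sketch with assumed values:

```python
from vllm import LLM, SamplingParams

# Illustrative only: max_model_len caps the context length vLLM allocates for.
# Going beyond the checkpoint's native window also needs RoPE scaling support.
llm = LLM(model="google/gemma-2-9b-it", max_model_len=8192)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```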
-
Hi,
First, thank you very much for your work. It is a huge improvement to the DETR family.
Your paper is also really well written and clearly explained.
Also thank you for publishing your code & models, i…
-
Because I want to use the individual view models in isolation, I'm trying to build a pipeline that processes SMILES molecules into embeddings through the three view models. However, running the `model…
-
Some weights of the model checkpoint were not used when initializing CLIPTextModel:
['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|█████████████████████████████████████…