-
I tried to visualize the attention maps for the T5 model but encountered issues while generating the plots.
I would like to emphasize a few points (a minimal sketch of the setup follows the list):
- I have used `model.generate` because I don't …
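A minimal sketch of what I mean, assuming Hugging Face Transformers and a small checkpoint as a stand-in (the model name and plotting choices are placeholders, not my real setup): `generate` accepts `output_attentions=True` together with `return_dict_in_generate=True`, and an encoder-decoder model like T5 then returns `encoder_attentions`, `decoder_attentions`, and `cross_attentions`.

```python
import matplotlib.pyplot as plt
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder checkpoint; substitute the one actually being debugged.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello world", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=10,
    output_attentions=True,        # ask generate to return attention tensors
    return_dict_in_generate=True,  # otherwise only token ids come back
)

# cross_attentions is one tuple per generated token, each holding one tensor
# per layer of shape (batch, heads, 1, encoder_len) during cached decoding.
step0_layer0 = out.cross_attentions[0][0][0]  # (heads, 1, encoder_len)
plt.imshow(step0_layer0.mean(0).detach().numpy(), aspect="auto")
plt.xlabel("encoder position")
plt.title("cross-attention, step 0, layer 0 (head average)")
plt.show()
```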
-
I installed all the necessary drivers and packages, including nvcc to build Flash Attention; all smooth sailing. While everything else works fine, when I try to use
`insanely-fast-whisper --file-name audio.…
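To narrow down whether the problem is in the CLI or the underlying stack, one option is to run roughly what the tool does internally with the plain Transformers pipeline. This is a sketch under my assumptions (whisper-large-v3, Flash Attention 2, typical chunking), not the tool's exact internals:

```python
import torch
from transformers import pipeline

# Assumed defaults: large-v3 checkpoint in fp16 with Flash Attention 2.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# Chunked, batched transcription, similar in spirit to the CLI's defaults.
result = pipe("audio.wav", chunk_length_s=30, batch_size=8)
print(result["text"])
```

If this fails the same way, the issue is likely in the Flash Attention or Transformers layer rather than in insanely-fast-whisper itself.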
-
### What is the issue?
As soon as the context length limit is reached for deepseek-coder-v2-lite, the model just repeats previous answers and keeps looping even after I ask for somethin…
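A common first check for this symptom is raising the context window explicitly. With the ollama Python client that would look roughly like this; the model tag and the 8192 value are assumptions on my part:

```python
import ollama

response = ollama.chat(
    model="deepseek-coder-v2",  # adjust to the exact lite tag that was pulled
    messages=[{"role": "user", "content": "Explain Python decorators."}],
    options={"num_ctx": 8192},  # enlarge the context window past the default
)
print(response["message"]["content"])
```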
-
## Unable to freeze tensor of type Int64/Float64 into constant layer, try to compile model with truncate_long_and_double enabled
When I try to test the Transformer Attention layer with TensorRT, I g…
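The error message itself names the workaround. With the TorchScript frontend of Torch-TensorRT, I believe the flag is passed like this; the module and shapes below are invented stand-ins, not the real layer:

```python
import torch
import torch.nn as nn
import torch_tensorrt

class TinyAttention(nn.Module):
    """Hypothetical stand-in for the attention layer from the report."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

model = TinyAttention().eval().cuda()
trt_model = torch_tensorrt.compile(
    model,
    ir="ts",  # TorchScript frontend, where truncate_long_and_double lives
    inputs=[torch_tensorrt.Input((1, 128, 64), dtype=torch.float32)],
    enabled_precisions={torch.float32},
    # Casts Int64/Float64 constants down to Int32/Float32 so they can be
    # frozen into TensorRT constant layers, as the error suggests.
    truncate_long_and_double=True,
)
```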
-
In the AttentionLayer class (models/attention/decoders/attention_layer.py), the `__init__` parameters do not include "sigmoid_smoothing".
But in models/attention/attention_seq2seq.py, it calls Att…
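In plain Python terms, this kind of constructor/caller mismatch fails like the following invented mini-example (the names are illustrative, not the repo's code):

```python
class AttentionLayer:
    # Hypothetical: __init__ accepts no sigmoid_smoothing parameter.
    def __init__(self, key_dim):
        self.key_dim = key_dim

try:
    AttentionLayer(key_dim=64, sigmoid_smoothing=True)  # caller passes an extra kwarg
except TypeError as e:
    print(e)  # __init__() got an unexpected keyword argument 'sigmoid_smoothing'
```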
-
Running semantic inference with your pretrained weights, I find that the generated sentences are just jumbles of unrelated words pieced together, and I don't know why.
![image](https://github.com/wangyuchi369/LaDiC/assets/118409611/864c7c12-e761-4874-8fb4-dbb74e5ce758)
My steps were to download the bert-base-uncased model files locally and then modify the corresponding model path…
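For reference, loading a local copy with Transformers should just be a matter of pointing at the directory. A minimal sketch with a placeholder path (I am not certain which config field LaDiC reads the path from):

```python
from transformers import BertModel, BertTokenizer

# Assumption: the directory holds config.json, vocab.txt and the weight file
# downloaded from the bert-base-uncased hub page.
local_dir = "/path/to/bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(local_dir)
bert = BertModel.from_pretrained(local_dir)
```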
-
What does this mean, and how can I fix it?
```python
# Load the trained model
model = load_model(
    '/content/drive/MyDrive/Model/CNN-BiLSTM-Attention_Model_Final.h5',
    custom_objects={"cc": cc, "Attention": Attention},
)
# Load the his…
```
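For context, `load_model` can only rebuild a custom layer whose class can be re-instantiated from its saved config. A minimal sketch of the usual pattern; the layer body, units value, and the `cc` stand-in are invented, not the model from the question:

```python
import tensorflow as tf

class Attention(tf.keras.layers.Layer):
    """Invented stand-in; the real class must match what was used at save time."""
    def __init__(self, units=64, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.score = tf.keras.layers.Dense(1)

    def call(self, inputs):
        weights = tf.nn.softmax(self.score(inputs), axis=1)
        return tf.reduce_sum(weights * inputs, axis=1)

    def get_config(self):
        # Without this, load_model cannot re-instantiate the layer from the file.
        config = super().get_config()
        config.update({"units": self.units})
        return config

def cc(y_true, y_pred):
    # Stand-in for whatever custom loss/metric "cc" was at training time.
    return tf.reduce_mean(tf.square(y_true - y_pred))

model = tf.keras.models.load_model(
    "CNN-BiLSTM-Attention_Model_Final.h5",
    custom_objects={"cc": cc, "Attention": Attention},
)
```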
-
We recently added PAG support for SDXL. Is anyone interested in extending PAG support to Hunyuan-DiT and PixArt-Sigma?
There is no implementation available, so it is a bit of a research-oriented pro…
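For anyone picking this up, the merged SDXL integration is the reference point. Current usage looks roughly like this; treat the exact kwargs as a sketch, since they may shift between diffusers releases:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    enable_pag=True,             # swaps in the PAG attention processors
    pag_applied_layers=["mid"],  # which attention blocks get perturbed
).to("cuda")

image = pipe(
    "an astronaut riding a horse",
    guidance_scale=5.0,
    pag_scale=3.0,  # strength of the perturbed-attention guidance term
).images[0]
```

The extension work would presumably be wiring equivalent PAG attention processors into the Hunyuan-DiT and PixArt-Sigma transformer blocks.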
-
Hi, I'm running on an AWS A10G and trying to benchmark different setups.
I tried sharding the model across 2 GPUs to make it faster, but I'm getting the same latency.
Does this make…
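One detail worth checking, assuming the sharding was done with Accelerate-style `device_map="auto"` (my guess): that mode places layers on different GPUs and runs them one after another, so memory is pooled but each forward pass still visits the GPUs sequentially, which would leave latency unchanged; reducing latency generally needs tensor parallelism instead. A sketch of the device-map style:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model actually being benchmarked.
name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,
    device_map="auto",  # pipeline-style layer placement: GPU 1 waits on GPU 0
)
print(model.hf_device_map)  # shows which layers landed on which GPU
```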
-
I compiled ZLUDA: ```Finished `release` profile [optimized] target(s) in 5m 40s```
I downloaded `nccl` from NVIDIA and placed it inside the ZLUDA directory
`P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0…