-
Flash Attention can only be used with fp16 and bf16, not with fp32. Therefore, we should make flash attention optional in our codebase so that one can deactivate it during inference in exchange for hi…
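As a rough sketch of what such a switch could look like (the `attention` wrapper and `use_flash_attn` flag below are hypothetical names for illustration, not part of the existing codebase):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, use_flash_attn: bool = True):
    # Hypothetical dispatch: flash attention only supports fp16/bf16, so the
    # flag lets callers fall back to a plain implementation at inference time.
    if use_flash_attn and q.dtype in (torch.float16, torch.bfloat16):
        # PyTorch selects the flash kernel here when it is available.
        return F.scaled_dot_product_attention(q, k, v)
    # fp32-friendly fallback for higher-precision inference.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```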
-
I hope this message finds you well. First off, thank you for providing such an incredible project on large model inference. I've been utilizing it extensively and it's been instrumental for many of my…
-
Dear author, thank you for your excellent work. I would like to inquire when you plan to make all your code publicly available. I am looking forward to your reply. Thank you!
-
Attention mechanisms are widely used in deep learning models, particularly in large language models. A flexible attention kernel can help users build accelerated language models conveniently on…
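One way to picture that flexibility is a kernel that accepts a user-supplied score-modification callback, so masks and biases can be swapped without rewriting the kernel itself (a toy sketch; `flexible_attention` and `score_mod` are illustrative names, not any particular library's API):

```python
import math
import torch

def flexible_attention(q, k, v, score_mod=None):
    # Toy kernel: `score_mod` rewrites the raw attention scores, so the same
    # kernel can express causal masks, sliding windows, biases, etc.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if score_mod is not None:
        scores = score_mod(scores)
    return torch.softmax(scores, dim=-1) @ v

# Example: plug in a causal mask without touching the kernel.
def causal(scores):
    n = scores.shape[-1]
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    return scores.masked_fill(mask, float("-inf"))

q = k = v = torch.randn(1, 4, 8, 16)  # (batch, heads, seq, head_dim)
out = flexible_attention(q, k, v, score_mod=causal)
```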
-
Your work is commendable. The attention to detail and the clarity of your findings are truly impressive. I was particularly intrigued by the utilization of "mask.mat" and "mask_3d_shif…
-
Dear Dr. Han and Dr. Ye,
I have been greatly impressed by your work on the Agent Attention model, as detailed in your recent publication and the associated GitHub repository. The method of integrat…
-
The break occurs when I train the rtdetr-l model for 300 epochs: training runs up to epoch 90, but when I use resume, the epochs start at 91 while the mAP, R, and P values become 0 and stay at 0. The training code is as fol…
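For reference, the usual resume call with the ultralytics API looks roughly like the sketch below; the checkpoint path is a placeholder, and this is not the truncated training code from the report above:

```python
from ultralytics import RTDETR

# Placeholder path: point this at the last.pt of the interrupted run.
model = RTDETR("runs/detect/train/weights/last.pt")
model.train(resume=True)  # should pick up at the saved epoch (91 here)
```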
-
Yes, the approach presented in the ConsistentID paper could potentially be re-architected to find better solutions. Here are a few ideas for improving the architecture and methodology:
**Inte…
-
ChatGPT is based on the GPT-3 architecture, which is a transformer-based language model that uses self-attention mechanisms to generate text. The model is trained on a large corpus of text data using …
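Since GPT-3 itself is not openly available, here is a minimal sketch of the same autoregressive, self-attention-based text generation using the openly available GPT-2 through the Hugging Face transformers API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The model attends over the prompt with self-attention and emits one token
# at a time; generate() repeats this until max_new_tokens is reached.
ids = tok("Transformers generate text by", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```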
-
### Model description
Here is the model description:
> gte-Qwen1.5-7B-instruct is the latest addition to the gte embedding family. This model has been engineered starting from the [Qwen1.5-7B](https:…
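For context, embedding models in this family are typically queried as in the sketch below; the `Alibaba-NLP/gte-Qwen1.5-7B-instruct` repo id and the `trust_remote_code` flag are assumptions, so check the model card for the exact usage:

```python
from sentence_transformers import SentenceTransformer

# Repo id and trust_remote_code are assumptions; see the model card.
model = SentenceTransformer("Alibaba-NLP/gte-Qwen1.5-7B-instruct",
                            trust_remote_code=True)
docs = ["what is a flexible attention kernel?",
        "Flash Attention requires fp16 or bf16 inputs."]
emb = model.encode(docs, normalize_embeddings=True)
print(emb.shape)  # (2, embedding_dim)
```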