-
[Flash attention 3](https://tridao.me/blog/2024/flash3/) makes use of new features of the Hopper architecture.
- (async) WGMMA
- TMA
- overlapping softmax with GEMMs
Are these all things that can currently (…
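The softmax/GEMM overlap in FlashAttention-3 rests on the online-softmax rescaling trick: the scores can be consumed block by block, rescaling a running sum whenever a new maximum appears, so the softmax never needs all scores at once. A minimal pure-Python sketch of that rescaling (function name and blocking are mine, just for illustration; the real kernels do this per tile in registers):

```python
import math

def online_softmax(scores, block_size=2):
    """Process scores block by block, keeping a running max `m` and a
    running sum `s` of exp(x - m); when a block raises the max, the old
    sum is rescaled by exp(m_old - m_new). This is what lets
    FlashAttention-style kernels interleave softmax with the matmuls."""
    m = float("-inf")  # running max
    s = 0.0            # running sum of exp(x - m)
    for i in range(0, len(scores), block_size):
        block = scores[i:i + block_size]
        m_new = max(m, max(block))
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in block)
        m = m_new
    return [math.exp(x - m) / s for x in scores]
```

The result matches a standard two-pass softmax, but each block only needs the running `(m, s)` pair, not the full score vector.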
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
### Feature request
After torch-compiling the whisper.text_decoder model, the inference time is impressively low. Thank you for the work!
However, the warm-up time is very long since it needs to go thr…
-
![image](https://github.com/Anddd7/architecture-diagram/assets/24785373/8f4e54b5-fdbb-4c7f-8177-e5772b950c25)
-
Whether I load the local model or the gpt2-imdb model from Hugging Face, the following error is reported:
`
ValueError: GPTModelBranch does not support an attention implementation through t…
-
**Short Description**
I would like to add the architecture described in the paper mentioned below.
**Papers**
A lightweight deep learning model for automatic segmentation and analysis of opht…
-
### Problem Description
Composable Kernel currently supports fused attention (FA2) on RDNA3(+) architectures only in the forward direction. This greatly increases the VRAM requirement…
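A back-of-the-envelope illustration of why a non-fused backward pass inflates VRAM (function and numbers are mine, not CK code): without fusion, the backward pass must materialize the full N×N attention matrix per head, whereas a fused FA2-style backward recomputes it tile by tile.

```python
def attn_matrix_bytes(seq_len, n_heads, batch=1, bytes_per_el=2):
    """VRAM needed just to materialize the full softmax(QK^T) matrix
    that an unfused backward pass has to keep resident."""
    return batch * n_heads * seq_len * seq_len * bytes_per_el

# fp16, 32 heads, 8k context: the attention matrix alone is 4 GiB,
# which a fused backward avoids by recomputing tiles on the fly.
print(attn_matrix_bytes(8192, 32) / 2**30)  # → 4.0
```

The quadratic `seq_len * seq_len` term is the point: at long contexts this single buffer dwarfs the O(N) activations a fused kernel keeps.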
-
-
### Motivation.
As a continuation of #5367: since that merge request was rejected and I have to maintain my own fork to support this scenario, I suggest adding support in vLLM for model architec…
-
1. Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation (2021)
code: no
2. Body Meshes as Points (2021)
regarded as a two-class classification task (if a grid…