-
## Description
I want to measure the performance of the model, so I need to know its number of parameters and FLOPs.
Is there a tool that can calculate the FLOPs and parameter count of a TensorRT engine?…
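Independently of any particular tool, the parameters and FLOPs of standard layers can be computed directly from their shapes. Below is a minimal sketch, assuming batch size 1, `groups=1` convolutions, and the common convention of counting one multiply-accumulate (MAC) as 2 FLOPs. Some tools report MACs instead, which differ by a factor of 2. The classes `Conv2dSpec` and `LinearSpec` and the toy network are hypothetical. For a TensorRT engine, the layer shapes would have to come from the network definition the engine was built from.

```python
from dataclasses import dataclass


@dataclass
class Conv2dSpec:
    """Hypothetical description of a standard (groups=1) 2-D convolution."""
    in_ch: int
    out_ch: int
    kernel: int  # square kernel size
    out_h: int   # output feature-map height
    out_w: int   # output feature-map width
    bias: bool = True

    def params(self) -> int:
        weights = self.out_ch * self.in_ch * self.kernel ** 2
        return weights + (self.out_ch if self.bias else 0)

    def flops(self) -> int:
        # One MAC per weight per output pixel, counted as 2 FLOPs; bias adds ignored.
        macs = self.out_ch * self.out_h * self.out_w * self.in_ch * self.kernel ** 2
        return 2 * macs


@dataclass
class LinearSpec:
    """Hypothetical description of a fully connected layer applied to one vector."""
    in_features: int
    out_features: int
    bias: bool = True

    def params(self) -> int:
        weights = self.in_features * self.out_features
        return weights + (self.out_features if self.bias else 0)

    def flops(self) -> int:
        return 2 * self.in_features * self.out_features


def totals(layers):
    """Sum parameters and FLOPs over a list of layer specs."""
    return sum(layer.params() for layer in layers), sum(layer.flops() for layer in layers)


# Made-up two-layer network, batch size 1.
net = [Conv2dSpec(3, 16, 3, 32, 32), LinearSpec(16, 10)]
n_params, n_flops = totals(net)
print(f"{n_params} params, {n_flops / 1e9:.6f} GFLOPs")
```

Because convolution FLOPs scale with `out_h * out_w`, a reported FLOP figure is only meaningful together with the input resolution it was measured at.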
-
### Search before asking
- [X] I have searched the issues and found no related answer.
### Please ask your question
How can I calculate the number of parameters and the computational cost (FLOPs) of RT-DETR? The method mentioned in your README does not run correctly. I would like a detailed…
-
### Problem Description
Hi AMD team,
When I try FP8 training on MI300X, it is extremely slow because of very high CPU overhead, which takes up more than 81% of the time. As you can see from …
-
**Describe the bug**
Multiplication by a constant is not well optimized when the output width is the same as the widths of the arguments. A loop of shifts and additions can be written by hand that is faster.
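The report does not say which compiler or HDL is involved, so the following only models the arithmetic, in Python. Because the output has the same width as the inputs, the product is only needed modulo 2^width, so a constant multiplier can be expanded into one shift and one add per set bit of the constant, truncating after every step. `mul_const_shift_add` is a hypothetical helper name, and this sketch says nothing about the hardware a particular tool generates.

```python
def mul_const_shift_add(x: int, c: int, width: int) -> int:
    """Compute (x * c) mod 2**width using only shifts, adds and masking."""
    mask = (1 << width) - 1
    # Bits of c at or above `width` cannot influence the low `width` bits of x * c.
    c &= mask
    acc = 0
    shift = 0
    while c:
        if c & 1:
            # Add the partial product x * 2**shift, keeping only `width` bits.
            acc = (acc + (x << shift)) & mask
        c >>= 1
        shift += 1
    return acc


print(mul_const_shift_add(200, 10, 8))  # same as (200 * 10) & 0xFF
```

For `c = 10` (binary `1010`), the loop reduces to `(x << 1) + (x <<3)` truncated to `width` bits, which is the kind of expansion the report says is faster than the generated multiplier.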
**To Repro…
-
This is remarkable work. However, I ran into some problems reproducing it. The paper reports 0.62M parameters and 9.7 GFLOPs at 1024×512 resolution, but the model code provided for testing gives 0.61M parameters and …
-
Currently, large models rank the highest on the leaderboard. I believe it would be ideal to have a plot like [this](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/speed_performan…
-
Hi @Artanic30,
I saw that your paper includes a model efficiency analysis. Could you tell me the GFLOPs of your model? Could you also share the code you used to compute them?
-
### 🐛 Describe the bug
We found that the flops counter reports incorrect FLOP counts for SDPA (`scaled_dot_product_attention`) operations.
This issue does not occur in the torch 2.4+cu121 release.
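For reference, the forward FLOPs of attention can be computed by hand and compared with what the counter reports. The sketch below assumes the counter counts only the two batched matmuls (Q·Kᵀ and P·V) at 2 FLOPs per multiply-add, ignoring softmax, scaling and masking; my understanding is that `torch.utils.flop_counter` does the same, but treat that as an assumption. Both function names are hypothetical. `measured_sdpa_flops` imports torch lazily and is only meant for manual comparison.

```python
def sdpa_forward_flops(b: int, h: int, s_q: int, s_k: int, d: int, d_v: int) -> int:
    """Analytic FLOPs of softmax(Q K^T) V, counting 2 FLOPs per multiply-add."""
    qk = 2 * b * h * s_q * s_k * d    # Q [b,h,s_q,d] @ K^T [b,h,d,s_k]
    pv = 2 * b * h * s_q * s_k * d_v  # P [b,h,s_q,s_k] @ V [b,h,s_k,d_v]
    return qk + pv


def measured_sdpa_flops(b: int, h: int, s_q: int, s_k: int, d: int, d_v: int) -> int:
    """Total reported by torch's FlopCounterMode for one SDPA call (requires torch)."""
    import torch
    import torch.nn.functional as F
    from torch.utils.flop_counter import FlopCounterMode

    q = torch.randn(b, h, s_q, d)
    k = torch.randn(b, h, s_k, d)
    v = torch.randn(b, h, s_k, d_v)
    with FlopCounterMode(display=False) as counter:
        F.scaled_dot_product_attention(q, k, v)
    return counter.get_total_flops()
```

Calling both with the same shape, e.g. `(2, 8, 128, 128, 64, 64)`, on the affected and unaffected torch versions shows whether the reported total departs from the analytic 4·B·H·S_q·S_k·D (when D = D_v).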
Repro code:
```python
from torch.ut…
```
-
# URL
- https://arxiv.org/abs/2411.04996
# Authors
- Weixin Liang
- Lili Yu
- Liang Luo
- Srinivasan Iyer
- Ning Dong
- Chunting Zhou
- Gargi Ghosh
- Mike Lewis
- Wen-tau Yih
- Luk…
-
Would it be possible to register operations such as na2d using `torch.library.custom_op`, or otherwise ensure that they participate in operator dispatch?
torch's built-in flop counter, FlopCoun…