-
nvFuser-generated code for a fusion block present in DiT has worse-than-expected performance. The subgraph performs a `LayerNorm + Mul + Add + Add` computation, as shown in the code below. nvFus…
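Since the referenced code is truncated above, here is a hedged sketch of the shape of such a subgraph. The parameter names `shift` and `scale` are assumptions (DiT's adaLN modulation also involves a gate, which the actual fused block may include); the point is only the op sequence `LayerNorm + Mul + Add + Add`:

```python
import numpy as np

def layernorm(x, eps=1e-6):
    """Plain LayerNorm over the last axis (no learned affine)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def modulation_block(x, shift, scale):
    """Sketch of the fused subgraph: LayerNorm + Mul + Add + Add."""
    h = layernorm(x)   # LayerNorm
    h = h * scale      # Mul
    h = h + shift      # Add
    return x + h       # Add (residual)
```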
-
## Introduction
I am an engineer currently working on 3D model parallelism for transformers. When the tensor model parallelism (https://github.com/huggingface/transformers/pull/13726) is done, I am g…
-
There is an architecture I would like to quantise and retrain from its floating-point counterpart. I would like to incorporate the merge_bn operation supported by Brevitas. How exactly would I do this…
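I haven't checked the exact Brevitas merge_bn API here, but independent of it, the arithmetic behind folding a BatchNorm into the preceding layer is fixed; a generic numpy sketch (the function name and argument layout are my own, not Brevitas's):

```python
import numpy as np

def fold_bn_into_linear(w, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BatchNorm parameters into a preceding linear/conv layer.

    w: (out_features, in_features) weight, b: (out_features,) bias.
    gamma, beta, running_mean, running_var: per-output-feature BN stats.
    Returns (w_folded, b_folded) such that the folded layer alone computes
    the original layer followed by the BatchNorm.
    """
    s = gamma / np.sqrt(running_var + eps)   # per-channel BN scale
    w_folded = w * s[:, None]                # scale each output row
    b_folded = (b - running_mean) * s + beta
    return w_folded, b_folded
```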
-
### 🚀 The feature, motivation and pitch
LayerNorm is starting to be applied to image data on a per-channel basis (e.g. in the ConvNeXt model).
`torch.nn.LayerNorm` supports normalization only over the last se…
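The usual workaround is to permute to channels-last, normalize over the last axis, and permute back. A numpy sketch of what per-channel LayerNorm on an NCHW tensor computes (equivalent to that permute trick):

```python
import numpy as np

def channels_first_layernorm(x, gamma, beta, eps=1e-6):
    """LayerNorm over the channel axis of an NCHW tensor.

    Each spatial location is normalized across its C values, then an
    affine transform with per-channel gamma/beta is applied.
    """
    mu = x.mean(axis=1, keepdims=True)   # per-pixel mean over channels
    var = x.var(axis=1, keepdims=True)
    xn = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * xn + beta[None, :, None, None]
```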
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
**System information**
- TensorFlow version (you are using): 2.5.0
- Are you willing to contribute it (Yes/No): No
**Describe the feature and the current behavior/state.**
I haven't found a …
-
These are the results of the current segnet implementation:
![horse](https://cloud.githubusercontent.com/assets/1780466/25223935/2d0c605e-25bd-11e7-8a0a-cd23f793f32e.png)
![horse-segnetfix](https://…
-
Hi all, I am super new to the DALI Backend, but the benchmarks I recently ran are incredible. I am trying to preprocess the data with the DALI Backend, but I am struggling with the "normalize" operation. I would like…
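For reference, the "normalize" step in question is just per-channel standardization. A plain-Python sketch of the arithmetic the pipeline needs to reproduce (the mean/std values below are the common ImageNet ones, an assumption, not taken from this issue):

```python
import numpy as np

# Hypothetical per-channel mean/std (ImageNet values are a common choice).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_hwc(image):
    """Standardize an HWC uint8 image: scale to [0, 1], then (x - mean) / std."""
    x = image.astype(np.float32) / 255.0
    return (x - MEAN) / STD
```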
-
Hi,
I trained your code on ImageNet-1k from scratch with your config file (mobilevit-small), with only one change: a new batch size of 32/GPU, for an effective batch size of 32*4. I get top-1 accurac…
-
Great job; as far as I know, you achieve the best quantization accuracy.
I'm very interested in your paper and code, but I have some questions about the paper.
To the best of my knowledge:
For a fully quantized model,…
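Since the question cuts off here, for context only: a generic sketch of the uniform affine quantize/dequantize pair that fully quantized models in this line of work typically assume (not this repo's code; names and ranges are illustrative):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Uniform affine quantization: map reals onto a uint8 grid."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map quantized values back to the real line."""
    return scale * (q.astype(np.float32) - zero_point)
```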