-
class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # ------------------------- Is this the place? --…
-
**Describe the bug**
Hi, I was converting CenterNet ([CenterNet HourGlass104 512x512](centernet_hg104_512x512_coco17_tpu-8) from the TensorFlow Object Detection API (https://github.com/tensorflow/models/bl…
-
Hi hanrui, I am very interested in the ideas in this paper, but I have the following question:
In general, a complete model quantization pipeline includes:
1. Prepare a pretrained model;
2. Fuse the batch…
-
### Problem Description
Even with `NVTE_USE_HIPBLASLT=1` and installing TE inside the container instead of through the `Dockerfile`, as suggested by https://github.com/ROCm/TransformerEngine/issues/…
-
I am using Colab. How can I use your modified YOLO? Can I clone your repo and install it?
-
### Problem Description
Llama3 8B FP8 OOMs at the same batch size as BF16. I need to decrease the batch size to `2` for it to not OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…
-
### Problem Description
On the Llama3 70B proxy model, training stalls and the GPUs dump core. The GPU core dumps are 41 GB per GPU, so I am unable to send them. Probably easier for y'all to repro this er…
-
I plan to use a custom trained model in a local environment without network access.
What's the best way to run inference with the saved model: via
`model = torch.hub.load(...)` or
`model = attempt_load('…
-
Batchnorm (BN) is very popular in CV; almost every conv op is followed by BN. I see that [layernorm](https://triton-lang.org/master/getting-started/tutorials/05-layer-norm.html#) in Triton achieved bes…
-
Dear Ranftl,
Your new small model was converted from PyTorch to TFLite directly. It seems that this method doesn't require converting to ONNX and PB first. Can you please elaborate on how to realize i…