batch-normalization-fuse Search Results

300 results for batch-normalization-fuse

cqu20160901/yolov8pose_onnx_tensorRT_rknn_horizon #2

Could someone advise: where should I change SiLU to ReLU?

class Conv(nn.Module): """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation).""" default_act = nn.SiLU() # -------------------------Is it here?--…

jerryandjune updated 1 year ago · 2 comments

onnx/tensorflow-onnx #1929

Turning off back-to-back optimizer does not disable fusing b…

**Describe the bug** Hi, I was converting CenterNet([CenterNet HourGlass104 512x512](centernet_hg104_512x512_coco17_tpu-8) from Tensorflow Object Detection API(https://github.com/tensorflow/models/bl…

Mypathissional updated 2 years ago · 6 comments

yanghr/BSQ #1

Is this a complete model quantization process?

Hi, hanrui, I am very interested in the ideas of this paper, but I have a question, as follows: In general, a complete model quantization includes 1. Prepare a pretrained model; 2. Fuse the batch…

Ironteen updated 3 years ago · 1 comment

ROCm/TransformerEngine #76

[DDP 8xMI300X] GPT2-1.5B FP8 is 25% slower than BF16 & OOMs …

### Problem Description Even with `NVTE_USE_HIPBLASLT=1` & Installing TE while inside the container instead of through `Dockerfile` as suggested by https://github.com/ROCm/TransformerEngine/issues/…

OrenLeung updated 1 month ago · 3 comments

g-h-anna/ultralytics4channel #2

Colab modification

I am using Colab. How can I use your modified YOLO? Can I clone your repo and install it?

A7med01 updated 3 months ago · 8 comments

ROCm/TransformerEngine #79

[FSDP 8xMI300X] Llama3 8B FP8 is 21% slower than BF16 & OOMs…

### Problem Description Llama3 8B FP8 OOMs at the same batch size as BF16. I need to decrease the batch size to `2` for it to not OOM. At batch size 2, TE FP8 is **21% slower** than torch compile B…

OrenLeung updated 2 weeks ago · 6 comments

ROCm/TransformerEngine #78

[FSDP 8xMI300X]: LLama3 70B 4 Layer Proxy Model GPU Core Dum…

### Problem Description On the Llama3 70B Proxy Model, training stalls and the GPUs core dump. The GPU core dumps are 41 GB per GPU, so I am unable to send them. It is probably easier for y'all to reproduce this er…

OrenLeung updated 1 week ago · 24 comments

WongKinYiu/yolov7 #975

What's the recommended way of local custom model inference -…

I plan to use a custom trained model in a local environment without network access. What's the best way to inference saved model -via `model = torch.hub.load(...)` or `model = attempt_load('…

OleksiiYeromenko updated 1 year ago · 1 comment

triton-lang/triton #900

Implement BatchNorm in triton

BatchNorm (BN) is very popular in CV; almost every conv op is followed by BN. I see [layernorm](https://triton-lang.org/master/getting-started/tutorials/05-layer-norm.html#) in triton achieved bes…

Jack47 updated 8 months ago · 3 comments

isl-org/MiDaS #69

TFlite conversion

Dear Ranftl, your new small model was converted from PyTorch to TFLite directly. It seems that this method doesn't need converting to ONNX and PB first. Can you please elaborate on how to realize i…

ljun901527 updated 3 years ago · 9 comments
