JierunChen / FasterNet

[CVPR 2023] Code for PConv and FasterNet

A PConv followed by a PWConv well approximates the regular Conv 3 × 3; is it also similar in speed? #1

Closed by wsy-yjys 1 year ago

wsy-yjys commented 1 year ago

I read your paper. It was great! I have a question: the paper says "A PConv followed by a PWConv well approximates the regular Conv 3 × 3". I'm very curious whether the two are also similar in speed. Thank you~

JierunChen commented 1 year ago

Thanks for your interest in this work. A PConv followed by a PWConv well approximates a regular Conv 3 × 3 and is $2\sim7\times$ faster. The speedup varies with feature map size and device, as shown in the following table:

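| Operator ($\times10$ layers) | Feature map size | FLOPs (M) | Throughput on GPU ($\times1000$ fps) | Latency on CPU (ms) | Latency on ARM (ms) |
|---|---|---|---|---|---|
| Conv $3\times3$ | $96\times56\times56$ | 2601 | 3.01 | 35.67 | 780 |
| Conv $3\times3$ | $192\times28\times28$ | 2601 | 4.89 | 28.41 | 620 |
| Conv $3\times3$ | $384\times14\times14$ | 2601 | 4.56 | 31.85 | 595 |
| Conv $3\times3$ | $768\times7\times7$ | 2601 | 3.16 | 62.71 | 662 |
| PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $96\times56\times56$ | 451 | 6.97 | 7.88 | 167 |
| PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $192\times28\times28$ | 451 | 9.58 | 5.42 | 133 |
| PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $384\times14\times14$ | 451 | 11.43 | 6.86 | 116 |
| PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $768\times7\times7$ | 451 | 10.46 | 8.34 | 121 |
| PConv $3\times3$ with $r=\frac{1}{4}$ | $96\times56\times56$ | 162 | 15.39 | 4.74 | 81.70 |
| PConv $3\times3$ with $r=\frac{1}{4}$ | $192\times28\times28$ | 162 | 20.51 | 2.38 | 62.69 |
| PConv $3\times3$ with $r=\frac{1}{4}$ | $384\times14\times14$ | 162 | 37.21 | 3.32 | 48.19 |
| PConv $3\times3$ with $r=\frac{1}{4}$ | $768\times7\times7$ | 162 | 43.24 | 3.95 | 47.52 |
| PWConv (Conv $1\times1$) | $96\times56\times56$ | 289 | 12.75 | 3.14 | 85.07 |
| PWConv (Conv $1\times1$) | $192\times28\times28$ | 289 | 17.98 | 3.04 | 70.50 |
| PWConv (Conv $1\times1$) | $384\times14\times14$ | 289 | 16.49 | 3.54 | 67.38 |
| PWConv (Conv $1\times1$) | $768\times7\times7$ | 289 | 13.79 | 4.39 | 73.58 |
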
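
The FLOPs column and the quoted speedup range can be sanity-checked from the numbers in the table. The sketch below is plain Python arithmetic, not code from the FasterNet repository. It assumes (this is not stated in the thread) that FLOPs are counted as multiply-accumulates, with stride 1, "same" padding, no bias, and equal input and output channel counts. The names `conv_macs`, `stack_mflops`, `flops`, and `speedups` are made up for illustration.

```python
# Assumptions (mine, not stated in the thread): FLOPs are counted as
# multiply-accumulates, with stride 1, "same" padding, no bias, and equal
# input and output channel counts.

LAYERS = 10  # the table reports stacks of 10 layers
FEATURE_MAPS = [(96, 56, 56), (192, 28, 28), (384, 14, 14), (768, 7, 7)]  # (C, H, W)


def conv_macs(c_in, c_out, h, w, k):
    """Multiply-accumulates of one dense k x k convolution layer."""
    return h * w * k * k * c_in * c_out


def stack_mflops(c, h, w):
    """MFLOPs of a 10-layer stack for each operator in the table."""
    c_p = c // 4  # PConv convolves only r = 1/4 of the channels
    conv = LAYERS * conv_macs(c, c, h, w, 3) / 1e6
    pconv = LAYERS * conv_macs(c_p, c_p, h, w, 3) / 1e6
    pwconv = LAYERS * conv_macs(c, c, h, w, 1) / 1e6
    return {"conv3x3": conv, "pconv": pconv, "pwconv": pwconv,
            "pconv+pwconv": pconv + pwconv}


flops = {size: stack_mflops(*size) for size in FEATURE_MAPS}
for size, row in flops.items():
    print(size, {name: round(value, 1) for name, value in row.items()})

# Measurements copied from the table: (Conv 3x3, PConv + PWConv) per feature map.
gpu_kfps = [(3.01, 6.97), (4.89, 9.58), (4.56, 11.43), (3.16, 10.46)]
cpu_ms = [(35.67, 7.88), (28.41, 5.42), (31.85, 6.86), (62.71, 8.34)]
arm_ms = [(780, 167), (620, 133), (595, 116), (662, 121)]

speedups = {
    "gpu": [fast / base for base, fast in gpu_kfps],  # throughput: higher is better
    "cpu": [base / fast for base, fast in cpu_ms],    # latency: lower is better
    "arm": [base / fast for base, fast in arm_ms],
}
for device, ratios in speedups.items():
    print(device, [round(r, 2) for r in ratios])
```

Under these assumptions the per-stack totals are identical for all four feature map sizes and agree with the table's FLOPs column to within 1 MFLOP. The measured speedups of PConv followed by PWConv over Conv $3\times3$ run from just under $2\times$ (GPU throughput at $192\times28\times28$) to about $7.5\times$ (CPU latency at $768\times7\times7$), which is consistent with the $2\sim7\times$ figure quoted above.
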
wsy-yjys commented 1 year ago

You are very thorough. Thank you very much!