wsy-yjys closed this issue 1 year ago
Thanks for your interest in this work. A PConv followed by a PWConv approximates a regular $3\times3$ Conv well and is roughly $2\sim7\times$ faster. The speedup varies with feature map size and device, as shown in the following table:
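
Operator ($\times10$ layers) | Feature map size | FLOPs (M) | Throughput ($\times1000$ fps) on GPU | Latency (ms) on CPU | Latency (ms) on ARM
---|---|---|---|---|---
Conv $3\times3$ | $96\times56\times56$ | 2601 | 3.01 | 35.67 | 780
Conv $3\times3$ | $192\times28\times28$ | 2601 | 4.89 | 28.41 | 620
Conv $3\times3$ | $384\times14\times14$ | 2601 | 4.56 | 31.85 | 595
Conv $3\times3$ | $768\times7\times7$ | 2601 | 3.16 | 62.71 | 662
PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $96\times56\times56$ | 451 | 6.97 | 7.88 | 167
PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $192\times28\times28$ | 451 | 9.58 | 5.42 | 133
PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $384\times14\times14$ | 451 | 11.43 | 6.86 | 116
PConv $3\times3$ with $r=\frac{1}{4}$ followed by a PWConv | $768\times7\times7$ | 451 | 10.46 | 8.34 | 121
PConv $3\times3$ with $r=\frac{1}{4}$ | $96\times56\times56$ | 162 | 15.39 | 4.74 | 81.70
PConv $3\times3$ with $r=\frac{1}{4}$ | $192\times28\times28$ | 162 | 20.51 | 2.38 | 62.69
PConv $3\times3$ with $r=\frac{1}{4}$ | $384\times14\times14$ | 162 | 37.21 | 3.32 | 48.19
PConv $3\times3$ with $r=\frac{1}{4}$ | $768\times7\times7$ | 162 | 43.24 | 3.95 | 47.52
PWConv (Conv $1\times1$) | $96\times56\times56$ | 289 | 12.75 | 3.14 | 85.07
PWConv (Conv $1\times1$) | $192\times28\times28$ | 289 | 17.98 | 3.04 | 70.50
PWConv (Conv $1\times1$) | $384\times14\times14$ | 289 | 16.49 | 3.54 | 67.38
PWConv (Conv $1\times1$) | $768\times7\times7$ | 289 | 13.79 | 4.39 | 73.58
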
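
For readers who want to check the numbers, here is a minimal sketch that recomputes the FLOPs column and the per-device speedup of PConv + PWConv over the regular Conv $3\times3$ from the table above. It rests on a few assumptions that the thread does not state: FLOPs are counted as multiply-accumulates without bias, the output keeps the input's spatial size, and the table truncates to whole millions rather than rounding. The function and variable names are only illustrative.

```python
# Feature map sizes from the table, as (channels, height, width).
SIZES = [(96, 56, 56), (192, 28, 28), (384, 14, 14), (768, 7, 7)]
LAYERS = 10  # every operator in the table is stacked 10 times


def conv_macs(c_in, c_out, h, w, k):
    """Multiply-accumulates of one k x k conv (assumed: no bias, h x w output)."""
    return h * w * k * k * c_in * c_out


def flops_in_millions(c, h, w, divisor=4):
    """FLOPs of 10 stacked layers per operator, truncated to whole millions."""
    c_p = c // divisor  # channels that PConv convolves when r = 1/4
    per_layer = {
        "conv3x3": conv_macs(c, c, h, w, 3),
        "pconv": conv_macs(c_p, c_p, h, w, 3),
        "pwconv": conv_macs(c, c, h, w, 1),
    }
    per_layer["pconv+pwconv"] = per_layer["pconv"] + per_layer["pwconv"]
    return {name: LAYERS * macs // 10**6 for name, macs in per_layer.items()}


# Measurements copied from the table, one value per feature map size.
CONV = {
    "gpu_kfps": [3.01, 4.89, 4.56, 3.16],
    "cpu_ms": [35.67, 28.41, 31.85, 62.71],
    "arm_ms": [780, 620, 595, 662],
}
PCONV_PWCONV = {
    "gpu_kfps": [6.97, 9.58, 11.43, 10.46],
    "cpu_ms": [7.88, 5.42, 6.86, 8.34],
    "arm_ms": [167, 133, 116, 121],
}


def speedups():
    """Speedup of PConv + PWConv over Conv 3x3 for each device and size."""
    result = {}
    for key in CONV:
        pairs = zip(CONV[key], PCONV_PWCONV[key])
        if key == "gpu_kfps":
            # Throughput: higher is better, so divide new by old.
            result[key] = [new / old for old, new in pairs]
        else:
            # Latency: lower is better, so divide old by new.
            result[key] = [old / new for old, new in pairs]
    return result


if __name__ == "__main__":
    for c, h, w in SIZES:
        print((c, h, w), flops_in_millions(c, h, w))
    for device, ratios in speedups().items():
        print(device, [round(r, 2) for r in ratios])
```

Under these assumptions the FLOPs ratio between Conv $3\times3$ and PConv + PWConv is $9 / (9/16 + 1) = 5.76$ for every feature map size, while the measured speedups range from just under $2\times$ (GPU, $192\times28\times28$) to about $7.5\times$ (CPU, $768\times7\times7$), in line with the $2\sim7\times$ figure quoted above.
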
You are very thorough. Thank you very much!
I read your paper, and it was great! I have one question: the paper says “A PConv followed by a PWConv well approximates the regular Conv 3 × 3”. I'm curious whether the two are also similar in speed. Thank you!