Closed tachikoma777 closed 2 years ago
The paper says "It starts with a learned constant of size C x 16 x 16, denoted as F0, where C is the channel size." But what should its value be? Is it the same for every image?
It is a learnable parameter of size C x 16 x 16, so it is the same for every image and its values are learned during training.
Sorry to bother you, but I still don't know how it is calculated. Could you be more specific? Thanks a lot!
F0 appears as a network input. May I ask how it is used?
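To make "learnable parameter" concrete, here is a minimal framework-free sketch. It is not taken from this repository's code: the class name `LearnedConstant`, the Gaussian initialisation, and the plain SGD update are all assumptions for illustration. The point is that F0 is not calculated from the image. It is a C x 16 x 16 block of weights that is initialised once, handed unchanged to the network for every input, and adjusted by the optimiser during training like any other weight. In PyTorch one would typically hold such a tensor in an `nn.Parameter`.

```python
import random


class LearnedConstant:
    """Hypothetical stand-in for F0: a learnable constant of shape (C, H, W)."""

    def __init__(self, channels, height=16, width=16, seed=0):
        rng = random.Random(seed)
        # Assumption: standard normal initialisation. The actual
        # initialisation used by the paper is not stated in this thread.
        self.value = [
            [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
            for _ in range(channels)
        ]

    def forward(self, batch_size):
        # F0 does not depend on the input image: every sample in the
        # batch receives the very same C x H x W block.
        return [self.value for _ in range(batch_size)]

    def sgd_step(self, grad, lr=0.01):
        # During training the entries are updated like any other weight.
        # `grad` has the same (C, H, W) shape as `self.value`.
        for c, plane in enumerate(self.value):
            for y, row in enumerate(plane):
                for x in range(len(row)):
                    row[x] -= lr * grad[c][y][x]


f0 = LearnedConstant(channels=4)
batch = f0.forward(batch_size=2)   # starting feature map for two images
print(batch[0] is batch[1])        # both images start from the same F0
```

After training, the values are simply stored with the rest of the model weights, and at inference the network starts from this fixed block, as in the sentence quoted from the paper.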