Closed taokz closed 2 years ago
For RevNet to work, you must ensure that twice as many features get passed in as each branch uses. Otherwise, RevLib can't split the input into two equal-sized halves, one for each of the two branches.
To do this, change the output channels of `first` from `hidden_size[0]` to `hidden_size[0] * 2`.
Optionally, you could also tell RevLib to feed the same input into both branches by setting `split_dim=None` when constructing the `ReversibleSequential` module.
Keep in mind that not only do you need twice as many features in the input, but you also get twice as many features in the output. So, the final `norm` and `linear` both need to work on twice as many features as well.
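To illustrate why the stem needs the doubled feature count, here is a plain-PyTorch sketch of the channel split (not revlib's actual internals; `hidden` is a made-up name for the per-branch feature count):

```python
import torch

hidden = 16                           # features each branch operates on
x = torch.randn(2, hidden * 2, 8, 8)  # the stem must emit 2 * hidden channels

# ReversibleSequential-style split along the channel dimension:
# each of the two streams gets `hidden` channels.
stream_a, stream_b = x.chunk(2, dim=1)
print(stream_a.shape, stream_b.shape)
```

If the stem emitted only `hidden` channels, this split would leave each branch with half the features the blocks expect.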
Unfortunately, I'm unsure what you mean by "build a model in a forward way." Could you elaborate?
@ClashLuke Thank you a lot for your reply.
Doubling the output channels of `first` raises a `RuntimeError`:

```
File ~\Anaconda3\lib\site-packages\revlib\core.py:183, in additive_coupling_forward(other_stream, fn_out)
    181 fn_out = split_tensor_list(fn_out)
    182 if isinstance(fn_out, torch.Tensor):
--> 183     return other_stream + fn_out
    184 return [other_stream + fn_out[0]] + fn_out[1]
RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 3
```
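The traceback points at dimension 3, i.e. the spatial width, not the channel count. A minimal sketch in plain PyTorch (outside revlib) reproducing the same failure: a stride-2 block halves the spatial size, so the additive coupling `other_stream + fn_out` can no longer be computed.

```python
import torch
import torch.nn as nn

other_stream = torch.randn(2, 16, 32, 32)          # the skip stream, 32x32 spatial
block = nn.Conv2d(16, 16, 3, stride=2, padding=1)  # stride-2 halves H and W
fn_out = block(other_stream)                       # -> (2, 16, 16, 16)

try:
    other_stream + fn_out  # what additive_coupling_forward effectively does
except RuntimeError as e:
    print("coupling failed:", e)
```

Any block that changes the channel count has the same effect, just at dimension 1 instead of dimension 3.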
I've printed the output sizes in the forward of the BasicBlock; layers 1 and 2 work fine, but I cannot print the output of layer 3.
There is another problem: because of the doubling operation, the number of parameters of the RevNet no longer equals that of the original ResNet. I may not fully understand this library and how RevNet works, and may be implementing RevNet with revlib in the wrong way.
About the question of building a model in a `forward` way: I would like to know whether I can stack `revlib.ReversibleSequential()` modules, specifically,

```python
layer1 = self._make_layer(...)
layer2 = self._make_layer(...)
layer3 = self._make_layer(...)
rev_layers = nn.Sequential(layer1, layer2, layer3)
```

or

```python
def forward(self, x):
    out = layer1(x)
    out = layer2(out)
    out = layer3(out)
```

where `_make_layer(...)` returns `revlib.ReversibleSequential(*layers)`. This is different from the previous approach, `self.rev_layers = revlib.ReversibleSequential(*[layer1, layer2, layer3])`, where each `layer*` is an `nn.Sequential()`.
I've tried this in my previous RevNet20 code and got the same RuntimeError, but the error happens in layer 2 instead of layer 3.
The top problem you faced is that RevNet requires all inputs and outputs to be the same size. As the second layer has more output features than the first, RevNet would have to add a tensor with 32 features to one with 16, which isn't possible.
Think about it like in a ResNet. In ResNet, you only have the residual path within each resolution and feature size, not across them. To carry the residual stream across, you usually use downsampling (such as `AvgPool2d`) and add its output to the output of your "residual" block. In RevNet, that second path doesn't exist. Instead, you would have to use `PixelShuffle` and feature padding to arrive at a similar result (see #2).
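One way to change resolution without destroying information (a sketch of the general idea, not the exact recipe from #2): `PixelUnshuffle` is a pure rearrangement that trades spatial resolution for channels and is exactly inverted by `PixelShuffle`.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 16, 32, 32)
down = nn.PixelUnshuffle(2)  # space-to-depth: losslessly trades resolution for channels
up = nn.PixelShuffle(2)      # exact inverse

y = down(x)                  # (2, 64, 16, 16): 4x the channels, half the resolution
assert torch.equal(up(y), x) # round-trips exactly, so no information is lost
print(y.shape)
```

Unlike a strided convolution or pooling, this keeps the transformation invertible, which is what the reversible stream needs.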
The easiest way forward would be to have multiple `ReversibleSequential` modules, one for each `_make_layer()` call, and put these into a standard `nn.Sequential` container. This is how the original RevNet did it. Their method uses marginally more parameters but otherwise gives the same results.
Another alternative would be to avoid this multi-stage assembly and construct one large `ReversibleSequential` module instead. Using one large block saves memory, and i-RevNet documented how they achieved marginally worse ImageNet accuracy with this kind of architecture.
Yes, you can define the reversible architecture in `forward`. However, I'd advise against it, as `ReversibleSequential` is a thin wrapper around things you would have to do anyway.
If you want to do what `ReversibleSequential` would usually handle for you, you'd have to wrap your modules in `ReversibleModule`s like so:
https://github.com/HomebrewNLP/revlib/blob/34dad19318e2f861ea6b0ce263506625a934b568/revlib/core.py#L471-L487
and call these modules one by one, just like in a normal `nn.Sequential` module: https://github.com/HomebrewNLP/revlib/blob/34dad19318e2f861ea6b0ce263506625a934b568/revlib/core.py#L509-L511
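What that wrapper handles for you, written out by hand: a plain-PyTorch sketch of one additive-coupling step and its inverse (not revlib's `ReversibleModule` itself; `f` and `g` are arbitrary branch functions chosen here for illustration):

```python
import torch
import torch.nn as nn

f = nn.Conv2d(8, 8, 3, padding=1)  # one branch function
g = nn.Conv2d(8, 8, 3, padding=1)  # the other branch function

x = torch.randn(2, 16, 8, 8)
x1, x2 = x.chunk(2, dim=1)

# forward: each stream is updated additively from the other
y1 = x1 + f(x2)
y2 = x2 + g(y1)

# inverse: the inputs can be recomputed from the outputs,
# so activations need not be stored for backpropagation
x2_rec = y2 - g(y1)
x1_rec = y1 - f(x2_rec)
assert torch.allclose(torch.cat([x1_rec, x2_rec], dim=1), x, atol=1e-5)
```

Calling such steps one by one in `forward` works, but you would also have to reimplement the memory-saving recomputation that the library already provides.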
Thanks a lot for your detailed explanation! Now I can understand why my code does not work.
I also read the source code of revnet and i-revnet; they provide downsampling to match the dimensions of `other_stream` and `fn_out` (the mismatch is caused by the change in the number of channels). If I haven't misunderstood, it seems that your code does not provide this feature, right? Do you have a plan to add it?
Sorry, I'm not planning to add these, as the most common functions (pooling, pixelshuffle, upsample) are already part of PyTorch.
Hi,
I would like to use this library to build a ResNet20 model. I've tried several times, but I still get a mismatched-dimension error. My model is shown as follows:
I've tried to modify `self.in_planes = 8` and `hidden_size = [8, 16, 32]`, respectively, but it still does not work. Could you provide any hints? Is it possible to build a model in a `forward` way instead of wrapping the reversible model with non-reversible layers like `model = nn.Sequential(conv, rev_layer, conv)`? I appreciate your help.