The Reshape layer is currently implemented by sharing the gradient in the Reshape() function and providing empty stubs for the Forward() and Backward() functions. This differs from the older Flatten layer, where the data and gradient are declared as shared on every call of the Forward()/Backward() implementations.
Changing the Reshape layer to follow the Flatten layer's approach seems to make the issue observed in #6769 disappear.
It should be noted that this change was made without a full understanding of why the previous implementation failed in the first place; it simply follows the example set by the Flatten layer.
A much simpler change, replacing
top[0]->ShareDiff(*bottom[0]);
with the more intuitive
bottom[0]->ShareDiff(*top[0]);
while leaving it in the Reshape() function, did NOT fix the issue.
See #6769.