Closed by Qwlouse 8 years ago
I think nervanagpu also uses NHWC. I vote for switching.
Sounds like changing has more advantages than disadvantages, so the change is fine by me.
@untom Can we replace the Numpy conv implementation with a Cython one too then (probably forked from the same place)? It's probably not worth rewriting the version with fancy indexing for now.
It looks like the following changes are needed:
switching the tensor format from `CUDNN_TENSOR_NCHW` to `CUDNN_TENSOR_NHWC`. Is there anything else?
I tried replacing the conv implementation with a Cython one (the one used in cudarray), but it was much slower than the current im2col-based one.
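For readers unfamiliar with the approach: an im2col-based convolution of the kind mentioned above can be sketched in NumPy as below. This is a toy single-image version with hypothetical names, not brainstorm's actual implementation; it unrolls input patches into columns so the convolution becomes one matrix multiply.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll all kh-by-kw patches of a (C, H, W) input into columns."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            # Each column is one flattened patch, in (C, kh, kw) order.
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # (C, H, W)
filters = np.random.randn(3, 2, 3, 3)                   # (num_filters, C, kh, kw)
# Convolution as a single matmul over the unrolled patches.
out = filters.reshape(3, -1) @ im2col(x, 3, 3)          # (3, out_h * out_w)
print(out.shape)  # (3, 4)
```

The speed of this trick comes from delegating the work to an optimized GEMM, at the cost of materializing the (duplicated) patch matrix.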
I started to make this change, but it looks like cuDNN v3 still does not fully support NHWC. In particular, the docs for `cudnnPoolingForward` say it returns `CUDNN_STATUS_NOT_SUPPORTED` when the wStride of the input or output tensor is not 1.
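To see why NHWC trips this check: for a fully packed tensor, the stride of the W dimension is 1 in NCHW but equals the channel count in NHWC. A small sketch (shapes are illustrative) that computes packed strides:

```python
def packed_strides(shape):
    """Element strides of a C-contiguous tensor with the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

n, c, h, w = 32, 3, 24, 24
# NCHW: strides are (c*h*w, h*w, w, 1) -> wStride == 1, which cuDNN accepts.
print(packed_strides([n, c, h, w]))   # [1728, 576, 24, 1]
# NHWC: strides are (h*w*c, w*c, c, 1) -> wStride == c == 3, not 1.
print(packed_strides([n, h, w, c]))   # [1728, 72, 3, 1]
```

So any packed NHWC tensor with more than one channel necessarily has wStride != 1, which is exactly the case the v3 pooling docs mark as unsupported.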
There is a forum comment saying that cuDNN v4 will support it here: https://devtalk.nvidia.com/default/topic/875570/?comment=4680667. I'm not sure if this is official.
Our choices are:
I don't like NCHW; IMHO it's just a poor choice. So my vote is still for switching to NHWC. The forum post indicates that the conv backward pass might also not work with NHWC, which might be a show-stopper; in that case we can maybe wait for the cuDNN v4 release. Other than that, I suggest option 3.
Sooo.... I guess this is not actionable until cuDNN v4 is out? Do they have any release date? (in the linked devtalk thread it just says "soon" o_0 ).
If it's just about pooling, we can use our own CUDA kernels for it. If some aspects of conv are also not supported, then we have to wait until cuDNN v4.
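For reference, the semantics such a custom pooling kernel would need to implement can be sketched in NumPy (a toy 2x2, stride-2 max pool over an NHWC buffer; names and shapes are illustrative, not brainstorm's API):

```python
import numpy as np

def max_pool_nhwc(x, size=2, stride=2):
    """Max pooling over the spatial dims of an (N, H, W, C) array."""
    n, h, w, c = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((n, out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + size,
                         j * stride:j * stride + size, :]
            # Reduce over the spatial window, keeping batch and channels.
            out[:, i, j, :] = patch.max(axis=(1, 2))
    return out

x = np.arange(1 * 4 * 4 * 2, dtype=float).reshape(1, 4, 4, 2)
print(max_pool_nhwc(x).shape)  # (1, 2, 2, 2)
```

A CUDA kernel would parallelize the two inner loops (one thread per output element), but the reduction per window is the same.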
BTW, there seem to be some cuDNN-specific things like conv mode, pref, workspace etc. which we're not handling, and probably should. Also, do you know where the latest cuDNN wrapper code is? GitHub appears out-of-date.
As far as I understood from the linked threads, convolution with NHWC is not supported in cuDNN right now.
Wrapper code: it seems to be stuck at cuDNN v2, so we'll have to update it ourselves (or roll our own) -- in which case I'd rather wait for cuDNN v4 instead of doing the same work twice.
The threads don't really appear to be official. As far as I can see in the cuDNN docs, only the pooling does not support NHWC. For conv, I know that at least the forward pass supports it (I tried to migrate to NHWC in a private branch), although I am not sure if all conv algos are supported. Also, I did not try to run the conv backward pass. The conv docs say nothing about NHWC not being supported though.
I'll try to find out when cuDNN v4 is expected to be out.
Any issues with switching the convention for convolution layer weights to NHWC as well?
I have finished this migration in: https://github.com/Qwlouse/brainstorm/tree/NHWC_format
Firstly, the CIFAR-10 example is now much slower on the GPU. I think this is expected due to the small network and image sizes. We should use prepared calls to the kernels for a small speed-up.
Secondly, the CIFAR-10 network learns slower now, and I am not sure why. I get to a loss of ~1.25 in 10 epochs, whereas with cuDNN it was around ~1.0 after 10 epochs. Learning remains slow beyond that too: about 63% validation accuracy after 20 epochs (vs. 71% for cuDNN+NCHW).
BTW, this branch also contains the SoftmaxCE layer, which replaces the Classification layer in the helper tool for building classifiers as well.
Update: It appears that using Classification instead of Fully Connected + the new SoftmaxCE gives the same performance as cuDNN, which implies the issue lies somewhere in the usage of SoftmaxCE. This should be easy to fix.
Alright, I fixed the issue that was causing worse performance. This issue is basically solved now.

Summary:
We would like to make NHWC the standard going forward, but I'd also like to keep support for NCHW + cuDNN v3 around in a separate branch until cuDNN v4 comes out. What should the strategy be?
I see two options:
I favor the 2nd.
I'm currently keeping the NHWC branch up to date by merging master into it. Doing an NHWC pre-release is fine, but it should include most of the planned API changes. One thing that's missing from master but is in NHWC_format is the SoftmaxCE layer.
Let's move the SoftmaxCE layer and associated changes to master, and finish any planned API changes soon (in a week). Then, as you said, we tag a release and merge the branch in. Okay?
Fine with me
I've cherry-picked the SoftmaxCE layer from the NHWC branch.
Caution: the fix I later made to get_in_out_layers_for_classification has not been cherry-picked yet. The activation for the fc_layer needs to be set to 'linear' (it's relu by default). Also, I suggest renaming this tool to get_classifier_layers (or something like that).
Yay! Issue hijacking! :tada:
Ok, I'll fix that. I've also just added a SigmoidCE layer and did lots of copy & pasting for the tool. Maybe it's time to unify the three tools we have...
Transition to NHWC has been finished and merged into master. NCHW is unsupported after the 0.4 release.
Support for cuDNN will return in a future release after cuDNN v4 is released with full support for NHWC.
The 0.4 release can be used if you wish to use cuDNN v3 with brainstorm in NCHW format.
We just briefly discussed this and I think it would be good to change the buffer layout for our convolutional layers from (1) to (2):

1. `(time, batch_size, channels, height, width)`
2. `(time, batch_size, height, width, channels)`
Pros

- You could flatten `time`, `batch_size`, `height`, and `width` (but not `channels`), and that you could then do with a single flatten operation.

Cons