crockpotveggies opened 5 years ago
Machine specs:
OK, so: this is also present on 1.0.0-beta3 (i.e., also seeing around 10GB of memory used). Here's the memory report (without cuDNN), which provides some insight: https://gist.github.com/AlexDBlack/2ffc2a9de0fd5fc5727af05f531bb937
WS_LAYER_WORKING_MEM at 3.71 GB seems excessive (this is the working memory required by the layer with the largest working memory). WS_ALL_LAYERS_ACT at 2.01 GB is about 2x the estimated "Total Activations Memory" (929.98 MB), which also seems off; it should be about the same.
With cuDNN enabled: we're looking at a peak of a bit under 7 GB total (1 GB of that is used by the OS, so more like 6 GB for the process). WS_LAYER_WORKING_MEM is, as expected, very low. No difference for WS_ALL_LAYERS_ACT. https://gist.github.com/AlexDBlack/e02b99d259f610de61163640d9602a05
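For reference, a report like the two linked above can be generated straight from the network configuration, before anything is allocated on the GPU. A minimal sketch only, assuming the builder-style zoo API and `ComputationGraphConfiguration#getMemoryReport`; the `numClasses` value and the 224x224x3 input shape are placeholders:

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.memory.MemoryReport;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.zoo.model.VGG16;

public class Vgg16MemoryReportSketch {
    public static void main(String[] args) {
        // Build a (non-pretrained) VGG16 from the zoo; numClasses is a placeholder
        ComputationGraph vgg16 = (ComputationGraph) VGG16.builder()
                .numClasses(1000)
                .build()
                .init();

        // Ask the configuration for its estimated memory use at a 224x224x3 input
        ComputationGraphConfiguration conf = vgg16.getConfiguration();
        MemoryReport report = conf.getMemoryReport(InputType.convolutional(224, 224, 3));

        // Prints per-layer working-memory and activation estimates,
        // in the same format as the gists linked above
        System.out.println(report.toString());
    }
}
```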
"Use CuDNN" is an obvious first step/workaround. Still looking into workspaces size.
Running an updated example using gradient sharing and VGG16 from the model zoo. With a batch size of 16, the zoo model uses 10GB of memory with the default settings. This happens even when isolated from ParallelWrapper. I have also tried setting WorkspaceMode and CacheMode (see the sketch after the example link below).
Try running the example at this commit: https://github.com/deeplearning4j/dl4j-examples/blob/83a84f90ee0c9fd107c662bea74a5d578ce9322a/dl4j-cuda-specific-examples/src/main/java/org/deeplearning4j/examples/multigpu/vgg16/MultiGpuVGG16TinyImageNetExample.java
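For completeness, the WorkspaceMode/CacheMode attempt looked roughly like the following. This is a sketch only, assuming the builder-style zoo API and that ComputationGraphConfiguration exposes setters for these modes; numClasses and the training iterator are placeholders:

```java
import org.deeplearning4j.nn.conf.CacheMode;
import org.deeplearning4j.nn.conf.WorkspaceMode;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.zoo.model.VGG16;

public class Vgg16WorkspaceModeSketch {
    public static void main(String[] args) {
        // Build a (non-pretrained) VGG16 from the zoo; numClasses is a placeholder
        ComputationGraph vgg16 = (ComputationGraph) VGG16.builder()
                .numClasses(200)
                .build()
                .init();

        // Workspace and cache settings live on the graph configuration
        vgg16.getConfiguration().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);
        vgg16.getConfiguration().setInferenceWorkspaceMode(WorkspaceMode.ENABLED);
        vgg16.getConfiguration().setCacheMode(CacheMode.NONE);

        // ... then fit with a DataSetIterator using batch size 16 ...
    }
}
```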
Aha! Link: https://skymindai.aha.io/features/DL4J-36