floydhub / dl-docker

An all-in-one Docker image for deep learning. Contains all the popular DL frameworks (TensorFlow, Theano, Torch, Caffe, etc.)
https://www.floydhub.com

"No space left on device" #21

Open ANDRO90 opened 8 years ago

ANDRO90 commented 8 years ago

The image build stopped with this error, and I don't understand where it comes from, since I'm installing everything on an AWS p2 instance with 61GB of RAM and 100GB of storage (only 15% of which was occupied after the crash). Any guesses? I was building the GPU image, after completing all the preliminary steps (Nvidia drivers and Docker).

[ 81%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedDouble.cu.o
/usr/local/cuda/include/driver_functions.h(133): catastrophic error: error while writing generated C++ file: No space left on device

1 catastrophic error detected in the compilation of "/tmp/tmpxft_00002201_00000000-8_THCTensorMaskedFloat.cpp4.ii".
Compilation terminated.
/usr/local/cuda/include/texture_fetch_functions.hpp(7739): catastrophic error: error while writing generated C++ file: No space left on device

1 catastrophic error detected in the compilation of "/tmp/tmpxft_00002208_00000000-8_THCTensorMaskedDouble.cpp4.ii".
Compilation terminated.

:0:0: fatal error: when writing output to : No space left on device
compilation terminated.
CMake Error at THC_generated_THCTensorMaskedFloat.cu.o.cmake:267 (message):
  Error generating file
  /root/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedFloat.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedFloat.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMake Error at THC_generated_THCTensorMaskedDouble.cu.o.cmake:267 (message):
  Error generating file
  /root/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedDouble.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedDouble.cu.o] Error 1
CMake Error at THC_generated_THCTensorSortDouble.cu.o.cmake:267 (message):
  Error generating file
  /root/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortDouble.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortDouble.cu.o] Error 1
:0:0: fatal error: when writing output to : No such file or directory
compilation terminated.
CMake Error at THC_generated_THCTensorTopK.cu.o.cmake:267 (message):
  Error generating file
  /root/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorTopK.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/./THC_generated_THCTensorTopK.cu.o] Error 1
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
!Error: Build error: Failed building.

saiprashanths commented 8 years ago

This is likely because you are running out of the space allocated to Docker. Try removing some Docker images and containers, then retry the build.

Note: the commands below will remove ALL containers and images. If you only want to remove selected ones, adjust the commands accordingly.

# Delete all containers
docker rm $(docker ps -a -q)
# Delete all images
docker rmi $(docker images -q)
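
Before deleting anything, it may help to check which filesystem is actually full, since a disk that is mostly empty overall can still have a small /tmp or a small Docker storage area. The following is a rough sketch, not part of the original answer: the helper names `free_kb` and `has_space` are made up for illustration, /var/lib/docker is assumed to be Docker's data directory, and `docker system df` assumes Docker 1.13 or newer.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Print the free space, in KiB, on the filesystem holding the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding path $1 has at least $2 KiB free.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Assumed locations: /tmp (where the nvcc temp files in the log live) and
# /var/lib/docker (Docker's usual data directory on Linux).
for dir in /tmp /var/lib/docker; do
    if [ -d "$dir" ]; then
        echo "$dir: $(free_kb "$dir") KiB free"
    fi
done

# On Docker 1.13 or newer, this summarizes space used by images and containers.
if command -v docker >/dev/null 2>&1; then
    docker system df || echo "could not query the Docker daemon" >&2
fi
```

If one of these locations turns out to be nearly full while the rest of the disk is empty, that points to where space needs to be freed or moved.
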
ANDRO90 commented 8 years ago

Thanks for the reply, but it didn't work. I even tried enlarging the Base Device Size from 10G to 30G (sudo dockerd --storage-opt dm.basesize=30G), but the build crashed at the very same point.

Mr-Grieves commented 8 years ago

Hi,

I am getting the same error. After my build crashed I listed the Docker images and there were 24 of them, totalling over 60GB (my Linux partition only has 60GB allocated to it).

How much space is required to build the gpu image?

Is there a way to reduce the number/size of docker images used in the build?

pkgvrpdm commented 8 years ago

I am also facing the same problem: Fedora 24, 12GB RAM, 16GB swap, no other apps running.

[ 78%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/THC_generated_THCTensorMathCompareInt.cu.o
/home/pawan/torch/extra/cutorch/lib/THC/generated/../THCTensorInfo.cuh:276:0: fatal error: when writing output to : No space left on device
 }
 ^
compilation terminated.
nvcc error : 'cicc' died due to signal 11 (Invalid memory reference)
nvcc error : 'cicc' core dumped
CMake Error at THC_generated_THCTensorSortLong.cu.o.cmake:267 (message):
  Error generating file
  /home/pawan/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortLong.cu.o

saiprashanths commented 8 years ago

Sorry, I'm running behind on issue tracking. I'm travelling now; I will take a look next week and update.

An easy solution might be to set up automated builds on Docker Hub, so you can just download the image and don't have to build it locally. I'll have to modularize the Dockerfiles to make that happen, since Docker Hub has CPU/time limits on builds. I will do this in the next week. Apologies for the delay.

AnkurJain10 commented 7 years ago

Hey all, I am facing the same issue. How did you manage to fix this, or is there a workaround?

Mr-Grieves commented 7 years ago

I never actually got this specific build working, but I ran into similar space issues with other Docker applications.

My solution was to move Docker's container storage directory (default is /var/lib/docker/ on Ubuntu) to another, larger partition.

I followed the symlink method described here: https://forums.docker.com/t/how-do-i-change-the-docker-image-installation-directory/1169
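
For anyone who wants to see the shape of that approach, here is a minimal sketch of a move-and-symlink helper. It is not copied from the linked forum post: the function name `relocate_dir` and the example target path are hypothetical, and it assumes the Docker daemon is stopped first and that the commands run as root.

```shell
#!/bin/sh
# Sketch only: relocate_dir is a hypothetical helper, not a Docker command.

# Move directory $1 to $2 and leave a symlink at $1 pointing to $2.
# $2 should be an absolute path so the symlink resolves from anywhere.
relocate_dir() {
    src=$1
    dst=$2
    if [ ! -d "$src" ] || [ -L "$src" ]; then
        echo "not a plain directory: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "target already exists: $dst" >&2
        return 1
    fi
    mv "$src" "$dst" && ln -s "$dst" "$src"
}

# Hypothetical real-world use (assumes systemd and a larger disk mounted at
# /mnt/bigdisk; both are assumptions, adjust for your machine):
#   systemctl stop docker
#   relocate_dir /var/lib/docker /mnt/bigdisk/docker
#   systemctl start docker
```

The helper refuses to run if the source is already a symlink or the target exists, so running it twice by accident should not nest or overwrite anything.
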

tastyminerals commented 7 years ago

In case the problem is a small /tmp partition, point TMPDIR at a directory with more room:

mkdir $HOME/tmp
export TMPDIR=$HOME/tmp

Afterwards, don't forget to remove ~/tmp.
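
A variation on the same idea, as a sketch: wrap the build command so the temporary directory is created fresh and removed automatically, whether or not the build succeeds. The function name `with_big_tmp` is made up for illustration, and this assumes the compiler honors TMPDIR, as the suggestion above implies.

```shell
#!/bin/sh
# Sketch only: with_big_tmp is a hypothetical helper.

# Usage: with_big_tmp BASE_DIR COMMAND [ARGS...]
# Runs COMMAND with TMPDIR set to a fresh directory under BASE_DIR, then
# deletes that directory and returns COMMAND's exit status.
with_big_tmp() {
    base=$1
    shift
    scratch=$(mktemp -d "$base/build-tmp.XXXXXX") || return 1
    status=0
    # The subshell keeps the TMPDIR change from leaking into the caller.
    ( export TMPDIR="$scratch"; "$@" ) || status=$?
    rm -rf "$scratch"
    return "$status"
}

# Hypothetical use, with the build command being whatever you normally run:
#   with_big_tmp "$HOME" make
```
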

AaronDelaplane commented 7 years ago

I was able to fix this issue by clicking "Remove all data". I got to this pop-up by clicking on the Docker icon at the top of my computer screen.
