galaxyproject / cloudman

Easily create and manage compute clusters on any Cloud.
https://galaxyproject.org/cloudman/

Root volume space utilization increases during file uploads #48

hackdna closed this issue 5 years ago

hackdna commented 8 years ago

`/etc/nginx/sites-enabled/galaxy.locations` contains `upload_store /mnt/galaxy/upload_store;`. However, no files appear in `/mnt/galaxy/upload_store` during file uploads, while usage of the root volume increases:

```
ubuntu@ip-172-31-15-47:/etc/nginx$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G   12K  1.9G   1% /dev
tmpfs           377M  8.5M  369M   3% /run
/dev/xvda1       20G  5.9G   13G  32% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            1.9G     0  1.9G   0% /run/shm
none            100M     0  100M   0% /run/user
cm_processes    1.9G     0  1.9G   0% /run/cloudera-scm-agent/process
/dev/xvdf       200G  9.8G  191G   5% /mnt/galaxy
/dev/xvdg        80G   65G   16G  81% /mnt/galaxyIndices
```

```
ubuntu@ip-172-31-15-47:/etc/nginx$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G   12K  1.9G   1% /dev
tmpfs           377M  8.5M  368M   3% /run
/dev/xvda1       20G   11G  8.6G  55% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            1.9G     0  1.9G   0% /run/shm
none            100M     0  100M   0% /run/user
cm_processes    1.9G     0  1.9G   0% /run/cloudera-scm-agent/process
/dev/xvdf       200G   11G  190G   6% /mnt/galaxy
/dev/xvdg        80G   65G   16G  81% /mnt/galaxyIndices
```
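To put numbers on the two `df -h` snapshots, here is a minimal Python sketch (assuming Python 3.9+) that parses the output and reports how much each mount point grew. The helper names `parse_size`, `parse_df` and `growth` are hypothetical. The parser assumes the standard six `df` columns, mount points without spaces, and power-of-1024 suffixes as printed by `df -h`.

```python
import re

# `df -h` prints sizes with power-of-1024 suffixes (K, M, G, T).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a `df -h` size such as '5.9G', '12K' or '0' to bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def parse_df(output: str) -> dict[str, float]:
    """Map each mount point to its 'Used' column in bytes.

    Lines that do not have exactly six fields (the header, a shell prompt,
    a wrapped device name) are skipped.
    """
    used = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Filesystem":
            continue
        used[fields[5]] = parse_size(fields[2])
    return used


def growth(before: str, after: str) -> dict[str, float]:
    """Non-zero change in bytes for mount points present in both snapshots."""
    old, new = parse_df(before), parse_df(after)
    return {m: new[m] - old[m] for m in new if m in old and new[m] != old[m]}
```

Applied to the two snapshots in this issue, it reports roughly 5.1G of growth on `/` versus roughly 1.2G on `/mnt/galaxy`, which is consistent with the upload being staged on the root volume rather than in `upload_store`.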

This is potentially problematic because the root volume is only 20GB in total, and genomic data files can be 10GB or more each.
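One way to guard against this, sketched in Python under stated assumptions: before starting a large upload, check that the filesystem where the request body will be buffered has room for the file plus a safety margin. `fits` and `has_room_for` are hypothetical helpers, the 10% headroom is an arbitrary choice, and which directory to pass as `path` depends on how nginx is configured on the instance.

```python
import shutil


def fits(free: int, total: int, file_size: int, headroom: float = 0.1) -> bool:
    """True if writing `file_size` bytes still leaves `headroom` * `total` bytes free."""
    return free - file_size >= total * headroom


def has_room_for(path: str, file_size: int, headroom: float = 0.1) -> bool:
    """Apply `fits` to the filesystem that holds `path` (e.g. nginx's body temp dir)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return fits(usage.free, usage.total, file_size, headroom)
```

With the figures above, a 10G file passes the check against the first snapshot (13G free leaves 3G, above the 2G reserve on a 20G volume) but fails against the second one (8.6G free).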

Setting `client_body_temp_path` to `/mnt/galaxy/upload_store` avoids filling up the root partition, but this may or may not be the right solution.
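To confirm that a change like this took effect, a rough Python sketch can pull directive values out of an nginx config file and check which filesystem they resolve to. It is a regular-expression scan, not a real nginx parser, so `include`d files, directives split across lines and quoted paths are not handled; `directive_values` and `same_filesystem` are hypothetical names.

```python
import os
import re


def directive_values(config_text: str, name: str) -> list[str]:
    """First argument of every `name ...;` directive, skipping commented-out lines.

    Rough regex scan: `name` must start a line (after indentation) and be
    followed by whitespace, so `upload_store_access` does not match `upload_store`.
    """
    pattern = re.compile(rf"^\s*{re.escape(name)}\s+([^\s;]+)", re.MULTILINE)
    return pattern.findall(config_text)


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both existing paths are on the same device (mount)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, reading `/etc/nginx/sites-enabled/galaxy.locations` and passing the first `client_body_temp_path` value together with `/mnt/galaxy` to `same_filesystem` should return `True` once the temporary path is on the large volume. An empty list from `directive_values` only means the directive is not set in that particular file.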

afgane commented 8 years ago

So this only happens when uploading via the API (uploads through the browser or FTP correctly use `/mnt/galaxy/upload_store`). The `upload_file_from_url()` method works (as you mentioned in #47), but it uses `/mnt/galaxy/tmp` to stage the data (which might be coming from here: https://github.com/galaxyproject/cloudman/blob/8db478a8d777bcc48ed8c838f1f9de8519a11e60/cm/conftemplates/nginx_galaxy_locations.default#L58). I'm not sure that matters much, but it would be nice to know for certain why it behaves differently from the other two upload options.
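To pin down where API uploads are staged, one option is to measure a few candidate directories before and during an upload and see which one grows. This Python sketch does that with `os.walk`; the helper names are hypothetical. As an assumption for the candidate list, `/mnt/galaxy/upload_store` and `/mnt/galaxy/tmp` come from this thread, while `/var/lib/nginx/body` is the client-body temp directory that Debian/Ubuntu nginx packages typically use by default.

```python
import os


def tree_size(root: str) -> int:
    """Total bytes of files under `root`; 0 if `root` does not exist."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # yields nothing for a missing root
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # e.g. a temp file removed mid-walk, or no permission
    return total


def snapshot(dirs: list[str]) -> dict[str, int]:
    """Record the current total size of each candidate directory."""
    return {d: tree_size(d) for d in dirs}


def changed(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Size change in bytes for directories whose total differs between snapshots."""
    return {d: after[d] - before.get(d, 0) for d in after if after[d] != before.get(d, 0)}
```

For example, call `snapshot(["/mnt/galaxy/upload_store", "/mnt/galaxy/tmp", "/var/lib/nginx/body"])` once before starting an API upload and again while it is running, then pass both results to `changed`. If none of the candidates grows while `df` still shows the root volume filling, the data is landing somewhere outside that list.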

But does setting `client_body_temp_path` actually work for you? For a 5GB file, I keep getting the error you reported in issue #47.