chapmanb / cloudbiolinux

CloudBioLinux: configure virtual (or real) machines with tools for biological analyses
http://cloudbiolinux.org
MIT License

tar -p option causes quota issues in case quota are enforced on group level #349

Closed: vdejager closed this issue 4 years ago

vdejager commented 4 years ago

I'm encountering issues installing the bcbio pipeline on an HPC infrastructure where quotas are strictly enforced at the group level AND at the file system location level.

All files in (for example) /projects/mygroup must have user:mygroup ownership. Files with any other ownership cause a 'quota reached' error. This is not an issue for most parts of the bcbio installation script, but the installation of genomes throws this error even though only 190 GB of my 50 TB quota is in use.
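One way to check which files trip a group-level quota like this is to walk the project directory and list every entry whose group differs from the expected one. The following is a minimal sketch, not part of bcbio or cloudbiolinux; the function name `find_group_mismatches` is hypothetical, and it assumes the quota is keyed on the numeric group ID of each file.

```python
import os


def find_group_mismatches(root, expected_gid):
    """Return paths below root whose group ID differs from expected_gid.

    Hypothetical diagnostic helper: root itself is not checked, only the
    entries found underneath it.
    """
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_gid != expected_gid:
                mismatches.append(path)
    return mismatches
```

On a system like the one described, the expected ID could be looked up with `grp.getgrnam("mygroup").gr_gid` and passed in together with the project path, for example /projects/mygroup.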

My installation script:

#!/bin/bash

BASE=/<obfuscated path>
VERSION="1.2.0"

# fetch installer
wget https://raw.githubusercontent.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py

# make directories
mkdir -p $BASE/$VERSION/tools

# run installer
python bcbio_nextgen_install.py ${BASE}/${VERSION}/bcbio \
      --tooldir=${BASE}/${VERSION}/tools \
      --genomes GRCh37 \
      --aligners bwa \
      --aligners star \
      --isolate \
      --cores 4

I traced this back to the tar -p option used at https://github.com/chapmanb/cloudbiolinux/blob/21e8b0db701dddbef4a0b9060cc0cdf012373b79/cloudbio/biodata/genomes.py#L115 and in related places.

Manually removing all the "-p" options from the cloudbiolinux code in the tmpbcbio-install/cloudbiolinux/cloudbio/biodata/genomes.py file resolved the issue. After that, restarting the installation completed properly.

Any suggestions to streamline the process are welcome.

roryk commented 4 years ago

Thanks. The intention behind the -p option is to make sure the genomes are set up with proper everyone-can-look permissions; otherwise we'd end up debugging a bunch of issues related to incorrect permissions being set for the shared genomes. I think removing it would lead to more problems for new users getting started, so your workaround is the way I'd go for now on your particular system. It's the first time we've seen a problem with it. It is definitely not ideal for your setup though, sorry about that.
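For a site that does drop -p locally, the "everyone-can-look" goal could in principle be restored afterwards by setting permissions explicitly on the extracted genome directory. The sketch below is only an illustration of that idea and is not how cloudbiolinux handles it; `make_world_readable` is a hypothetical helper that behaves roughly like `chmod -R a+rX`. It changes permission bits only and never touches user or group ownership.

```python
import os
import stat

READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _add_read_bits(path):
    """Add read for all; add execute for directories and executables."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return  # leave symlinks alone
    mode = stat.S_IMODE(st.st_mode)
    new_mode = mode | READ_BITS
    if stat.S_ISDIR(st.st_mode) or mode & EXEC_BITS:
        new_mode |= EXEC_BITS
    if new_mode != mode:
        os.chmod(path, new_mode)


def make_world_readable(root):
    """Hypothetical helper: apply _add_read_bits to root and all below it."""
    _add_read_bits(root)
    # Top-down walk: each directory is fixed before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            _add_read_bits(os.path.join(dirpath, name))
```

Because ownership is left as created by the installing user, this kind of post-processing should not by itself introduce files with a foreign group, though whether it satisfies a given site's quota policy is something only the local sysadmins can confirm.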

vdejager commented 4 years ago

Thanks, no problem. I've been able to resolve it for now with a bit of effort. I'll ask the sysadmins for a more permanent solution.

(Sent by email in reply to Rory Kirchner's comment of Fri 20 Mar 2020, 15:45.)


roryk commented 4 years ago

Thanks for being understanding, Vic. Hopefully the sysadmins can relax your quota. On shared systems where we manage bcbio for other users, we usually have a dedicated bcbio user that has permission to do the installation, write to the genome directories and so on. Maybe they could set you up with something like that, with a relaxed quota.