NorwegianVeterinaryInstitute / Bifrost


Figure out how to save data on userwork instead of disk #25

Closed: karinlag closed this issue 5 years ago

karinlag commented 5 years ago

We are running out of disk space on /work, so we need to figure out where Bifrost should run to make sure the disk does not fill up.

karinlag commented 5 years ago

So, the options are:

  1. /projects: This is a no, because this area is slow.
  2. /work: This is the area we should use, but it fills up, since we only have 10 TB.
  3. $SCRATCH: From https://www.uio.no/english/services/it/research/hpc/abel/help/user-guide/data.html:

     > While a job runs, it has access to a temporary scratch directory on /work. The directory is individual for each job, is automatically created, and is deleted when the job finishes (or gets requeued). There is no backup of this directory. The name of the directory is stored in the environment variable $SCRATCH, which is set within the job script. In general, jobs should copy their work files to $SCRATCH and run there. This is especially important for I/O intensive jobs. The scratch disk is faster than the home directory disk, and running I/O intensive jobs in $HOME slows down not only the job, but also interactive work for other users.

     This is bad for us because the directory is individual to each job and is deleted when the job finishes. We would like to keep building on old results when they are applicable.

  4. $USERWORK: This is a yes; I explain why below.

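A minimal Python sketch of the chosen setup, assuming Bifrost is launched from a small wrapper inside the job script: it places the Nextflow work directory under $USERWORK and passes `-resume`, so results from earlier runs can be reused while they still exist. The subdirectory name `bifrost_work`, the pipeline path `main.nf`, and the helper names are hypothetical; `-w` (work directory) and `-resume` are standard `nextflow run` options.

```python
import os
from pathlib import Path


def resolve_work_dir(subdir="bifrost_work"):
    """Return a Nextflow work directory under $USERWORK.

    `subdir` is a hypothetical name chosen for illustration.
    """
    userwork = os.environ.get("USERWORK")
    if not userwork:
        # The other areas are ruled out in the list above ($SCRATCH is
        # deleted after each job), so fail loudly instead of guessing.
        raise RuntimeError("$USERWORK is not set; refusing to fall back to another area")
    return Path(userwork) / subdir


def nextflow_command(pipeline="main.nf", subdir="bifrost_work"):
    """Build the argument list for `nextflow run` with the work dir on $USERWORK."""
    work_dir = resolve_work_dir(subdir)
    # -w sets the work directory; -resume reuses cached task results
    # that are still present in that directory.
    return ["nextflow", "run", pipeline, "-w", str(work_dir), "-resume"]
```

In a job script this could be launched with `subprocess.run(nextflow_command(), check=True)`. Failing when $USERWORK is unset, rather than silently falling back to $SCRATCH, keeps a run from losing its cache when the job ends.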
karinlag commented 5 years ago

So, about $USERWORK. From https://www.uio.no/english/services/it/research/hpc/abel/help/user-guide/data.html:

> All users also have access to a directory /work/users/username, where username is the user's user name. The purpose of the directory is to stage files that are needed by more than one job. Files in this directory are automatically deleted after a certain time. Currently, they are deleted after 45 days, but that can change in the future. There is no backup of files in /work/users/. Please note that this area is not meant for storage of data. If attempts to avoid the automatic deletion are detected, countermeasures will be applied, which could include removing the user's area altogether.
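Because files in $USERWORK are removed after 45 days, cached results older than that cannot be reused and will have to be recomputed. A minimal Python sketch for listing files that are approaching that age, assuming the cleanup is based on file modification time (the quoted page does not say which timestamp is used). It is meant for knowing what will need to be rerun, not for refreshing files, which the policy above explicitly forbids. `files_at_risk` and its `margin_days` parameter are hypothetical names.

```python
import os
import time
from pathlib import Path

# Assumption: purge age measured on modification time, per the 45 days quoted above.
PURGE_AFTER_DAYS = 45


def files_at_risk(root, days=PURGE_AFTER_DAYS, margin_days=7, now=None):
    """Return files under `root` older than (days - margin_days) days, sorted."""
    now = time.time() if now is None else now
    threshold = now - (days - margin_days) * 86400
    at_risk = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat: judge symlinks by their own timestamp without following them.
                mtime = path.lstat().st_mtime
            except FileNotFoundError:
                continue  # removed while we were walking
            if mtime < threshold:
                at_risk.append(path)
    return sorted(at_risk)
```

With the defaults, anything last modified more than 38 days ago is reported; setting `margin_days=0` lists only files already past the 45-day mark.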

Nextflow works by soft-linking files between directories inside its work directory. $USERWORK is therefore the best option: I can run everything in one directory on /work (which is where we should be running), and all the linking happens within a single disk (which is what Nextflow likes).
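A minimal Python sketch of the "all linking within one disk" check: it compares the device IDs (`st_dev`) of two paths and symlinks an input into a task directory, roughly the way Nextflow's default input staging uses symbolic links. `same_filesystem` and `stage_in` are hypothetical helpers that do not call Nextflow. Note that symlinks themselves can cross filesystems (hard links cannot); the check here only flags data that would be read from another disk, such as the slow /projects area.

```python
import os
from pathlib import Path


def same_filesystem(path_a, path_b):
    """Return True if two existing paths live on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def stage_in(src, task_dir):
    """Symlink `src` into `task_dir` and return the link path.

    Hypothetical helper for illustration only.
    """
    src = Path(src).resolve()
    task_dir = Path(task_dir)
    task_dir.mkdir(parents=True, exist_ok=True)
    if not same_filesystem(src, task_dir):
        # The link still works, but reads would go to another disk.
        print(f"warning: {src} is not on the same filesystem as {task_dir}")
    link = task_dir / src.name
    link.symlink_to(src)
    return link
```

Running `same_filesystem(input_dir, os.environ["USERWORK"])` before a run is a quick way to confirm that the inputs are on /work too.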

karinlag commented 5 years ago

Solved.