dib-lab / farm-notes

notes on the farm cluster

Managing home directories and project space on farm's disks #72

Open ctb opened 1 day ago

ctb commented 1 day ago

hackmd for editing: https://hackmd.io/XZ8N_yylQCyPxz5L1-pxdw?view

Managing home directories and project space on farm's disks

New accounts on farm follow a different model from accounts created in 2023 and earlier: they are allocated 20 GB of home directory space, and must use group-specific space for anything beyond that.

(Old accounts should move away from putting everything in their home directory, too, so the advice and approaches below are useful for everyone!)

Basic disk space commands

df -h ~/ will tell you how much disk space is free (and how much is used) in your home directory. New-style home directories have a quota of 20 GB; older-style home directories are on disks shared with other users.

df -h /group/ctbrowngrp will tell you how much disk space is used/free on /group/ctbrowngrp. You can run this command on any directory, and it will also tell you which disk that directory is mounted on. For example,

% df -h /group/ctbrowngrp/sourmash-db
Filesystem                            Size  Used Avail Use% Mounted on
nas-6-0-ib:/nas-6-0/ctbrowngrp/group  220T  218T  2.1T 100% /group/ctbrowngrp

shows that the directory /group/ctbrowngrp/sourmash-db is on the disk mounted as /group/ctbrowngrp, and that this disk is stored on nas-6-0-ib - a "network attached storage" device.

du -sh /path/to/directory will tell you how much disk space is being used by that directory. You'll need read access to the directory and everything under it; otherwise du reports errors and the total will be too low.
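To find out which items are actually filling up a directory, du can be combined with sort. The following is a minimal sketch, assuming GNU coreutils (du -s, sort -h), which is standard on Linux systems like farm; biggest_items is a hypothetical helper name, not a farm-provided command.

```shell
# Hypothetical helper: list the largest items directly under a directory.
# Output is sorted smallest first, so the biggest space users end up at the bottom.
# Assumes GNU coreutils (du -s, sort -h).
biggest_items() {
    local dir="${1:-$HOME}"   # directory to inspect (default: your home directory)
    local n="${2:-10}"        # how many entries to show
    # Include dot-directories such as .conda and .cache, which are easy to miss.
    # Unmatched globs and unreadable paths produce errors on stderr; hide them.
    du -sh -- "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -h | tail -n "$n"
}
```

For example, biggest_items ~ 10 lists the ten largest items in your home directory, with the largest last.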

General approaches

You can create your own directories in other places (e.g. /group/ctbrowngrp/DIRECTORY) and put files there. Everything under that directory is then stored on that disk rather than in your home directory.

1. Store data elsewhere and link into your home directory

As the number of projects you are working on grows, it can be inconvenient to remember all the places where you put data. So a useful trick is to create a "symbolic link" (a kind of alias) in your home directory that points to the other location.

For example, if you did

mkdir /group/ctbrowngrp/MYDIR
ln -s /group/ctbrowngrp/MYDIR ~/THE_DIR

then you could cd into and access files under ~/THE_DIR without typing out the full path. Note that both the directory name (MYDIR) and the link name (THE_DIR) can be whatever you want.
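The same two steps can be wrapped in a small shell function that also refuses to overwrite anything already at the link path. This matters because if ~/THE_DIR were already a directory, ln -s would quietly create the link inside it instead. This is a sketch; link_into_home is a hypothetical helper name.

```shell
# Hypothetical helper: create a directory on another disk and symlink it
# into place. Refuses to overwrite anything already at the link path.
link_into_home() {
    local target="$1"   # e.g. /group/ctbrowngrp/MYDIR
    local link="$2"     # e.g. ~/THE_DIR
    if [ -z "$target" ] || [ -z "$link" ]; then
        echo "usage: link_into_home TARGET_DIR LINK_PATH" >&2
        return 2
    fi
    # -e is false for a dangling symlink, so check -L as well.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to overwrite existing $link" >&2
        return 1
    fi
    mkdir -p -- "$target" || return 1   # no error if it already exists
    ln -s -- "$target" "$link"          # the link points at the target directory
}
```

Usage: link_into_home /group/ctbrowngrp/MYDIR ~/THE_DIR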

2. Put custom conda environments elsewhere

Conda environments often consume a large amount of disk space, and you can store these elsewhere, too. The simplest way to do this is to move your ~/.conda directory onto another disk and then symlink it as above.
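One way to do the move-and-link step, sketched as a shell function (relocate_and_link is a hypothetical name). It assumes no conda environment is active while it runs. Because the old path keeps resolving through the symlink, environments whose scripts have absolute paths under ~/.conda baked into them should keep working.

```shell
# Hypothetical helper: move a directory (such as ~/.conda) to another disk
# and leave a symlink at the old location so existing paths keep working.
# Deactivate any conda environments before running this on ~/.conda.
relocate_and_link() {
    local src="$1"    # e.g. ~/.conda
    local dest="$2"   # e.g. a directory on a group disk (hypothetical)
    if [ -L "$src" ]; then
        echo "$src is already a symlink; nothing to do" >&2
        return 0
    fi
    if [ ! -d "$src" ]; then
        echo "$src is not a directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi
    mkdir -p -- "$(dirname -- "$dest")" || return 1
    # Across filesystems, mv copies the data and then deletes the original.
    mv -- "$src" "$dest" || return 1
    ln -s -- "$dest" "$src"
}
```

Usage, with a hypothetical destination: relocate_and_link ~/.conda /group/ctbrowngrp4/2024-USERNAME-conda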

3. Use project-specific directories

It is tempting to create one work directory and do all your work under it. The problem is that, as your projects grow, it becomes difficult to sort things out. So we recommend naming directories like so: YEAR-username-project.

If I want to use /group/ctbrowngrp4/ to store files for a horse genotyping project, for example, I would do:

mkdir /group/ctbrowngrp4/2024-ctbrown-horse-genotyping
ln -s /group/ctbrowngrp4/2024-ctbrown-horse-genotyping ~/

(which creates the link ~/2024-ctbrown-horse-genotyping) and then do all my work within that directory.
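The naming convention can be automated so the year and username are always filled in the same way. A minimal sketch, assuming your farm username is in $USER; new_project is a hypothetical helper, not an existing farm command.

```shell
# Hypothetical helper: create a YEAR-username-project directory on a group
# disk and symlink it into a link directory (your home directory by default).
new_project() {
    local base="$1"               # e.g. /group/ctbrowngrp4
    local project="$2"            # e.g. horse-genotyping
    local linkdir="${3:-$HOME}"   # where the symlink goes
    local user="${USER:-$(id -un)}"
    local name
    name="$(date +%Y)-${user}-${project}"
    if [ -e "$linkdir/$name" ] || [ -L "$linkdir/$name" ]; then
        echo "refusing to overwrite existing $linkdir/$name" >&2
        return 1
    fi
    mkdir -p -- "$base/$name" || return 1
    ln -s -- "$base/$name" "$linkdir/$name" || return 1
    echo "$linkdir/$name"         # print the path of the new link
}
```

For example, new_project /group/ctbrowngrp4 horse-genotyping would create /group/ctbrowngrp4/YEAR-USERNAME-horse-genotyping and link it into your home directory.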

How the ctbrowngrp disks work

As of Sep 2024, /group/ctbrowngrp4 is the disk with free space for projects. Please use the YEAR-username-project approach above.

We're working on getting space cleaned up on the other disks ('ctbrowngrp', 'ctbrowngrp2', and 'ctbrowngrp3'), too.
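To compare the disks before picking one, df can be given several paths at once. The sketch below (most_free is a hypothetical name) sorts them by available space, most free first. It assumes the other disks are mounted at /group/ctbrowngrp2 and /group/ctbrowngrp3, following the /group/ctbrowngrp4 pattern; check the real paths before relying on it.

```shell
# Hypothetical helper: show available space for several paths, most free first.
most_free() {
    # -P: POSIX format, one line per filesystem; -k: sizes in 1K blocks.
    # Drop the header line, sort numerically on the Available column (4th),
    # then print the mount point and the free space in GiB.
    df -Pk -- "$@" | tail -n +2 | sort -k4,4nr |
        awk '{ printf "%s\t%.1f GiB free\n", $6, $4 / 1048576 }'
}
```

Usage: most_free /group/ctbrowngrp /group/ctbrowngrp2 /group/ctbrowngrp3 /group/ctbrowngrp4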

Misc advice

Cleaning up after conda

conda doesn't do a great job of cleaning up after itself. You can run conda clean -a (short for --all) to remove its index cache, lock files, unused cached packages, tarballs, and log files.

ctb commented 1 day ago

ref https://github.com/dib-lab/farm-notes/issues/70