NorwegianVeterinaryInstitute / BioinfTraining

Repository for bioinformatics training at the Norwegian Veterinary Institute

Moving to SAGA / NIRD #131

Closed evezeyl closed 4 years ago

evezeyl commented 4 years ago

Here I make a list of the things we have to check / figure out. See also here for notes on moving to Saga and for basic Saga info and use, for the coworking session.


Specific to data transfer:



evezeyl commented 4 years ago

Eve: Suggestion for rawdatafolder

I was wondering if it would not be an idea on NIRD, in wgs, to have data organized per species (i.e. when runs are shared by different people, it can be a challenge to have access to all the data).

Thomieh73 commented 4 years ago

I have created the following in our projects folder on SAGA: (location: /cluster/projects/nn9305k )

The folder contains a README file with a brief explanation of the other files present in the folder.

Profile files

The files in the folder:

I copied both from my directory and added the umask 0002 option to both. In addition, I added a big line saying ### personal additions ###, below which people can add the things they like.
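As a rough illustration of the layout described above (the umask setting is from the actual setup; the alias below the marker line is a hypothetical example of a personal addition):

```shell
# Shared part of the profile: group-writable file creation so project
# members can share files (umask 0002 is the option mentioned above).
umask 0002

### personal additions ###
# Below this line each user adds their own customizations, for example:
alias projdir='cd /cluster/projects/nn9305k'   # hypothetical convenience alias
```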

Things that can be added to the .bash_profile file are found in another file called:

In that file I have listed a few things, such as adding favorite modules to your general environment, coloring and modifying the prompt, and adding our own modules /software we install on our project.

SLURM files

Another thing you find in the sample_files folder is two example SLURM scripts. These can be copied.
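For orientation, a minimal SLURM script for Saga might look like the sketch below. Only the project account nn9305k is taken from this thread; the job name, resource numbers, and command are placeholders to adjust per job:

```shell
#!/bin/bash
# Minimal SLURM script sketch for Saga (resource numbers are placeholders).
#SBATCH --account=nn9305k
#SBATCH --job-name=example_job
#SBATCH --time=01:00:00          # wall-clock limit hh:mm:ss
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=4

module purge                     # start from a clean module environment

srun your_command_here           # placeholder for the actual analysis command
```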

Thomieh73 commented 4 years ago

I have set up conda on Saga. It is installed in the folder: /cluster/projects/nn9305k/src/

The first thing I did was compare the installer files for Miniconda vs Anaconda. Miniconda 4.7.12 Linux installer (Python 3.7): 68.5 MB, unpacked --> 440 MB. Anaconda 2019.10 Linux installer (Python 3.7): 506 MB.

That is a ten-fold difference, and these are compressed archive files. So I checked what is different between Anaconda and Miniconda. Basically, the Anaconda distribution contains about 150 preinstalled Python packages, while Miniconda comes without those packages. Source: anaconda vs miniconda

So for now I will only set up Miniconda.

Command used to install: bash Miniconda3-latest-Linux-x86_64.sh

The example file bashrc was modified so that conda can be used both in the login shell and in a screen session.
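The usual way this is wired up is the "conda initialize" block that the installer offers to append to .bashrc. A sketch, assuming the shared install lives in a miniconda3 subfolder of the src path mentioned above (the exact subfolder name is an assumption):

```shell
# Sketch of the bashrc addition that makes conda available in both login
# and screen sessions. Sources conda's bash hook from the shared install;
# the miniconda3 path component is an assumed name.
__conda_setup="$('/cluster/projects/nn9305k/src/miniconda3/bin/conda' 'shell.bash' 'hook' 2>/dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    # Fall back to putting the conda bin directory on PATH
    export PATH="/cluster/projects/nn9305k/src/miniconda3/bin:$PATH"
fi
unset __conda_setup
```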

As a test I have installed the ncbi-genome-download tool from Kai Blin. Commands:

conda create -n ncbidown
conda activate ncbidown

conda install -c bioconda ncbi-genome-download

People should test if they can activate the environment and call the tool with:

 ncbi-genome-download -h

It is set-up now.

When deactivating ncbidown, I see that the base conda environment is still active. I need to deactivate that as well, and I think it is wise that the base environment is not automatically activated. I changed that with the command:

conda config --set auto_activate_base false

That is reversible :-)

ajkarloss commented 4 years ago

@Thomieh73, just a thought. We are using Miniconda for all the users under nn9305k. If we install only Miniconda, every time we create new conda envs we may need to install those missing Python packages again, which also increases the installation time. Not sure how significant a difference it will make, but it just came to my mind.

evezeyl commented 4 years ago

Are not all packages within conda envs isolated? Meaning each conda env has its own dependencies installed and does not rely on the base packages?


Thomieh73 commented 4 years ago

Hey, conda maintains a big library where all packages for each env are stored and unpacked. You can find these in a folder called pkgs.

When an environment is created and software is installed into it, the individual packages are stored in the pkgs folder and in the environment's bin folder. So when another environment needs a specific package for its set-up, conda will first check whether it is already present in the pkgs folder. Only if it is not present will it download it.

Our current installation of conda on Abel has a pkgs folder with a size of 45 GB. It contains about 5 different packages of samtools version 1.9, each with a different hash.

So we do not have to worry that Miniconda repeatedly downloads missing Python packages; it will check whether they are already present.

ajkarloss commented 4 years ago

Thats nice :)

karinlag commented 4 years ago

We need to figure out how many CPUs/threads/cores each node has, and whether we have something corresponding to the 16-CPU upper limit that was present on Abel.

ajkarloss commented 4 years ago

https://documentation.sigma2.no/quick/saga.html

karinlag commented 4 years ago

@jeevan or @thomas (or @evezeyl if you remember), can you try to figure out what the highest reasonable number of CPUs to use on Saga would be? On Abel the limit was 16, because that was all in one node, and using multiple nodes was apparently not recommended (at least not for our stuff).

evezeyl commented 4 years ago

OK, I remembered something with 20, but I am really not sure (I also noted: 1 PC, 40 tasks -> ? 20 cores * 2 sockets -> ? number of tasks? These HPC calculations are still kind of above my level...)

-> So I poked around a bit in the nodes (if I understood well, those were the basic normal nodes). Would this correspond to cores per socket? Will this help? (below)

[saga@c1-31 ~]$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              80
On-line CPU(s) list: 0-79
Thread(s) per core:  2
Core(s) per socket:  20
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
Stepping:            4
CPU MHz:             2000.000
BogoMIPS:            4000.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            28160K
NUMA node0 CPU(s):   0-19,40-59
NUMA node1 CPU(s):   20-39,60-79
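Reading that lscpu output: the physical core count per node is sockets times cores per socket, and the 80 "CPUs" lscpu reports are hyper-threads (2 threads per core). A quick arithmetic check, with the values hard-coded from the output above:

```shell
# Derive core counts from the lscpu values above.
sockets=2
cores_per_socket=20
threads_per_core=2

physical_cores=$((sockets * cores_per_socket))       # 40 physical cores per node
logical_cpus=$((physical_cores * threads_per_core))  # 80 logical CPUs (hyper-threading)

echo "physical cores: $physical_cores"
echo "logical CPUs:   $logical_cpus"
```

So a Saga normal node has 40 physical cores; for CPU-bound jobs the physical-core count is usually the more relevant ceiling than the 80 hyper-threads.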

They also mentioned during the course a specific freepe command -> gives an indication of the load and computing power of the nodes ...

c6-4   24 of 64 cores free, 2725083 of 3093723 MB free, 24.0 PEs free (MIXED)
c6-8   24 of 64 cores free, 2069723 of 3093723 MB free, 24.0 PEs free (MIXED)
c3-44  40 of 40 cores free,  386446 of  386446 MB free, 40.0 PEs free (IDLE)

There we get the full list for all nodes ...

I looked quickly in the manual but did not find relevant info, so someone with a better understanding will need to look at this if the above does not help. Eve

evezeyl commented 4 years ago

Bifrost versions that I had noted last year: find the Bifrost versions of everything -> I think it was in listadapt

Details ... bifrost_complete_package_list.txt

karinlag commented 4 years ago

We've managed to move, closing.