cambiotraining / hpc-intro

Practical course on running jobs on a HPC
https://cambiotraining.github.io/hpc-intro/

Quiz for HPC usage best practices #2

Closed tavareshugo closed 3 years ago

tavareshugo commented 3 years ago

Use this issue to compile questions that could be used in an interactive quiz to discuss best uses of the HPC. These could be used with the first session (intro to HPC) or in the session when we talk about Cambridge HPC more specifically.

tavareshugo commented 3 years ago

Some yes/no questions:

qiwang7 commented 3 years ago

For the last one, it is easier to understand with a concrete example. Suppose a workflow has two steps: the first uses 40G of memory and 1 CPU, while the second can be parallelised across 8 CPUs and needs only a few hundred MB of memory. What resources would you request for this?

Option 1: 40G and 8 CPUs
Option 2: 42G and 8 CPUs
Option 3: 40G and 1 CPU
Option 4: Other

The best answer is option 4 (other): split the workflow into two jobs/scripts. The first requests 40G (42G?) of memory and 1 CPU; the second requests 1G of memory and 8 CPUs.
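To make the reasoning behind option 4 concrete, here is a rough sketch comparing what each approach reserves on the cluster. The step durations (2 hours each) are hypothetical and only there for illustration; the CPU and memory figures are the ones from the question, and `reserved` is a made-up helper, not part of any scheduler.

```python
def reserved(jobs):
    """Sum the resources held by a list of jobs.

    Each job is a tuple (cpus, mem_gb, hours). The scheduler holds the
    requested CPUs and memory for the whole job, whether they are used or not.
    Returns (cpu_hours, gb_hours).
    """
    cpu_hours = sum(cpus * hours for cpus, _, hours in jobs)
    gb_hours = sum(mem_gb * hours for _, mem_gb, hours in jobs)
    return cpu_hours, gb_hours


# Assumption for illustration only: each step takes 2 hours.
STEP_HOURS = 2

# Option 1: one job that asks for the maximum of everything for both steps.
single_job = reserved([(8, 40, 2 * STEP_HOURS)])

# Option 4: two jobs, each asking only for what its step needs.
split_jobs = reserved([(1, 40, STEP_HOURS), (8, 1, STEP_HOURS)])

print("single job:", single_job)  # (32, 160)
print("split jobs:", split_jobs)  # (18, 82)
```

Under these assumed durations the single job holds 7 CPUs idle during step 1 and about 39G of memory idle during step 2. If the cluster uses SLURM (an assumption here), the second job can be made to wait for the first with `sbatch --dependency=afterok:<jobid>`.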

I'm not sure whether to ask for 42G or 40G, as I always give it a bit of a buffer. I'm not sure whether that is the right thing to do, or whether this is the right place to talk about it.

One of the most frequently asked questions is how much memory to ask for. I attempted an answer here: https://wiki.cam.ac.uk/plantsci-bioinfo/Condor_User_Guide#How_much_memory_I_should_request_for_my_job.3F

qiwang7 commented 3 years ago

Q: If I accidentally delete a file from the cluster, can I recover it?

Option 1: Yes
Option 2: No
Option 3: It depends

The correct answer is option 3; the trainer can then elaborate on it. I guess the answer has something to do with the frequency of the backups?

qiwang7 commented 3 years ago

This one is not that important, just for people who think this way :)

Q: If one of the hard disks on the cluster breaks and my data is on that disk, what will happen?

Option 1: I will lose my data
Option 2: I won't lose my data
Option 3: It depends

The correct answer is option 3; the trainer can then elaborate on it. I guess the answer depends on whether the cluster has a redundancy setup.

tavareshugo commented 3 years ago

Q: If I accidentally delete a file from the cluster, can I recover it?

From what we are teaching, the answer would be "No". On the university HPC's working space (called "rds", which we're calling "scratch" on the course), I don't think you can recover a file once you delete it. But I like the question because it lets us discuss how different HPCs may have different kinds of setup.

Q: If one of the hard disks on the cluster breaks and my data is on that disk, what will happen?

I like this one as well. It lets us discuss the difference between redundancy in the storage and a true backup with snapshots that let you "travel back in time" :)
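For the discussion, the distinction could be illustrated with a toy model like the one below. It is only a sketch: `MirroredStorage` and `SnapshotBackup` are hypothetical classes invented for this example and do not describe how any real cluster's storage works. The point is that a mirror faithfully copies deletions too, while a snapshot keeps an older state.

```python
class MirroredStorage:
    """Toy redundancy: every write and delete is applied to both disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        # The deletion is mirrored as well, so redundancy does not protect it.
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None  # simulate a broken hard disk

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None


class SnapshotBackup:
    """Toy backup: keeps point-in-time copies of the storage contents."""

    def __init__(self):
        self.snapshots = []

    def take(self, storage):
        live = next(d for d in storage.disks if d is not None)
        self.snapshots.append(dict(live))

    def restore(self, name):
        # Look through snapshots from newest to oldest.
        for snap in reversed(self.snapshots):
            if name in snap:
                return snap[name]
        return None


storage = MirroredStorage()
backup = SnapshotBackup()
storage.write("results.csv", "important data")
backup.take(storage)

storage.fail_disk(0)
after_disk_failure = storage.read("results.csv")  # still readable from disk 1

storage.delete("results.csv")
after_delete = storage.read("results.csv")        # gone from the live storage
from_snapshot = backup.restore("results.csv")     # recoverable from the snapshot
```

In this model the file survives a disk failure thanks to the mirror, but only the snapshot can bring it back after an accidental delete.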