3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

copying to scratch as a separate job #719

Closed · xeniorn closed 3 years ago

xeniorn commented 3 years ago

Hi,

Is it possible, or could it be implemented, that relion_refine can be told to only copy to scratch, without continuing with the rest of the job?

For a very large dataset, copying to scratch can easily take 1-2 hours. Still, in our case the job can end up running faster overall even with this 1-2 h upfront cost, because the iterations themselves are faster.

But during these 1-2 hours, the relion job is not using the GPUs, CPUs, or memory allocated to it by the cluster, since the copy is part of the same submission. I would like to do the copy to scratch as one submission (with resources appropriate for copying), followed by the actual processing with resources appropriate for that step. That would reduce the waste of resources, with the only downside being a slightly more complex submission (which would be handled not by the user but by our submission script).

A simple --stop_after_scratch flag would do. Alternatively, it could be a separate binary that handles this for all jobs.
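
For illustration, here is a minimal sketch of how our submission script could chain the two stages under SLURM, assuming the proposed --stop_after_scratch flag existed (the script names, partitions, and paths are hypothetical):

```python
# Minimal sketch (not relion code): chain two SLURM jobs so the scratch copy
# runs as a cheap CPU job and the refinement only starts once it succeeded.
# copy_job.sh / refine_job.sh, the partitions, and the paths are hypothetical;
# --stop_after_scratch is the flag proposed above and does not exist yet.
import subprocess

def submit(script: str, *sbatch_args: str) -> str:
    """Submit a batch script via sbatch and return its SLURM job id."""
    out = subprocess.run(
        ["sbatch", "--parsable", *sbatch_args, script],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip().split(";")[0]  # --parsable prints "jobid[;cluster]"

# Stage 1: small CPU-only allocation that just populates the shared scratch,
# e.g. copy_job.sh would run:
#   relion_refine ... --scratch_dir /scratch/$USER --stop_after_scratch
copy_id = submit("copy_job.sh", "--partition=cpu", "--cpus-per-task=2")

# Stage 2: the real GPU job, started only after the copy finished successfully,
# e.g. refine_job.sh would run:
#   relion_refine ... --scratch_dir /scratch/$USER --reuse_scratch
submit("refine_job.sh", "--partition=gpu", f"--dependency=afterok:{copy_id}")
```

This works in our case only because scratch is shared, so the two stages do not need to land on the same node.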

If nothing like this is available, I could only do it via a dirty workaround, like running a normal job with insufficient resources in a way that guarantees it fails right after the scratch copy (e.g. by providing a faulty input).

Or by modifying the source code to implement this functionality in our copy of relion only - which I would probably do if there is no broader interest in it.

Best

J

biochem-fan commented 3 years ago

How do you ensure that your "copy" job and the actual computation job go to the same node? Until a previous job finishes, you don't know which node will be free next.

I agree that the current scratch system is not very efficient, but we don't have time to refactor it right now. It also depends too much on your cluster setup. At LMB, scratch space is wiped every time a job finishes, so your strategy would not work there and cannot be tested.

See also https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=ind2012&L=CCPEM&P=R66149 for other ideas.

xeniorn commented 3 years ago

Thanks for the answers

1) I see; the situation might be more suitable for our setup and similar ones. We have a distributed, parallel-access scratch, and each compute node has high-bandwidth access to it, so if I copy something to scratch from anywhere, any compute node will have access to it.

2) Alright, I understand why this would not be a priority: it's purely about performance, it's not universally applicable, and even where it is, the effect is not dramatic.

3) Thanks for the link - I had already seen it on the mailing list, though. The point about contiguous micrographs is already taken care of in our case, since the data comes from relion. The suggestion to copy to scratch manually had occurred to me before, but I find it dirtier/more dangerous than the other workarounds.

If anything along these lines, then in our system, where scratch is shared and lives for quite a long time, I'd sooner put the entire relion project folder on scratch and just rsync the results back to storage after each job. That way there would be no delay before computation, and the transfer to storage could be carried out asynchronously. The only risk would be losing the computations done since the last rsync finished.
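
Something like the following could drive that mirroring step (a rough sketch; all paths are made up, and rsync -a simply picks up where it left off if a sync is interrupted):

```python
# Rough sketch of the asynchronous mirroring described above: the relion
# project lives on shared scratch, and results are synced back to permanent
# storage after each job. Paths are hypothetical.
import subprocess

SCRATCH_PROJECT = "/scratch/user/relion_project/"  # trailing slash: sync directory contents
STORAGE_PROJECT = "/storage/user/relion_project/"

def mirror_results() -> None:
    """One-way sync from scratch to storage; safe to call repeatedly."""
    subprocess.run(["rsync", "-a", SCRATCH_PROJECT, STORAGE_PROJECT], check=True)

# Called from a post-job hook or a cron entry, so at most the work done since
# the last completed rsync is at risk if scratch is lost.
mirror_results()
```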


Just out of curiosity: if your scratch is always wiped post-job at LMB, does that mean you can't make use of "--reuse_scratch" in relion at all? And if a job fails due to low memory, a timeout, or anything else, you would need to re-copy the data to scratch (or not use scratch)?

biochem-fan commented 3 years ago

distributed parallel access scratch

Is this fast enough when accessed from many nodes?

does it mean you can't make use of "--reuse_scratch" in relion at all?

Correct. Most people don't use it. And this is dangerous for novice users who don't know when this option is justified.

Advanced users reserve a full node with an interactive job, ssh into it, and do whatever they want, potentially using the RELION scheduler to chain multiple jobs. In that case they can use --reuse_scratch.
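
For example, a sketch of such a chain (placeholder names; the usual refinement arguments are omitted for brevity), using --keep_scratch on the first run so the staged particles survive for the second:

```python
# Sketch of two relion_refine runs chained on the same reserved node: the
# first pays the copy-to-scratch cost and keeps the staged data, the second
# reuses it. Input/output names are placeholders, and the usual refinement
# options (--auto_refine, --particle_diameter, ...) are omitted for brevity.
import subprocess

COMMON = [
    "mpirun", "-n", "5", "relion_refine_mpi",
    "--i", "particles.star",
    "--scratch_dir", "/ssd/scratch",
]

# First run: --keep_scratch prevents scratch from being wiped when the job ends
subprocess.run(COMMON + ["--o", "Refine3D/job001/run", "--keep_scratch"], check=True)

# Second run: starts iterating immediately against the already-staged particles
subprocess.run(COMMON + ["--o", "Refine3D/job002/run", "--reuse_scratch"], check=True)
```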

xeniorn commented 3 years ago

Is this fast enough when accessed from many nodes?

As far as I can see in practice, yes. Perhaps it might become an issue if everyone were running an IO-heavy job at the same time, but that never happens. It's a medium-sized cluster where only some jobs put pressure on IO, and the IO pressure from different jobs does not peak at exactly the same time. So I can imagine that one larger shared scratch allows better utilization of read/write capacity than dedicated per-node scratch storage. A cluster that is more like a federation of isolated nodes (with a typical usage of 1 user = 1 full node) is much easier to set up and maintain, that's for sure - I assume that's why our setup is not as common. But I am fairly convinced that, once set up, ours is more efficient in terms of resource utilization and average job speed, possibly even individual job speed.