bentsherman / tesseract

A tool for creating resource prediction models for scientific workflows
MIT License

Use local scratch for MPI jobs #12

Closed bentsherman closed 4 years ago

bentsherman commented 4 years ago

When running applications on Palmetto, it is always best to use local scratch rather than the network drives. However, this requires some extra work for multi-node MPI jobs: the input files must be copied to the local scratch of each node, but the Nextflow scratch directive only copies input files to the master node.

I think we can make the scratch directive work in this case by adding the following:

beforeScript: "for node in \$(cat \$PBS_NODEFILE | uniq); do ssh \$node cp \$INPUT_FILE \$TMPDIR; done"

Or something like that...
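For illustration, the one-liner above could be expanded into a small bash function. This is only a sketch under some assumptions that are not confirmed in this thread: PBS exposes the node list as `$PBS_NODEFILE`, the local scratch path in `$TMPDIR` is the same on every node, and passwordless ssh between nodes works. The names `stage_to_nodes` and `STAGE_DRY_RUN` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy one input file to the local scratch directory of every node
# allocated to a PBS job. stage_to_nodes and STAGE_DRY_RUN are hypothetical
# names used only for this example.

stage_to_nodes() {
    local input_file="$1"   # file on a shared filesystem (e.g. scratch1 or home)
    local dest_dir="$2"     # local scratch path, assumed identical on all nodes
    local nodefile="$3"     # file listing allocated hosts, one line per slot
    local node

    # The node file repeats each host once per slot, so deduplicate hosts.
    for node in $(sort -u "$nodefile"); do
        if [ -n "${STAGE_DRY_RUN:-}" ]; then
            # Dry run: print the command instead of executing it.
            echo "ssh $node cp $input_file $dest_dir"
        else
            ssh "$node" cp "$input_file" "$dest_dir"
        fi
    done
}
```

Inside a job it might be called as `stage_to_nodes "$INPUT_FILE" "$TMPDIR" "$PBS_NODEFILE"`. Setting `STAGE_DRY_RUN=1` first prints the ssh commands so they can be checked before anything is copied.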

bentsherman commented 4 years ago

Yeah, so both HemeLB and KINC need the input files to be available to all processes. I should have realized this earlier for KINC.

In this case it doesn't seem to be worth it; it is better to keep the input files on scratch1 or home. Copying to local scratch doesn't really save you anything if you're only going to read the files once at the beginning anyway.

As for output files, they should be written to scratch1 during process execution, and then Nextflow will copy them to home when the process finishes.

So all in all, I don't think we need to worry about using scratch for any of our MPI applications. It may still be worth considering for the other bioinformatics pipelines like GEMmaker and TSPG.