ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Metaspades not able to restart from checkpoint when switching servers -- looks for reads in wrong place #666

Open StevenJRobbins opened 3 years ago

StevenJRobbins commented 3 years ago

Dear Spades team,

I am trying to run metaspades on Australia's National Computational Infrastructure (NCI). Because of wall-time restrictions there, I run metaspades on our local servers until they run out of memory, and then intend to restart from the last checkpoint on the NCI servers, which have more memory. To do this, I copy the output directory and the reads onto the NCI server and attempt to use --restart-from.

However, when I use the "--restart-from last" option, I get an error message saying that metaspades can't find the reads: it looks in directories that exist on our local servers, where the run started, but not on the NCI server. Unfortunately, --restart-from cannot take new read locations as input (i.e. the -1 and -2 flags). Even if I do not specify the full path to the reads in my command, metaspades appears to store the full path. Is there a file that I can edit to point to the reads on the NCI server so that the assembly can restart?

More importantly, it would be extremely beneficial in subsequent spades versions to be able to restart assemblies on a new server. That would appear to mean being able to amend the location of the reads so Spades doesn't complain. Please make spades assemblies more portable.

Thank you for your time.

Steven Robbins
Australian Centre for Ecogenomics

asl commented 3 years ago

Dear Steven,

Thank you for your interest in SPAdes.

You can correct the paths to the reads in the input_dataset.yaml file. However, this is a scenario we intentionally do not support, as many issues can arise from incompatible paths, wrong file order, version mismatches, etc. We simply do not have the bandwidth to implement the many additional consistency checks this would require.

StevenJRobbins commented 3 years ago

Thank you, Anton. I appreciate this little hack. Just to be clear though, if I run metaspades this way, will the metaspades assembly pipeline finish correctly? For example, I suspect if it's stored the full path to the reads it may also have stored the full path to the output directories. So will it still try to write output files to the directory structure from the previous server or will it be smart enough to find the correct new path?

asl commented 3 years ago

And... you've correctly outlined the next issue. If the paths are different, then you're out of luck. You'd need to edit more configuration files to ensure that all intermediate results are still there and that the output dir and scratch dirs are correct.
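To see which files would need editing, one option is to search the copied output directory for the old path prefix. The following is only a sketch: the function name list_stale_paths and the example paths are hypothetical, and it assumes a grep that supports -r and -I (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list text files under a copied output directory
# that still mention a path prefix from the original server.
#   $1 = old path prefix (as it appeared on the original server)
#   $2 = copied output directory on the new server
list_stale_paths() {
    local old_prefix="$1" out_dir="$2"
    # -r  recurse into subdirectories
    # -l  print only the names of matching files
    # -I  skip binary files
    # -F  treat the prefix as a fixed string, not a regular expression
    # "|| true" keeps the exit status 0 when nothing matches.
    grep -rlIF -- "$old_prefix" "$out_dir" || true
}

# Example call (placeholder paths):
#   list_stale_paths /old/server/metaspades_out ./metaspades_out
```

Any file this prints still refers to the old server and is a candidate for editing before retrying --restart-from. It only inspects the output directory, so a scratch dir located elsewhere would not be covered.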

StevenJRobbins commented 3 years ago

Thanks again, Anton. For anyone reading in the future, I successfully got metaspades to run on the new server by running a find-and-replace on all files in all subdirectories, as follows:

sed -i "s|old_path_to_metaspades_output_dir|new_path_to_metaspades_output_dir|g" *
sed -i "s|old_path_to_metaspades_output_dir|new_path_to_metaspades_output_dir|g" */*
sed -i "s|old_path_to_metaspades_output_dir|new_path_to_metaspades_output_dir|g" */*/*
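The same sed substitution can be applied to files at any depth, and only to text files that actually contain the old path. The sketch below is not something SPAdes provides: relocate_paths is a hypothetical helper, it assumes GNU grep and GNU xargs (for -Z and -r), and it assumes the paths contain no "|" character, since that is used as the sed delimiter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace an old path prefix with a new one in every
# text file under a directory, at any depth, keeping .bak backups.
#   $1 = old path prefix, $2 = new path prefix, $3 = copied output directory
relocate_paths() {
    local old_prefix="$1" new_prefix="$2" out_dir="$3"
    # grep -rlIZF: recurse, list matching files only, skip binary files,
    #              separate names with NUL, match the prefix as a fixed string.
    # xargs -0 -r: read NUL-separated names; run nothing if the list is empty.
    # sed -i.bak:  edit in place, saving each original as <file>.bak.
    # Assumption: neither prefix contains "|" (the sed delimiter).
    grep -rlIZF -- "$old_prefix" "$out_dir" \
        | xargs -0 -r sed -i.bak "s|$old_prefix|$new_prefix|g"
}

# Example call (placeholder paths):
#   relocate_paths /old/server/metaspades_out /new/server/metaspades_out ./metaspades_out
```

The .bak copies preserve the original files in case the restart fails, and can be deleted once the run completes. As with the plain sed commands, this only touches files inside the output directory.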

asl commented 3 years ago

This may or may not work, depending on where the scratch dir is :)

StevenJRobbins commented 3 years ago

Thanks, Anton. It worked in my test and successfully output the necessary files, so it looks like a workable hack, at least in my case.