MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

checkpointing #39

Closed kimh11 closed 3 years ago

kimh11 commented 3 years ago

Hi Mikkel,

Thanks for a great pipeline! I've installed paleomix with conda on a cluster and am running the BAM pipeline. Sometimes when my run is interrupted, it resumes from where it stopped, but sometimes it re-runs commands that had already finished, and I'm not sure why.

Could you point me to information on how checkpointing on paleomix bam pipeline works?

Thank you!

MikkelSchubert commented 3 years ago

Hi HJ,

Paleomix works like make and should only (re-)run a step in two situations:

  1. If the files generated by the step do not exist, or
  2. If the files generated by a step are older than the input files to that step, i.e. if the input has changed since the step was last run.
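As a rough illustration of the two rules above, here is a minimal Python sketch of a make-style freshness check based on file modification times. It is not PALEOMIX's actual implementation; the function name `step_status` and its return labels are assumptions made for this example only.

```python
import os


def step_status(inputs, outputs):
    """Classify a step as "Missing", "Outdated", or "Ready" (hypothetical helper)."""
    # Rule 1: the step must run if any of its output files does not exist.
    if not all(os.path.exists(path) for path in outputs):
        return "Missing"

    # Rule 2: the step must run if any output is older than any input,
    # i.e. the oldest output predates the newest input.
    newest_input = max((os.path.getmtime(p) for p in inputs), default=float("-inf"))
    oldest_output = min((os.path.getmtime(p) for p in outputs), default=float("inf"))
    if oldest_output < newest_input:
        return "Outdated"

    return "Ready"
```

Because the check only compares timestamps, anything that changes the modification time of an input file (even without changing its contents) makes the downstream outputs look out of date.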

What kind of tasks are you seeing re-run?

Next time a run is interrupted, you can try to run the BAM pipeline with the --list-output-files option. That will print a table of all files generated by the pipeline, where the first column represents the status of each file, one of "Ready", "Missing" (1), or "Outdated" (2). If something is going wrong, then you should see already created output files listed as "Outdated" or even as "Missing".

Cheers, Mikkel

kimh11 commented 3 years ago

Hi Mikkel,

Thank you so much! It was the second situation you listed. Some of my files had become outdated because I had updated the timestamps on the files to keep them from being purged from scratch storage. Totally my fault. All's back to the way it should be! Really appreciate your help!

Best, HJ
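
For readers who hit the same problem: the cause described above is that refreshing timestamps can make input files newer than the outputs derived from them. One possible workaround, sketched below in Python, is to shift every file's modification time by the same offset so that the relative order of the files is preserved. The helper `refresh_mtimes` is hypothetical and not part of PALEOMIX, and whether this satisfies a given cluster's purge policy is an assumption you would need to check locally.

```python
import os
import time


def refresh_mtimes(paths, now=None):
    """Move all mtimes forward by one shared offset (hypothetical helper).

    The newest file ends up with mtime == now, and every other file keeps
    its original distance from it, so "older than" relations are unchanged.
    """
    now = time.time() if now is None else now
    mtimes = {path: os.stat(path).st_mtime for path in paths}
    if not mtimes:
        return

    offset = now - max(mtimes.values())
    for path, mtime in mtimes.items():
        # os.utime takes (access time, modification time) in seconds.
        os.utime(path, (now, mtime + offset))
```

By contrast, updating files one by one in arbitrary order gives each file a fresh timestamp, and any input that happens to be updated after its outputs will cause those outputs to be treated as outdated.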