berman-lab / ymap

YMAP - Yeast Mapping Analysis Pipeline : An online pipeline for the analysis of yeast genomic datasets.
MIT License

Monitor dataset analysis for errors and save input files for a post-mortem #54

Open vladimirg opened 8 years ago

vladimirg commented 8 years ago

(This is a continuation of the quota system - see #40.)

  1. There should be an error-checking step immediately before the cleanup step. The script should verify that all expected output files are present and look as they should, and that the logs contain no fatal errors. The log check needs care: at this point the word 'error' also appears in the logs in non-fatal situations, so a plain search for it would give false positives.
  2. If something went wrong, we should:
    1. Alert the user immediately (instead of relying on a timeout).
    2. Store the original input file(s). Text files (e.g. FASTQs or SAMs) should be zipped first; binary files (zipped FASTQs or BAMs) can be stored as-is. All other intermediate files should still be deleted, as the original files are sufficient to re-run the analysis and reproduce the same errors.
    3. Alert admins. The simplest option is to write to a dedicated log of dataset failures, which we will check every few days.
  3. If nothing went wrong, we remove the intermediate files as normal. The one exception is debug mode, in which all intermediates should always be kept.
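
The steps above could be sketched roughly as follows. This is only an illustration in Python, not YMAP code: every name in it (`find_problems`, `preserve_inputs`, `finish_dataset`, `FATAL_MARKERS`, `process_log.txt`, the failure log path) is hypothetical, and the list of fatal markers is a placeholder that would need to be filled in from the real logs. It assumes the caller already knows which output files to expect and which files count as intermediates.

```python
import gzip
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical values: placeholders, not taken from the YMAP code base.
FATAL_MARKERS = ("Traceback", "Segmentation fault", "Killed")
TEXT_SUFFIXES = {".fastq", ".fq", ".sam"}


def find_problems(dataset_dir, expected_outputs, log_name="process_log.txt"):
    """Return a list of problem descriptions; an empty list means success."""
    dataset_dir = Path(dataset_dir)
    problems = []
    for name in expected_outputs:
        path = dataset_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(f"missing or empty output: {name}")
    log_path = dataset_dir / log_name
    if log_path.is_file():
        for line in log_path.read_text(errors="replace").splitlines():
            # Match specific fatal markers, not the bare word 'error',
            # which can also appear in non-fatal log lines.
            if any(marker in line for marker in FATAL_MARKERS):
                problems.append(f"fatal log line: {line.strip()}")
    return problems


def preserve_inputs(input_files, postmortem_dir):
    """Copy the inputs aside, gzip-compressing the plain-text ones."""
    postmortem_dir = Path(postmortem_dir)
    postmortem_dir.mkdir(parents=True, exist_ok=True)
    for src in map(Path, input_files):
        if src.suffix.lower() in TEXT_SUFFIXES:
            dst = postmortem_dir / (src.name + ".gz")
            with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
        else:
            # Already-compressed inputs (e.g. BAM, .gz) are stored as-is.
            shutil.copy2(src, postmortem_dir / src.name)


def finish_dataset(dataset_dir, input_files, expected_outputs, intermediates,
                   postmortem_dir, failure_log, debug=False):
    """Run the check, keep inputs on failure, then clean up unless debugging."""
    problems = find_problems(dataset_dir, expected_outputs)
    if problems:
        preserve_inputs(input_files, postmortem_dir)
        with open(failure_log, "a") as log:
            stamp = datetime.now(timezone.utc).isoformat()
            log.write(f"{stamp}\t{dataset_dir}\t{'; '.join(problems)}\n")
    if not debug:
        for path in map(Path, intermediates):
            path.unlink(missing_ok=True)  # requires Python 3.8+
    return problems
```

The function returns the list of problems so that the caller can alert the user right away instead of waiting for a timeout. Inputs are copied before any deletion, so it does not matter whether they are also listed among the intermediates.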