BeckResearchLab / barrelseq

Bacteria & archaea, isolate and simple community RNA-Seq analysis pipeline
MIT License

Improve error handling when files don't exist, e.g. in engine run #3

Open dacb opened 5 years ago

dacb commented 5 years ago

In an environment that lacks some of the paths listed in the config, e.g. the /work/software directory, engine run throws an error that should probably be caught and handled. For example:

(barrelseq) [dacb@D-10-19-48-52 ~/work/barrelseq]$ ./shim.py engine run --config-file /tmp/mytoy/config.yaml 
config file validation succeeded
CompletedProcess(args='/work/software/samtools/bin/samtools faidx /tmp/mytoy/data/reference.fasta', returncode=127, stderr=b'/bin/sh: /work/software/samtools/bin/samtools: No such file or directory\n')
Traceback (most recent call last):
  File "./shim.py", line 10, in <module>
    barrelseq.command_line.main()
  File "/Users/dacb/work/barrelseq/barrelseq/command_line.py", line 545, in main
    args.func(args)
  File "/Users/dacb/work/barrelseq/barrelseq/engine/run.py", line 116, in run
    if len(step) < args.processes:
TypeError: '<' not supported between instances of 'int' and 'NoneType'

This may be part of a larger question about where and when these options should be validated. Probably in barrelseq/config.py:validate?
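A minimal sketch of what up-front validation could look like, assuming the configured paths can be gathered into a dict of label to path. The names `find_missing_paths` and `resolve_processes` are hypothetical and are not part of barrelseq. The fallback to a single process is also an assumption; the traceback above only shows that `args.processes` was `None` when it was compared.

```python
import os


def find_missing_paths(paths):
    """Return (label, path) pairs whose path does not exist on disk.

    `paths` is a hypothetical mapping such as
    {"samtools": "/work/software/samtools/bin/samtools", ...}.
    """
    return [(label, path) for label, path in paths.items()
            if not os.path.exists(path)]


def resolve_processes(requested):
    """Return a usable process count.

    Assumption: fall back to 1 when no value was supplied, so that a later
    comparison such as `len(step) < processes` never sees None.
    """
    if requested is None:
        return 1
    if requested < 1:
        raise ValueError(f"processes must be >= 1, got {requested}")
    return requested


def report_missing(paths):
    """Build one readable error message instead of failing mid-run."""
    missing = find_missing_paths(paths)
    if not missing:
        return None
    lines = [f"  {label}: {path}" for label, path in missing]
    return "config refers to paths that do not exist:\n" + "\n".join(lines)
```

Calling something like `report_missing` before any step starts would surface every missing path at once, rather than the first shell failure (return code 127) followed by an unrelated `TypeError`.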

dacb commented 5 years ago

Handled by https://github.com/BeckResearchLab/barrelseq/commit/56bf9429c5241da6c6794a5e7f4458754733673c#diff-887fdaa62ac183543d9080bb224bc9de

Does not currently check if executables have the +x bit set.
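One possible way to add the +x check, sketched with the standard library only. `check_executable` is a hypothetical helper name, not something in the commit above, and `os.access` reflects the permissions of the user running the check.

```python
import os


def check_executable(path):
    """Return None if `path` is a runnable file, else a short problem description."""
    if not os.path.isfile(path):
        return f"{path}: no such file"
    # os.X_OK asks whether the current user may execute the file.
    if not os.access(path, os.X_OK):
        return f"{path}: not executable (is the +x bit set?)"
    return None
```

Returning a message rather than raising lets the caller collect problems for every configured program and report them together.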

dacb commented 5 years ago

Probably… By extension or by content? The former is easier but can be onerous. The latter can be hard to do.


On Sep 21, 2018, at 1:26 PM, mpesesky notifications@github.com wrote:

Do we currently (or do we want to) check file type (i.e., fastq, fasta, gff)?


mpesesky commented 5 years ago

I put that in the wrong place, but since we're talking about it – I was thinking content. We could avoid it and just pass along the errors from the constituent programs as they run into format issues.
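For reference, a very rough sketch of what a cheap content check could look like: it only inspects the first non-blank line. `sniff_format` is a hypothetical name, and the heuristic is an assumption, not a real validator. For instance, a GFF file without a `##gff-version` header would not be recognized, and a file that merely starts with `@` is not guaranteed to be valid FASTQ.

```python
def sniff_format(path):
    """Guess 'fasta', 'fastq', or 'gff' from the first non-blank line, else None."""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("##gff-version"):
                return "gff"
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return None  # first real line matches none of the known markers
    return None  # empty file
```

Anything deeper than this (malformed records, bad quality strings) would still surface as errors from the constituent programs.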

dacb commented 5 years ago

Can we leverage Biopython for this rather than reinventing our own validators?