dib-lab / elvers

(formerly eelpond) an automated RNA-Seq workflow system
https://dib-lab.github.io/elvers/

meta issue for high-level error checking #110

Open ctb opened 5 years ago

ctb commented 5 years ago

see https://github.com/dib-lab/eelpond/issues/103#issuecomment-471185817 for initial motivation --

I think we need a modular way to do high-level correctness checking.

e.g.,

I don't think run_eelpond should have this error checking in it directly, tho! Maybe we could put in something that when a particular rule file is included, it has some high level checks that it runs, or maybe that should be connected in some way to the higher level workflows mentioned in pipeline_defaults.yaml?

bluegenes commented 5 years ago

I think in the eelpond_params section of the params.yml file, we can add a require parameter that lists the required rules, including utility rules like get_data, etc. We need to have an "or" option in place though, for situations where either assemblyinput or assembly is required.
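A minimal sketch of how such a require check with an "or" option might look. The list layout (a plain string for a single required rule, a nested list for alternatives) and the function name unmet_requirements are assumptions made for illustration, not existing elvers code:

```python
def unmet_requirements(require, selected):
    """Return the requirement entries that `selected` does not satisfy.

    Hypothetical layout: `require` is a list in which a plain string names
    one required rule, and a nested list names alternatives of which at
    least one must be selected (the "or" option).
    """
    selected = set(selected)
    unmet = []
    for entry in require:
        # Normalise both forms to a list of acceptable rule names.
        alternatives = [entry] if isinstance(entry, str) else list(entry)
        if not selected.intersection(alternatives):
            unmet.append(alternatives)
    return unmet


# Example: get_data is always required; either assemblyinput or assembly will do.
require = ["get_data", ["assemblyinput", "assembly"]]
for alternatives in unmet_requirements(require, ["get_data"]):
    print("missing required rule: " + " or ".join(alternatives))
```

Returning the unmet alternatives as lists, rather than raising on the first failure, lets the caller report every missing requirement in one pass.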

Not sure yet how to check whether something has already been run (e.g. trimmomatic). Maybe don't check, but add a help section to the eelpond_params that has a brief description of the workflow and its required components. It would be helpful if running elvers examples/nema.yml assembly -h printed this help description to stdout.

bluegenes commented 5 years ago

The require idea outlined above would mean keeping the requirements up to date with the exact rules that exist (e.g. right now salmon requires either get_reference or trinity, but in the future, other assemblers may work).

To get around this, maybe we instead create input/output categories that go in each params.yml file. When running a workflow, we check that all inputs are satisfied, and if not, print a list of all rules or utilities that provide the missing output. For example, if we need "transcriptome", we have two rules that produce it, get_reference and trinity, and we can print a helpful message suggesting that the user add either rule.

something like this?

salmon:
  inputs:
    read:
      - raw
      - trimmed
    reference:
      - transcriptome
  outputs:
    read:
      - counts
deseq2:
  inputs:
    read:
      - counts
    reference:
      - transcriptome
  outputs:
    base:
      - diffexp
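
A rough sketch of the check this layout would allow, written against a plain dict that mirrors the YAML above. The get_data, get_reference and trinity entries, the function names, and the reading that names listed under one input category are alternatives (any one of them satisfies the category) are all assumptions for illustration, not existing elvers behavior:

```python
# Hypothetical in-memory form of the per-rule inputs/outputs blocks,
# e.g. what one might have after loading each params.yml.
RULES = {
    "get_data": {"inputs": {}, "outputs": {"read": ["raw"]}},
    "get_reference": {"inputs": {}, "outputs": {"reference": ["transcriptome"]}},
    "trinity": {
        "inputs": {"read": ["raw", "trimmed"]},
        "outputs": {"reference": ["transcriptome"]},
    },
    "salmon": {
        "inputs": {"read": ["raw", "trimmed"], "reference": ["transcriptome"]},
        "outputs": {"read": ["counts"]},
    },
    "deseq2": {
        "inputs": {"read": ["counts"], "reference": ["transcriptome"]},
        "outputs": {"base": ["diffexp"]},
    },
}


def providers(rules, category, name):
    """Return the sorted names of rules whose outputs list `name` under `category`."""
    return sorted(
        rule for rule, spec in rules.items()
        if name in spec.get("outputs", {}).get(category, [])
    )


def missing_inputs(rules, selected):
    """Map each selected rule to its unsatisfied input categories.

    Assumption: names under one input category are alternatives, so the
    category is satisfied when any selected rule produces one of them.
    Each unsatisfied category maps to the rules that could provide it.
    """
    produced = {
        (category, name)
        for rule in selected
        for category, names in rules[rule].get("outputs", {}).items()
        for name in names
    }
    problems = {}
    for rule in selected:
        for category, names in rules[rule].get("inputs", {}).items():
            if not any((category, n) in produced for n in names):
                suggestions = sorted(
                    {p for n in names for p in providers(rules, category, n)}
                )
                problems.setdefault(rule, {})[category] = suggestions
    return problems


def format_problems(problems):
    """Turn the result of missing_inputs into user-facing messages."""
    lines = []
    for rule, categories in problems.items():
        for category, suggestions in categories.items():
            hint = " or ".join(suggestions) if suggestions else "no known rule"
            lines.append(f"{rule}: missing '{category}' input; try adding {hint}")
    return lines


for line in format_problems(missing_inputs(RULES, ["get_data", "salmon"])):
    print(line)
```

Because the suggestions are looked up from the outputs blocks, adding a new assembler later would only need an outputs entry in its own params.yml; nothing in salmon's requirements would have to change.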