MGXlab / virus_identification_tools_benchmarking

virus identification tools benchmarking

wtp pipeline integration #6

Open lingyi-owl opened 3 years ago

lingyi-owl commented 3 years ago

I created a branch called wtp to integrate the wtp pipeline into the current pipeline. I think one of the problems with integrating wtp is its output structure. The wtp output looks like this:

.
├── output
│   ├── 2-643079_scaffolds
│   │   ├── identified_contigs_by_tools
│   │   └── raw_data
│   ├── 2-643091_scaffolds
│   │   ├── identified_contigs_by_tools
│   │   └── raw_data
│   ├── literature
│   │   └── Citations.bib
│   └── runinfo
│       ├── execution_report.html
│       ├── execution_report.html.1
│       ├── execution_timeline.html
│       └── execution_timeline.html.1
└── workdir

This is different from the current viralbench pipeline output structure:

├── results
│   ├── 2-643079_scaffolds
│   │   ├── hmmsearch
│   │   └── wtp
│   ├── 2-643091_scaffolds
│   │   ├── hmmsearch
│   │   └── wtp

I am not sure how to resolve the conflict between these two output structures.

papanikos commented 3 years ago

I don't think we should do anything special with the output. We have results/wtp and keep the wtp structure inside it.

This means:

  1. Collect all input scaffolds in one directory, results/wtp/input (symlinking should work).
  2. Use that directory as the input to wtp. Since wtp is itself a pipeline that parallelizes over its input, we should benefit from that.
  3. The output directory is just results/wtp/output, containing whatever wtp writes there.
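Step 1 could look roughly like the following. This is a minimal sketch, not code from the branch: `stage_wtp_input` is a hypothetical helper name, and it assumes the scaffold files have unique basenames (as `2-643079_scaffolds` and `2-643091_scaffolds` do), since each link is named after its target file.

```python
from pathlib import Path


def stage_wtp_input(scaffolds, input_dir):
    """Symlink every input scaffold file into one shared wtp input directory.

    Hypothetical helper: `scaffolds` is an iterable of FASTA paths and
    `input_dir` would be results/wtp/input in the layout proposed above.
    """
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for scaffold in scaffolds:
        # Absolute target, so the link resolves no matter where wtp is launched from.
        src = Path(scaffold).resolve()
        link = input_dir / src.name  # assumes basenames are unique across samples
        # Leave an existing entry in place so staging twice is a no-op.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(src)
        links.append(link)
    return links
```

The directory returned by `input_dir` is then what gets handed to wtp in step 2; nothing is copied, so the original scaffold files stay where the rest of the pipeline expects them.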

If you would really like to keep things separate per sample, one way I can think of is:

  4. Once wtp finishes successfully (the tricky part), we collect all of its desired output and move it to the corresponding sample directory.

Step (4) would interfere with any wtp reruns: if the outputs are moved, every new run would trigger a rerun of all samples, which is unnecessary. Symlinking instead of moving avoids that, but it feels a bit weird, tbh.
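The symlink variant of step (4) might be sketched as below. It is only an illustration under stated assumptions: `link_wtp_outputs` is a hypothetical helper, and it assumes wtp writes one directory per sample directly under its output directory (as in the wtp tree above), alongside shared `literature` and `runinfo` directories that are not samples. Each sample then gets a `results/<sample>/wtp` entry matching the viralbench layout.

```python
from pathlib import Path

# wtp-wide output directories seen in the tree above; these are not samples.
SHARED_DIRS = {"literature", "runinfo"}


def link_wtp_outputs(wtp_output_dir, results_dir):
    """Expose each per-sample wtp output directory as results/<sample>/wtp.

    Hypothetical helper. Links instead of moving, so wtp's own output stays
    where wtp wrote it and a rerun still finds its previous results.
    """
    wtp_output_dir = Path(wtp_output_dir).resolve()
    results_dir = Path(results_dir)
    linked = {}
    for sample_dir in sorted(p for p in wtp_output_dir.iterdir() if p.is_dir()):
        if sample_dir.name in SHARED_DIRS:
            continue
        target = results_dir / sample_dir.name / "wtp"
        target.parent.mkdir(parents=True, exist_ok=True)
        # Create the link only once; later calls leave it untouched.
        if not target.is_symlink() and not target.exists():
            target.symlink_to(sample_dir, target_is_directory=True)
        linked[sample_dir.name] = target
    return linked
```

Because every `results/<sample>/wtp` link points back into `results/wtp/output`, rerunning wtp there updates what each sample directory shows without copying or moving anything.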

papanikos commented 3 years ago

Opened #10. I would suggest that, if you merge it, we clean up all wtp issues and open new ones (which are inevitable).