Public-Health-Bioinformatics / cpo-pipeline

An analysis pipeline for the purpose of investigating Carbapenemase-Producing Organisms.
MIT License

Decide on output directory structure #9

Closed: dfornika closed this issue 5 years ago

dfornika commented 5 years ago

The pipeline output sub-directories should be self-explanatory and unambiguous.

dfornika commented 5 years ago

I'm considering inverting our current directory structure. I'm still thinking it over, but I'm considering something like the layout shown below.

I'd call this an 'output-by-sample' approach.

At the top level of the output directory, there is one directory per sample:

.
├── SAMPLE-01
├── SAMPLE-02
└── SAMPLE-03

Within each sample directory, there is one directory per analysis type:

.
├── SAMPLE-01
│   ├── assembly
│   ├── resistance
│   └── typing
├── SAMPLE-02
│   ├── assembly
│   ├── resistance
│   └── typing
└── SAMPLE-03
    ├── assembly
    ├── resistance
    └── typing
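A minimal Python sketch of creating this two-level skeleton, assuming the sample IDs are known up front. The function names and the `ANALYSIS_TYPES` tuple are hypothetical illustrations, not code from cpo-pipeline.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical: the analysis-type directories from the proposed layout above.
ANALYSIS_TYPES = ("assembly", "resistance", "typing")


def planned_sample_dirs(outdir: Path, samples: Iterable[str],
                        analyses: Iterable[str] = ANALYSIS_TYPES) -> List[Path]:
    """Return <outdir>/<sample>/<analysis> for every sample/analysis pair."""
    analyses = list(analyses)  # materialize so it can be reused for each sample
    return [Path(outdir) / sample / analysis
            for sample in samples
            for analysis in analyses]


def make_sample_dirs(outdir: Path, samples: Iterable[str],
                     analyses: Iterable[str] = ANALYSIS_TYPES) -> List[Path]:
    """Create the sample-first skeleton; safe to re-run on an existing tree."""
    dirs = planned_sample_dirs(outdir, samples, analyses)
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

Splitting path planning from directory creation keeps the layout logic testable without touching the filesystem.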

Below each analysis-type directory are tool-specific subdirectories holding each tool's output files:

.
├── SAMPLE-01
│   ├── assembly
│   │   ├── post-assembly_qc
│   │   ├── pre-assembly_qc
│   │   └── shovill
│   │       ├── contigs.fa
│   │       └── shovill.log
│   ├── resistance
│   └── typing
│       ├── mlst
│       │   └── SAMPLE01.mlst
│       └── mob_recon
│           ├── contig_report.txt
│           └── mobtyper_aggregate_report.txt
├── SAMPLE-02
│   ├── assembly
│   │   ├── post-assembly_qc
│   │   ├── pre-assembly_qc
│   │   └── shovill
│   │       ├── contigs.fa
│   │       └── shovill.log
│   ├── resistance
│   └── typing
│       ├── mlst
│       │   └── SAMPLE02.mlst
│       └── mob_recon
│           ├── contig_report.txt
│           └── mobtyper_aggregate_report.txt
├── SAMPLE-03
│   ├── assembly
│   │   ├── post-assembly_qc
│   │   ├── pre-assembly_qc
│   │   └── shovill
│   │       ├── contigs.fa
│   │       └── shovill.log
│   ├── resistance
│   └── typing
│       ├── mlst
│       │   └── SAMPLE03.mlst
│       └── mob_recon
│           ├── contig_report.txt
│           └── mobtyper_aggregate_report.txt
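In this layout a tool's output location can be derived from just the sample ID and the tool name. The sketch below assumes a hypothetical `TOOL_ANALYSIS` lookup built from the tools shown above (resistance tools are left out because that directory is still empty in the proposal); `tool_output_path` is likewise illustrative.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping of tool -> analysis type, taken from the layout above.
# Resistance tools are omitted because that directory is empty in the proposal.
TOOL_ANALYSIS = {
    "shovill": "assembly",
    "mlst": "typing",
    "mob_recon": "typing",
}


def tool_output_path(outdir: str, sample: str, tool: str,
                     filename: Optional[str] = None) -> Path:
    """Return <outdir>/<sample>/<analysis>/<tool>[/<filename>]."""
    analysis = TOOL_ANALYSIS[tool]  # raises KeyError for tools not yet placed
    path = Path(outdir) / sample / analysis / tool
    return path / filename if filename else path
```

For example, `tool_output_path("out", "SAMPLE-01", "shovill", "contigs.fa")` gives `out/SAMPLE-01/assembly/shovill/contigs.fa`.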
dfornika commented 5 years ago

Our current layout is roughly as follows; I'd call this an 'output-by-analysis-type' approach.

.
├── assembly
│   ├── SAMPLE-01
│   │   ├── contigs.fa
│   │   ├── post-assembly_qc
│   │   └── pre-assembly_qc
│   ├── SAMPLE-02
│   │   ├── contigs.fa
│   │   ├── post-assembly_qc
│   │   └── pre-assembly_qc
│   └── SAMPLE-03
│       ├── contigs.fa
│       ├── post-assembly_qc
│       └── pre-assembly_qc
├── resistance
│   ├── SAMPLE-01
│   │   ├── SAMPLE-01.cp
│   │   └── SAMPLE-01.rgi.txt
│   ├── SAMPLE-02
│   │   ├── SAMPLE-02.cp
│   │   └── SAMPLE-02.rgi.txt
│   └── SAMPLE-03
│       ├── SAMPLE-03.cp
│       └── SAMPLE-03.rgi.txt
└── typing
    ├── SAMPLE-01
    │   ├── SAMPLE-01.mlst
    │   │   └── SAMPLE-01.mlst
    │   └── SAMPLE-01.recon
    │       ├── contig_report.txt
    │       └── mobtyper_aggregate_report.txt
    ├── SAMPLE-02
    │   ├── SAMPLE-02.mlst
    │   │   └── SAMPLE-02.mlst
    │   └── SAMPLE-02.recon
    │       ├── contig_report.txt
    │       └── mobtyper_aggregate_report.txt
    └── SAMPLE-03
        ├── SAMPLE-03.mlst
        │   └── SAMPLE-03.mlst
        └── SAMPLE-03.recon
            ├── contig_report.txt
            └── mobtyper_aggregate_report.txt
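Going from this layout to the output-by-sample one amounts to swapping the first two path components (`<analysis>/<sample>/...` becomes `<sample>/<analysis>/...`). A minimal sketch of a one-off migration, assuming existing outputs should be copied rather than moved; the function names are hypothetical. It only reorders directories: it does not rename the current per-sample subdirectories (e.g. `SAMPLE-01.recon`) to the tool names (`mob_recon`) used in the proposal.

```python
import shutil
from pathlib import Path, PurePath
from typing import List


def invert_relpath(rel: PurePath) -> PurePath:
    """Map <analysis>/<sample>/<rest...> to <sample>/<analysis>/<rest...>."""
    parts = rel.parts
    if len(parts) < 2:
        raise ValueError(f"expected <analysis>/<sample>/..., got {rel}")
    analysis, sample, *rest = parts
    return PurePath(sample, analysis, *rest)


def invert_layout(src: Path, dst: Path) -> List[Path]:
    """Copy every file under an analysis-first tree into a sample-first tree."""
    src, dst = Path(src), Path(dst)
    copied = []
    # sorted() materializes the listing before any files are written.
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        target = dst / invert_relpath(f.relative_to(src))
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```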
ddooley commented 5 years ago

I definitely think a sample-first hierarchy (i.e. the sample ID at the first branch of the hierarchy) is best. This structure allows new data products to be added over time, and conceivably lets data products from other pipelines be merged in for a given sample.
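One way to see this extensibility: in a sample-first tree, everything for a sample shares one path prefix, so grouping outputs by sample needs only the first path component, and a new analysis directory is picked up unchanged. A small illustrative sketch; `group_by_sample` and the `new_analysis` path in the example are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List


def group_by_sample(relpaths: Iterable[str]) -> Dict[str, List[PurePath]]:
    """Group sample-first relative paths by their first component (the sample ID).

    Outputs placed later under <sample>/<new-analysis>/ (including products
    merged in from another pipeline) are grouped without changing this code.
    """
    groups: Dict[str, List[PurePath]] = defaultdict(list)
    for rel in relpaths:
        p = PurePath(rel)
        groups[p.parts[0]].append(p)
    return dict(groups)


# Example with a hypothetical extra analysis directory for SAMPLE-01.
example = group_by_sample([
    "SAMPLE-01/assembly/shovill/contigs.fa",
    "SAMPLE-01/typing/mlst/SAMPLE01.mlst",
    "SAMPLE-01/new_analysis/some_tool/out.txt",
    "SAMPLE-02/assembly/shovill/contigs.fa",
])
```

In the analysis-first layout the same grouping would have to read the second path component, and it would change again if the hierarchy gained another level above the sample.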

Oh, wait, I’m on vacation! I shouldn’t be reading this!

d.

On Dec 12, 2018, at 12:07 PM, Dan Fornika <notifications@github.com> wrote:
