LiuzLab / AI_MARRVEL

AI-MARRVEL (AIM) is an AI system for rare genetic disorder diagnosis
GNU General Public License v3.0

Organize the use of out_dir #76

Closed by jylee-bcm 2 months ago

jylee-bcm commented 2 months ago

Motivation

We recently started using the Nextflow storeDir directive to cache outputs that do not depend on the user's input. Nextflow skips a process whose declared output files already exist in its storeDir, so these files can be shared across multiple runs instead of being regenerated each time.
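To make the caching rule concrete, here is a simplified Python model of the storeDir behaviour described above. It is only an illustration, not how Nextflow implements the directive. The function run_with_store_dir, its parameters, and the file name reference.idx are hypothetical. The model runs a task only when at least one declared output is missing from the store directory.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_with_store_dir(store_dir: Path, outputs: Iterable[str],
                       task: Callable[[Path], None]) -> bool:
    """Simplified model of storeDir caching (hypothetical helper).

    Runs `task` only if some declared output file is missing from
    `store_dir`. Returns True if the task ran, False if cached files were reused.
    """
    store_dir = Path(store_dir)
    # Materialize the declared outputs once so a generator is not consumed twice.
    expected = [store_dir / name for name in outputs]
    if all(path.exists() for path in expected):
        return False  # every declared output is already cached: skip the task
    store_dir.mkdir(parents=True, exist_ok=True)
    task(store_dir)  # the task is expected to write its outputs into store_dir
    return True
```

For example, a second run pointed at the same store directory finds reference.idx already present and skips the (hypothetical) index-building task. Reuse only works if the cached files are never overwritten by a later run, which is what the layout change below protects against.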

Currently, runs that use the same output directory overwrite each other's content, which prevents shared outputs from being reused effectively. To address this, this PR creates a separate subdirectory for each run within the output directory, named after the run's run_id. Shared outputs can then be generated once and reused across runs without the risk of being overwritten. This PR resolves issue #77.

Previous Directory Structure

out/
├── merged
├── phrank
├── prediction
├── reference_index
└── vep

In this structure, all runs share the same output directory, regardless of the run_id.

New Directory Structure

out/
├── 1
│   ├── merged
│   ├── phrank
│   ├── prediction
│   ├── vcf
│   └── vep
└── general
    └── reference_index

With this change, per-run outputs are grouped under a directory named after the run_id (1 in this example), while shared outputs are stored separately in the general directory.
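The path rule behind the new layout can be sketched in a few lines of Python. This is a hypothetical helper, not code from the pipeline. It assumes reference_index is the only shared output kind, as in the tree above, and that every other kind goes under the run's own directory.

```python
from pathlib import Path

# Assumption: output kinds that are independent of the user's input and are
# therefore stored once under the shared "general" directory.
SHARED_OUTPUTS = {"reference_index"}


def output_dir(out_dir: str, run_id, kind: str) -> Path:
    """Return the directory for one kind of output under the new layout (hypothetical)."""
    run_id = str(run_id)
    if run_id == "general":
        # A run named "general" would collide with the shared directory.
        raise ValueError('run_id "general" is reserved for shared outputs')
    base = Path(out_dir)
    if kind in SHARED_OUTPUTS:
        return base / "general" / kind  # e.g. out/general/reference_index
    return base / run_id / kind         # e.g. out/1/vep
```

For example, output_dir("out", 1, "vep") resolves to out/1/vep, while output_dir("out", 1, "reference_index") resolves to out/general/reference_index regardless of the run. Rejecting a run_id of general is a design choice in this sketch; the PR itself does not say how such a collision is handled.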

Impact

This is a breaking change for downstream workflows, such as the web interface, that expect the previous directory structure. I would appreciate your feedback on this approach!

@hyunhwan-bcm @ZaahidShaik @arine