cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Feature Proposal: a Merger Tool #1036

Open Donaim opened 6 months ago

Donaim commented 6 months ago

Background: The MiCall pipeline currently processes reads one real sample at a time and outputs an assembled consensus sequence for each. Each run relies on a SampleSheet.csv file for input and output details. A feature to merge samples, ideally across different runs, would simplify downstream analysis.

Feature Description: Introduce a merger tool that takes a .csv mapping file and generates a merged SampleSheet.csv, a RunInfo.xml, and a copy of the input .csv for traceability. The mapping file associates each sample_name and run_folder pair with an output_name, which together specify the merging plan.
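
To illustrate the mapping file described above, here is a minimal Python sketch of how such a file could be read into a merging plan. The column names sample_name, run_folder, and output_name come from this proposal; the function name `read_merge_plan`, the column order, and the example rows are hypothetical and are not taken from the MiCall code base.

```python
import csv
import io
from collections import defaultdict


def read_merge_plan(mapping_file):
    """Group (run_folder, sample_name) pairs by their output_name.

    Hypothetical helper: it only assumes the three column names
    mentioned in the proposal.
    """
    plan = defaultdict(list)
    for row in csv.DictReader(mapping_file):
        source = (row["run_folder"], row["sample_name"])
        plan[row["output_name"]].append(source)
    return dict(plan)


# Made-up mapping file: two samples from different runs merge into one output.
example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "S1,run_A,merged_1\n"
    "S2,run_B,merged_1\n"
    "S3,run_B,merged_2\n"
)
plan = read_merge_plan(example)
```

Each key of `plan` is one merged sample, and its value lists the source samples that would feed into it. Generating the merged SampleSheet.csv and RunInfo.xml is left out, since their exact layout is not specified here.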

Feature Objectives:

  1. Facilitate efficient sample mergers across different run folders.
  2. Ensure consistency and traceability for merged samples.
  3. Handle default values and conflicts in input .csv files.
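
As a rough illustration of objective 3, the Python sketch below applies one possible default and checks for one possible conflict in the mapping rows. Both rules are assumptions made for the example: a blank output_name falls back to the sample name, and a source sample mapped to two different outputs is rejected. The actual rules may differ, and `normalize_rows` is a hypothetical name.

```python
def normalize_rows(rows):
    """Fill in an assumed default and reject an assumed conflict."""
    seen = {}
    cleaned = []
    for row in rows:
        source = (row["run_folder"], row["sample_name"])
        # Assumed default: an empty output_name falls back to sample_name.
        output = row.get("output_name") or row["sample_name"]
        # Assumed conflict: one source sample mapped to two different outputs.
        if source in seen and seen[source] != output:
            raise ValueError(
                f"{source} is mapped to both {seen[source]!r} and {output!r}"
            )
        seen[source] = output
        cleaned.append({**row, "output_name": output})
    return cleaned


# Made-up rows: the second one has no output_name and gets the default.
rows = [
    {"sample_name": "S1", "run_folder": "run_A", "output_name": "merged_1"},
    {"sample_name": "S2", "run_folder": "run_B", "output_name": ""},
]
cleaned = normalize_rows(rows)
```

Raising an error on a conflict, rather than silently picking one output, keeps the merging plan traceable, in line with objective 2.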

Functional Requirements:

Conflict Resolution Rules:

Implementation Tasks:

Donaim commented 6 months ago

Merging script implemented in https://github.com/cfe-lab/MiCall/pull/1026