Feature Proposal: a Merger Tool

Background: The MiCall pipeline currently processes reads on a per-sample basis and outputs an assembled consensus sequence for each sample. Each run relies on a SampleSheet.csv file for its input and output details. A tool that merges samples, ideally across different runs, would simplify downstream analysis.

Feature Description: Introduce a merger tool that takes a mapping .csv file and generates a merged SampleSheet.csv, a RunInfo.xml, and a copy of the input .csv for traceability. The mapping file specifies the merging plan: each row maps a sample_name in a given run_folder to an output_name, so rows that share an output_name are merged into one sample.
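Reading the mapping file amounts to grouping (run_folder, sample_name) pairs by output_name. The sketch below shows one way to do that with the standard csv module. It assumes the file has a header row with exactly the column names used in this proposal; group_mapping is a hypothetical helper, and the rule that a blank output_name falls back to the sample_name is an illustrative row default, not a decided behaviour.

```python
import csv
import io
from collections import defaultdict


def group_mapping(mapping_file):
    """Group (run_folder, sample_name) pairs by the output_name they merge into.

    Assumes a header row with sample_name, run_folder and output_name columns.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(mapping_file):
        sample_name = row["sample_name"].strip()
        run_folder = row["run_folder"].strip()
        # Hypothetical row default: a blank output_name keeps the sample's own name.
        output_name = (row.get("output_name") or "").strip() or sample_name
        groups[output_name].append((run_folder, sample_name))
    return dict(groups)


example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "sampleA,/runs/run1,patient1\n"
    "sampleB,/runs/run2,patient1\n"
    "sampleC,/runs/run1,\n"
)
groups = group_mapping(example)
# groups == {"patient1": [("/runs/run1", "sampleA"), ("/runs/run2", "sampleB")],
#            "sampleC": [("/runs/run1", "sampleC")]}
```

Keeping the run_folder next to each sample_name matters because the same sample name can appear in more than one run.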

Feature Objectives:

- Facilitate efficient sample merges across different run folders.
- Ensure consistency and traceability for merged samples.
- Handle default values and conflicts in the input .csv files.

Functional Requirements:

Inputs to the tool:
- Path to the mapping .csv file.
- Path to the output folder.
Outputs of the tool:
- SampleSheet.csv containing one merged record per output_name.
- RunInfo.xml copied from the first associated run_folder.
- A copy of the input .csv file, to trace the origins of the merged data.
A conflict resolution strategy, with an optional strict mode (--strict flag).
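The inputs and the two copied outputs could be wired up roughly as follows. This is a sketch only: the positional argument names, the help texts, and the helper names build_parser and copy_traceability_files are hypothetical, and the help text's reading of --strict (conflicts treated as errors) is an assumption, since the proposal does not spell out what strict mode does.

```python
import argparse
import shutil
from pathlib import Path


def build_parser():
    """Command-line interface for the two inputs and the --strict flag."""
    parser = argparse.ArgumentParser(
        description="Merge samples according to a mapping .csv file.")
    parser.add_argument("mapping_csv", type=Path,
                        help="mapping .csv with sample_name, run_folder, output_name")
    parser.add_argument("output_folder", type=Path,
                        help="folder that receives the merged run files")
    # Assumed semantics: strict mode treats conflicting values as errors.
    parser.add_argument("--strict", action="store_true",
                        help="treat conflicting field values as errors instead of "
                             "keeping the first observed value")
    return parser


def copy_traceability_files(mapping_csv, first_run_folder, output_folder):
    """Copy RunInfo.xml from the first run folder, plus the mapping file itself."""
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)
    shutil.copy(Path(first_run_folder) / "RunInfo.xml", output_folder / "RunInfo.xml")
    shutil.copy(mapping_csv, output_folder / Path(mapping_csv).name)
```

A call such as `build_parser().parse_args(["mapping.csv", "merged_run", "--strict"])` yields Path objects for both locations and `strict=True`; without the flag, `strict` is False, matching the first-observed-value default.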

Conflict Resolution Rules:

- The project_name header field follows the $current_date.merged pattern.
- The date header field is set to the actual merge date.
- All other fields use the first observed value, unless --strict is enabled.
- The index and index2 fields default to XXXXX.
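The rules above can be expressed as a single per-field resolution step. In this sketch, resolve_fields and MergeConflict are hypothetical names, and several details are assumptions: $current_date is rendered as an ISO date (YYYY-MM-DD), "default to XXXXX" is read as always overwriting index and index2, strict mode raises an exception on the first conflict, and non-strict mode reports each conflict on stdout, in line with the conflict-detection task. The field name Description in the test is purely illustrative.

```python
import datetime

INDEX_DEFAULT = "XXXXX"


class MergeConflict(Exception):
    """Raised in strict mode when merged inputs disagree on a field."""


def resolve_fields(values_by_field, strict=False, today=None):
    """Pick one value per field from the values observed across merged inputs.

    values_by_field maps a field name to the values seen in the source
    sample sheets, in the order they were read.
    """
    if today is None:
        today = datetime.date.today()
    resolved = {}
    for field, values in values_by_field.items():
        if field == "project_name":
            # Assumed rendering of the $current_date.merged pattern.
            resolved[field] = f"{today.isoformat()}.merged"
        elif field == "date":
            resolved[field] = today.isoformat()
        elif field in ("index", "index2"):
            resolved[field] = INDEX_DEFAULT
        else:
            distinct = list(dict.fromkeys(values))  # unique, first-seen order
            if len(distinct) > 1:
                if strict:
                    raise MergeConflict(f"{field}: {distinct}")
                print(f"Conflict in {field}: {distinct}; keeping {distinct[0]!r}")
            resolved[field] = distinct[0] if distinct else ""
    return resolved
```

Passing `today` explicitly keeps the function deterministic in unit tests; in normal use it falls back to the current date.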

Implementation Tasks:

[X] Develop a merging script for the underlying sample files.
[ ] Develop logic to parse the input .csv and handle row defaults.
[ ] Implement conflict detection, reporting conflicts to stdout.
[ ] Create file generation procedures for SampleSheet.csv and RunInfo.xml.
[ ] Build the merging algorithm that creates a consolidated .csv from the mapping file.
[ ] Add a --non-strict mode for conflict resolution and make it the default.
[ ] Write unit tests to validate merging logic and conflict handling.
[ ] Add documentation for the merger tool usage and features.
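For the SampleSheet.csv generation task, a minimal writer could look like this. It is a sketch under assumptions: it emits only a [Header] section and a [Data] section in the Illumina style, the header keys and data columns are supplied by the caller, and the names in the example (project_name, date, sample_name, index, index2) mirror the fields named in this proposal rather than a confirmed MiCall sample sheet layout. write_sample_sheet is a hypothetical helper.

```python
import csv
import io


def write_sample_sheet(out, header, data_rows, columns):
    """Write a minimal sample sheet with [Header] and [Data] sections.

    header: mapping of header field -> value
    data_rows: one dict per merged output_name
    columns: column order for the [Data] section
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["[Header]"])
    for key, value in header.items():
        writer.writerow([key, value])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[Data]"])
    writer.writerow(columns)
    for row in data_rows:
        # Missing fields are written as empty cells.
        writer.writerow([row.get(col, "") for col in columns])


buffer = io.StringIO()
write_sample_sheet(
    buffer,
    header={"project_name": "2024-03-05.merged", "date": "2024-03-05"},
    data_rows=[{"sample_name": "patient1", "index": "XXXXX", "index2": "XXXXX"}],
    columns=["sample_name", "index", "index2"],
)
```

Writing to any file-like object keeps the function easy to unit test with io.StringIO; in the tool itself, `out` would be a file opened with `newline=""` inside the output folder.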

cfe-lab / MiCall, issue #1036