epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0
Middleware strategy #212

Open syansanofi opened 3 years ago

syansanofi commented 3 years ago

Issue:
The middleware performs some complicated file name parsing so that not all required columns have to be passed in. For example, sample_name is not passed directly to sc_rna_expression_cellranger_fastq.py. This adds an extra layer of requirements that is not necessary.

Approach:
Add all the required columns for each type of sample manifest to the middleware as CLI arguments. Users can then specify these names freely.
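
A minimal sketch of what this could look like, not the actual middleware code. The option names `--sample_name`, `--sample_type` and `--match_control`, the helper `build_manifest_rows`, and the `NA` placeholder for missing values are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical CLI: every manifest column is an explicit argument."""
    parser = argparse.ArgumentParser(
        description="Sketch of a middleware CLI that takes manifest columns directly"
    )
    # One value per sample; no file name parsing is needed to recover them.
    parser.add_argument("--sample_name", nargs="+", required=True,
                        help="sample names, one per sample")
    parser.add_argument("--sample_type", nargs="+", default=None,
                        help="sample types, one per sample (optional)")
    parser.add_argument("--match_control", nargs="+", default=None,
                        help="matched control names, one per sample (optional)")
    return parser


def build_manifest_rows(args):
    """Combine the per-column arguments into one manifest row per sample."""
    n = len(args.sample_name)
    # "NA" as the filler for an omitted column is an assumption of this sketch.
    sample_types = args.sample_type or ["NA"] * n
    match_controls = args.match_control or ["NA"] * n
    if len(sample_types) != n or len(match_controls) != n:
        raise ValueError("each column needs exactly one value per sample")
    return [
        {"sample_name": name, "sample_type": stype, "match_control": control}
        for name, stype, control in zip(args.sample_name, sample_types, match_controls)
    ]


if __name__ == "__main__":
    for row in build_manifest_rows(build_parser().parse_args()):
        print("\t".join(row.values()))
```

With this shape, the values no longer depend on how the input files are named; a mismatch between the number of samples and the number of values in a column is reported as an error instead of being guessed from file names.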

kamyshova commented 3 years ago

@syansanofi Hi Shu, all existing middleware pipelines have the -i <sample_name_list> option to specify sample names directly. Do I understand correctly that we should also add sample_type and match_control options to dna_capture_var_fastq.py and rna_expression_fastq.py?