cgat-developers / cgat-flow

cgat-flow repository
MIT License

Regex for paired sequencing in pipeline_readqc #75

Open IanSudbery opened 5 years ago

IanSudbery commented 5 years ago

For paired reads in pipeline_readQC, the input regex is:

`r"(?P<track>[^/]+).(?P<suffix>fastq.1.gz|fastq.gz|sra|csfasta.gz|remote)"`

The output of the fastqc task is recorded as

`"fastqc.dir/{track[0]}.fastqc"`



This would mean that `Tissue-condition-replicate.fastq.1.gz` and `Tissue-condition-replicate.fastq.2.gz` are mapped onto `Tissue-condition-replicate.fastqc`. But in actual fact the fastqc processor outputs `Tissue-condition-replicate_fastq_1.fastqc` and `Tissue-condition-replicate_fastq_2.fastqc`.

This is mostly harmless, other than that fastqc will always be rerun because the recorded output file never exists. However, it does mean that if `autoremove=1`, the pipeline won't run, as it is looking for `Tissue-condition-replicate.fastqc`.
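
The naming mismatch described above can be reproduced outside the pipeline with a minimal sketch. It uses the regex and output template exactly as quoted in this issue. The helper names `expected_output` and `observed_name` are hypothetical, and `observed_name` is only an assumption inferred from the file names reported here (strip `.gz`, replace dots with underscores); it is not taken from the fastqc processor's code.

```python
import os
import re

# Regex as quoted in the issue (dots left unescaped, as in the original).
PATTERN = re.compile(
    r"(?P<track>[^/]+).(?P<suffix>fastq.1.gz|fastq.gz|sra|csfasta.gz|remote)"
)


def expected_output(filename):
    """Output path the pipeline records, following the quoted template."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    return "fastqc.dir/{}.fastqc".format(match.group("track"))


def observed_name(filename):
    """Hypothetical: the file name fastqc is reported to write in this issue.

    Assumption: '.gz' is stripped and remaining dots become underscores.
    """
    stem = filename[:-len(".gz")] if filename.endswith(".gz") else filename
    return stem.replace(".", "_") + ".fastqc"


if __name__ == "__main__":
    read1 = "Tissue-condition-replicate.fastq.1.gz"
    read2 = "Tissue-condition-replicate.fastq.2.gz"

    expected = expected_output(read1)
    observed = [observed_name(read1), observed_name(read2)]

    print("expected:", expected)
    print("observed:", observed)
    # The recorded output name is not among the files actually written.
    print("match:", os.path.basename(expected) in observed)
```

Under these assumptions the recorded name never appears among the written files, which is consistent with the task always being considered out of date.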