cgat-developers / cgat-flow

cgat-flow repository
MIT License

Quasimapper Merging Bug - ATTENTION #62

Closed jscaber closed 5 years ago

jscaber commented 5 years ago

Dear all,

If anyone is using the rnaseqdiffexpression pipeline to generate salmon counts for import with tximport: there is a serious merge bug.

To force the SalmonQuantifier class into the mapper class, the mapper took the list of fastq files to be merged and used only the first file, silently discarding the rest. It then appears to produce merged abundance.h5 and quant.sf files, but these contain only the data from the first lane/replicate.

Best wishes, Jakub

jscaber commented 5 years ago

This obviously does not affect featurecounts counts, as these are done on the premerged bam files.

jscaber commented 5 years ago

There were two elements to this bug. The first was probably in rnaseq.py: a tuple was sent where a single element was expected (this meant the preprocessing did not cycle through the options). The second was in pipeline_rnaseqdiffexpression.py: only the first element of the list was sent so that the pipeline would not break (silently discarding lanes and replicates).
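For anyone trying to picture the second element, here is a minimal sketch of the failure pattern and of a guard that would have made it fail loudly. It is not the cgat-flow code: every function name and file name below is hypothetical and only illustrates the "take the first element of the list" problem described above.

```python
# Hypothetical sketch; none of these names come from cgat-flow.

def build_inputs_buggy(infiles):
    """Mimics the bug: only the first FASTQ file is passed on."""
    return [infiles[0]]


def build_inputs_fixed(infiles):
    """Passes every lane/replicate on, and refuses an empty input list."""
    if not infiles:
        raise ValueError("no input FASTQ files given")
    return list(infiles)


def check_all_inputs_used(requested, used):
    """Fail loudly instead of silently dropping lanes or replicates."""
    missing = [f for f in requested if f not in used]
    if missing:
        raise RuntimeError(
            "inputs dropped before quantification: %s" % ", ".join(missing))


# Made-up file names for two lanes of one sample.
lanes = ["sample1_L001.fastq.gz", "sample1_L002.fastq.gz"]

# The fixed version keeps both lanes, so the check passes without error.
check_all_inputs_used(lanes, build_inputs_fixed(lanes))
```

Running the same check against build_inputs_buggy(lanes) raises a RuntimeError naming the dropped lane, which is the kind of signal the pipeline did not give.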

jscaber commented 5 years ago

This affected Salmon, Sailfish and Kallisto.

Acribbs commented 5 years ago

Good spot @jscaber. Have you run pipeline testing yet, and do all pipelines pass?

Acribbs commented 5 years ago

Looking at the testing code, I would have thought the tests for pipeline rnaseqdiffexpression would need to be updated?

jscaber commented 5 years ago

Testing is on my todo list. The testing pipeline will have to be updated.

As I have removed a tuple that is normally traversed by a "for x in y" statement, I assume I will have to do something like if not isinstance(outfile, (list, tuple)): outfile = (outfile,) to get the code to work for samples where no merging is needed.
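A minimal sketch of that normalisation, assuming outfile is either a single path string or a list/tuple of paths. The helper name as_tuple is made up for illustration and is not part of cgat-flow.

```python
def as_tuple(outfile):
    """Return outfile as a tuple so that 'for x in y' works in both cases.

    Hypothetical helper: a bare string is itself iterable (character by
    character), so it has to be wrapped rather than iterated directly.
    """
    if isinstance(outfile, (list, tuple)):
        return tuple(outfile)
    return (outfile,)


# Single sample, no merging needed: the loop body runs once on the whole path.
for path in as_tuple("sample1/quant.sf"):
    print(path)

# Several outputs: the loop body runs once per path.
for path in as_tuple(["sample1/quant.sf", "sample2/quant.sf"]):
    print(path)
```

Checking against both list and tuple, rather than list only, keeps existing callers that still pass a tuple working.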

IanSudbery commented 5 years ago

We've seen this bug before. I thought I had submitted an issue, maybe about a year ago, perhaps longer?