jscaber closed this issue 5 years ago
This does not affect the featurecounts counts, as these are computed on the pre-merged BAM files.
There were two elements to this bug. The first was probably in rnaseq.py: a tuple was passed where a single element was expected, which meant the preprocessing did not cycle through the options. The second was in pipeline_rnaseqdiffexpression.py: only the first element of the list was passed on so that the pipeline would not break, which silently discarded the other lanes and replicates.
This affected Salmon, Sailfish and Kallisto.
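To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the code from rnaseq.py or pipeline_rnaseqdiffexpression.py; the function names and file names are hypothetical and only illustrate how taking the first element of a list drops the remaining lanes without any error.

```python
def select_inputs_buggy(fastq_files):
    """Hypothetical buggy pattern: keep only the first file.

    Nothing fails, so the remaining lanes/replicates vanish silently.
    """
    return [fastq_files[0]]


def select_inputs_fixed(fastq_files):
    """Hypothetical fixed pattern: keep every file that should be merged."""
    if not fastq_files:
        raise ValueError("no input files given")
    return list(fastq_files)


def dropped_inputs(given, used):
    """Return the inputs that were given but never used, in order."""
    used_set = set(used)
    return [f for f in given if f not in used_set]


# Made-up lane names for one sample sequenced on two lanes.
lanes = ["sample1_L001.fastq.gz", "sample1_L002.fastq.gz"]

buggy = select_inputs_buggy(lanes)
fixed = select_inputs_fixed(lanes)
```

A check along the lines of `dropped_inputs(lanes, buggy)` in a test would have flagged the second lane as missing, whereas the output files alone look like a normal merged result.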
Good spot @jscaber. Have you run pipeline testing yet, and do all pipelines pass?
Looking at the testing code, I would have thought that the tests for pipeline rnaseqdiffexpression would need to be updated?
Testing is on my todo list. The testing pipeline will have to be updated.
As I have removed a tuple that is normally traversed by a "for x in y" statement, I assume I will have to do something like
if not isinstance(outfile, (list, tuple)): outfile = (outfile,)
to get the code to work for samples where no merging is needed.
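A slightly fuller sketch of that normalisation step, assuming `outfile` is either a single path string or a list/tuple of paths. The helper name `as_tuple` is hypothetical and not part of the pipeline.

```python
def as_tuple(outfile):
    """Return outfile as a tuple so a "for x in y" loop always works.

    A single path (no merging needed) becomes a one-element tuple;
    a list or tuple of paths is returned as a tuple unchanged in content.
    """
    if isinstance(outfile, (list, tuple)):
        return tuple(outfile)
    return (outfile,)


# Made-up file names, for illustration only.
single = as_tuple("sample1.sf")
merged = as_tuple(["sample1_L001.sf", "sample1_L002.sf"])

for path in single:
    # Runs exactly once for the unmerged case.
    pass
```

The explicit `isinstance` check is used instead of testing for iterability because a string is itself iterable, and looping over it would yield single characters rather than one path.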
We've seen this bug before. I thought I had submitted an issue, maybe about a year ago, perhaps longer?
Dear all,
If anyone is using the rnaseqdiffexpression pipeline to generate Salmon counts that can be imported using tximport: there is a serious merge bug.
To force the SalmonQuantifier class into the mapper class, the mapper took the list of FASTQ files to be merged and used only the first file, silently discarding the rest. It then appears to produce a merged abundance.h5 and quant.sf file, but these contain only the data from the first lane/replicate.
Best wishes, Jakub