Closed seb-mueller closed 5 years ago
I just realized that my whole assumption was based on the Hamming distance between cell barcodes, which I had subconsciously assumed, whereas UMI-edit-distance obviously refers to UMIs. My bad, sorry for the confusion.
However, having run all these tests, it's still strange that NUM_GENES didn't change, whereas NUM_TRANSCRIPT more than halved going from an edit distance of 0 to 3 (see above).
I'll leave this issue open for a little while in case someone has an explanation.
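One possible reading of that pattern can be sketched in a few lines. This is a toy model, not the Drop-seq tools implementation: the function names (`hamming`, `collapse_umis`, `summarize`) and the greedy "merge into the more abundant UMI" rule are assumptions made for illustration, and the reads are invented. It only shows that collapsing UMIs within an edit distance can shrink the transcript count while the read count and the gene count stay the same.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts, max_dist):
    """Greedily keep a UMI only if no already-kept UMI is within max_dist.

    umi_counts maps UMI -> read count. UMIs are visited from most to least
    abundant (ties broken alphabetically so the result is deterministic).
    """
    kept = []
    for umi in sorted(umi_counts, key=lambda u: (-umi_counts[u], u)):
        if not any(hamming(umi, k) <= max_dist for k in kept):
            kept.append(umi)
    return kept


def summarize(reads, max_dist):
    """reads: iterable of (gene, umi) pairs for a single cell barcode."""
    per_gene = defaultdict(lambda: defaultdict(int))
    n_reads = 0
    for gene, umi in reads:
        per_gene[gene][umi] += 1
        n_reads += 1
    n_transcripts = sum(
        len(collapse_umis(counts, max_dist)) for counts in per_gene.values()
    )
    return {"reads": n_reads, "transcripts": n_transcripts, "genes": len(per_gene)}


# Invented reads for one cell barcode (hypothetical gene names and UMIs).
reads = [
    ("GeneA", "AAAAAAAA"), ("GeneA", "AAAAAAAA"), ("GeneA", "AAAAAAAT"),
    ("GeneA", "AAAAAATT"), ("GeneA", "CCCCCCCC"),
    ("GeneB", "GGGGGGGG"), ("GeneB", "GGGGGGGA"),
]

for dist in (0, 1, 3):
    print(dist, summarize(reads, dist))
```

In this toy case the transcript count drops from 6 to 3 between distance 0 and 3, while reads (7) and genes (2) are unchanged, because a gene stays detected as long as at least one of its UMIs survives the collapse.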
Hi Patrick,
To explore the impact of varying the edit (Hamming) distance, I've run the same data set with different UMI-edit-distance settings in the config.yaml. I wasn't sure what to expect, but I was still surprised that none of the cell-barcode counts changed in the *.dge.summary.txt files. Since the edit distance is passed on to drop-seq tools, and to exclude any problems with dropSeqPipe, I've tried running DigitalExpression from version 1.13 directly with varying edit distances as follows (using the Macosko dataset, but the results were the same for other sets as well):
The results below show only the first barcodes for brevity:
mac_1000_SRR1748411_dge.summary.txt.edit3
mac_1000_SRR1748411_dge.summary.txt.edit0
As the excerpts above show, edit distances of 0 and 3 gave the same counts per barcode (in fact, the complete lists are also identical for distances 1 and 2, which I could send). I find that a rather unlikely result and was wondering if you've had similar experiences. Maybe something is wrong with the drop-seq tools, or I'm missing something obvious?
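For anyone wanting to repeat the comparison without eyeballing the files, here is a minimal sketch that counts, per column, how many shared barcodes differ between two summary tables. It assumes the summaries are tab-separated with `#` comment lines and a `CELL_BARCODE` key column (an assumption about the file layout); the function names are hypothetical and the two tables in the example are invented toy data, not the real output.

```python
import csv


def read_summary(text):
    """Parse a tab-separated summary, skipping blank and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["CELL_BARCODE"]: row for row in reader}


def diff_columns(a, b):
    """Return {column: number of shared barcodes whose value differs}."""
    if not a:
        return {}
    shared = sorted(set(a) & set(b))
    columns = [c for c in next(iter(a.values())) if c != "CELL_BARCODE"]
    return {c: sum(a[bc][c] != b[bc][c] for bc in shared) for c in columns}


# Invented toy tables standing in for the edit0 and edit3 summaries.
EDIT0 = (
    "## toy example\n"
    "CELL_BARCODE\tNUM_GENIC_READS\tNUM_TRANSCRIPT\tNUM_GENES\n"
    "AAAA\t100\t60\t20\n"
    "CCCC\t80\t50\t15\n"
)
EDIT3 = (
    "## toy example\n"
    "CELL_BARCODE\tNUM_GENIC_READS\tNUM_TRANSCRIPT\tNUM_GENES\n"
    "AAAA\t100\t25\t20\n"
    "CCCC\t80\t20\t15\n"
)

print(diff_columns(read_summary(EDIT0), read_summary(EDIT3)))
```

With real files, the two strings would be replaced by the contents of the `.edit0` and `.edit3` summaries; the column names are taken from the header line, so the sketch does not depend on their exact spelling.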
Also, NUM_GENIC_READS and NUM_GENES don't change, whereas NUM_TRANSCRIPT does, which is odd.
Also note that the knee plots (which I had expected to change in the first place) are based on logs/{sample}_hist_out_cell.txt, which I thought should also be affected by the edit distance, but the rule producing it doesn't seem to take it as a parameter (only the dge.summary.txt generation rule has it as an input parameter):
https://github.com/Hoohm/dropSeqPipe/blob/b91fdce7f84ff941f5894c611d7e32a9772f0b70/rules/map.smk#L135
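
To illustrate why a knee plot built from a per-barcode read histogram would not react to a UMI setting, here is a rough sketch of the cumulative curve such a plot typically shows. It assumes the histogram file has two tab-separated columns, read count then cell barcode, possibly with `#` comment lines; that layout, the function name `knee_curve`, and the toy numbers are assumptions, not taken from the pipeline.

```python
def knee_curve(hist_text):
    """Cumulative fraction of reads over barcodes sorted by read count."""
    counts = []
    for line in hist_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        count, _barcode = line.split("\t")[:2]
        counts.append(int(count))
    counts.sort(reverse=True)  # most abundant barcodes first
    total = sum(counts)
    curve, running = [], 0
    for c in counts:
        running += c
        curve.append(running / total)
    return curve


# Invented toy histogram: read count, then cell barcode.
TOY_HIST = "50\tAAAA\n30\tCCCC\n15\tGGGG\n5\tTTTT\n"
print(knee_curve(TOY_HIST))
```

Only read counts per cell barcode enter this calculation, so under these assumptions a change to the UMI edit distance would leave the curve untouched.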