Interpreting read counts

Hi,

I'm testing EMA on a set of 10x reads that are stored both in two separate fastQ files (R1 and R2). The number of reads in R1 is 9,000,743 and in R2 is, of course, the same, so in total we have 9,000,743 "read pairs" (or 18,001,486 total reads).

When I run count on the fastQ files, I get the following log:

$ parallel --will-cite -j 2 'paste <(gzip -cd {} | paste - - - -) <(gzip -cd {= s:_R1_:_R2_: =} | paste - - - -) | tr "\t" "\n" | ema count -w 4M-with-alts-february-2016.txt -o {/.}' ::: *_R1_*.gz >> count.log
:: Loading 10X took 2.0 s
:: Counting took 494.5 s
:: Reads with OK barcode: 8,953,372 out of 9,000,743
:: Ignored 0 reads
:: Dumped block 1
:: Printing took 0.5 s
:: Processed 9,000,743 reads (6,350 MB uncompressed) in 497 s

For what I understand, only R1 has the barcode at the start its sequence.

I think I'm just confused with the read counts logged by EMA. Would the above log message means that EMA correctly counted the ~9M barcodes from R1? or that it only processed ~9M of reads (half of the total read count)?

Many thanks, Pedro

arshajii / ema

Interpreting read counts #29