YosefLab / scone


Cell Ranger alignment stats #104

Open iwillham opened 4 years ago

iwillham commented 4 years ago

Hi there, I have some 10X data that I'd like to try scone on. I'm trying to find alignment QC metrics in the Cell Ranger output files. Does Cell Ranger output the alignment QC metrics you reported in Table S2 of the paper (i.e., unmapped_reads, umi_corrected, etc.)?

Thanks, ian

drisso commented 4 years ago

Hi @iwilliams91 ,

I only have a vague recollection of what we did, but if I remember correctly we had to extract the metrics from a non-obvious location in the Cell Ranger output.

@mbcole performed the analysis and might remember more?

asmariyaz23 commented 4 years ago

@iwillham were you able to find the answer to your question? I am stuck on the same issue.

coltonrobbins73 commented 3 years ago

@iwillham and @asmariyaz23 Not sure if you are still looking for a solution here, but I've made a little progress with this. So far I've been able to find (1) unmapped_reads and (2) num_reads.

You can get per-barcode counts of mapped reads from your .bam file using samtools and awk. (Note: I'm using Unix commands to tally the barcodes. There should be equivalent commands on Mac or PC.)

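# -F 4 excludes unmapped reads, so only mapped reads are tallied per barcode
samtools view -F 4 possorted_genome_bam.bam | awk '
    match($0, /CB:Z:[ACGT]*/) {
        # the barcode starts 5 characters in, after the "CB:Z:" prefix
        a[substr($0, RSTART+5, RLENGTH-5)]++
    }
    END { for (i in a) print i, a[i] }
' >> /mapped_reads_per_barcode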

Output of the first 10 lines:

GAAACTCTCGCAAACT | 14
ACATACGTCTCATTCA | 7
GATCGCGAGAACAATC | 4
CACACTCAGAAGGTGA | 18
TGCACCTAGTCCGGTC | 22889
GGACATTAGGATGTAT | 9
GACCAATCACATTCGA | 1
GAACCTATCAGAAATG | 6
AGCTCTCGTACACCGC | 13
CACAGTAAGCGCCTCA | 1043

You can then subset this barcode count list to the verified (cell-associated) barcodes reported by Cell Ranger.
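The subsetting step could look roughly like the sketch below. It assumes the per-barcode table has two whitespace-separated columns (barcode, count), as printed by the awk command, and that the verified barcodes are in a decompressed copy of Cell Ranger's filtered barcodes file (barcodes.tsv.gz in recent versions), one barcode per line. Those barcodes usually carry a suffix such as -1, which the regex above does not capture, so the sketch strips it before matching. The function name and file names are placeholders, not anything from scone or Cell Ranger.

```shell
# Keep only the rows of a "barcode count" table whose barcode appears in a
# list of verified barcodes (one per line, e.g. "ACGT...-1").
# subset_counts and both file arguments are hypothetical placeholders.
subset_counts() {
    counts_file=$1      # two columns: barcode, read count
    barcodes_file=$2    # one barcode per line, optional "-<n>" suffix
    awk '
        NR == FNR {                  # first file: the verified barcodes
            sub(/-[0-9]+$/, "", $1)  # drop a trailing suffix such as "-1"
            keep[$1] = 1
            next
        }
        $1 in keep                   # second file: print matching rows as-is
    ' "$barcodes_file" "$counts_file"
}
```

For example, `subset_counts mapped_reads_per_barcode barcodes.tsv > mapped_reads_cells` would write only the rows for verified barcodes, in the order they appear in the count table.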

For unmapped reads, use 'samtools view -f 4 possorted_genome_bam.bam' as the samtools call in the command above, and write the result to a separate output file.

num_reads would then just be the two tables trimmed, ordered, and summed.
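A minimal sketch of that summing step, assuming the first table holds mapped reads only (for instance produced with samtools view -F 4) and the second holds unmapped reads, each with two columns (barcode, count). A barcode missing from one table simply contributes nothing from that table. The function name and file names are placeholders.

```shell
# Add up per-barcode counts from two "barcode count" tables
# (e.g. mapped and unmapped reads) and print one total per barcode,
# sorted by barcode. sum_counts is a hypothetical helper name.
sum_counts() {
    awk '
        { total[$1] += $2 }                        # accumulate across both files
        END { for (b in total) print b, total[b] }
    ' "$1" "$2" | LC_ALL=C sort                    # awk array order is unspecified
}
```

Calling `sum_counts mapped_reads_cells unmapped_reads_cells > num_reads_per_barcode` would then give one candidate num_reads value per barcode. Whether this matches the definition used for Table S2 is not confirmed in this thread.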

@mbcole Does that sound right?