Plot Sequencing Saturation

WenyuLiang commented 11 months ago

Hi Alex,

Thanks for the great tool! I'd like to plot sequencing saturation like cellranger. I wonder if I can get the accumulating saturation from the bam file. according to this: https://github.com/alexdobin/STAR/issues/1887#issuecomment-1611354693 I wrote the following code. I use sS instead of CB UB and check sM tag like this because they are missing from bam file which maps to transcriptome. Can you see any bugs other than using sS instead of CB, UB, since the result is quite different from the correct one.

Best, Wenyu

while (sam_read1(in, header, record) >= 0) { uint8_t sS = bam_aux_get(record, "sS"); uint8_t gn = bam_aux_get(record, "GN"); uint8_t sm = bam_aux_get(record, "sM"); if (sm) { int sM = bam_aux2i(sm); if (sM < 0 || sM > 1) continue; } if (gn && *gn == 'Z') gene_name = bam_aux2Z(gn); else continue; if (gene_name == "-") continue; bool is_uniq = true; gene_umi_cb = gene_name + std::string(gene_umi_cb); auto result = uSeqs.insert(gene_umi_cb); is_uniq = result.second ? true : false; is_uniq_vec.emplace_back(is_uniq); }

alexdobin commented 11 months ago

Hi Wenya,

I cannot check your code. The saturation curves require downsampling of the reads.

WenyuLiang commented 11 months ago

Thanks for your response. I'm wondering if yessubWLmatch_UniqueFeature corresponds to reads with valid CB, UB, gX in the bam file?

alexdobin commented 11 months ago

Yes, that's correct (CB/UB/GX).

WenyuLiang commented 11 months ago

That makes sense. However, I check the total number of reads whose CB!="-" && UB!="-" && gX!="-" from Aligned.sortedByCoord.out.bam and it's different from yessubWLmatch_UniqueFeature in Gene/Features.stats

huichanghuang commented 4 months ago

That makes sense. However, I check the total number of reads whose CB!="-" && UB!="-" && gX!="-" from Aligned.sortedByCoord.out.bam and it's different from yessubWLmatch_UniqueFeature in Gene/Features.stats

Hi，Dear WenyuLiang, I have the same confusion when plot saturation. Do you find the good answer for the question? thank you very much.

alexdobin / STAR

Plot Sequencing Saturation #1969