Open aswathysheena opened 2 years ago
Hi Aswathy,
in the BAM file, please check the reads that mapped to the genome but do not have GX/GN tags. One possibility is that these reads are spliced. For viral reads it probably makes sense to disallow splicing.
Also, your parameters outFilterMultimapNmax 500 winAnchorMultimapNmax 500
are missing -- .
Cheers Alex
Thanks Alex.
The total reads mapped (identified using samtools -f 4 -c ) is 496607, while greping empty GX/GN is giving 349504.
How can I disallow splicing while running STARsolo. ? is it --outSJfilterReads .?
Hi Aswathy
you can prohibit splicing with --alignIntronMax 1
.
However, I just realized that there is another reason why the difference you observe:
"Reads Mapped to Genome" counts all reads regardless their Cell Barcode or UMI, but "Reads Mapped to Gene" counts only reads that have valid CB/UMI, so it's always a smaller number.
Cheers Alex
Thank you Alex.
The CB/UB are error corrected. Sorry I read the README again.
But can you please tell me what does GX/GN as "-" means? I am getting a handful of same entries with mapped , valid barcode (from whitelist)?what does it indicates?
I did read the Multimapping BAM tag?[ #1280]
I am using version 2.7.10a_alpha_220506. can you help me to understand the "-" GX/GN tags in the bam represents multimapped reads?
Hi Aswathy
"-" in GX/GN means that either the read does not map to a gene, or it maps to multiple genes.
Hi ,
I have built a exclusive endogenous viral reference and generated STAR index(similar to custom reference in cell ranger, each fasta sequence has one entry/annotation in GTF ie; each fasta entry has annotated for each viral strain than gene) eg: one fasta entry corresponds to ERVK-10 in GTF
The idea is to get count matrix for reads mapping to any viral genome.
Then ran STARsolo as follows:
STAR --runThreadN 19 --soloType Droplet \ --soloCBwhitelist 737K-august-2016.txt --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM \ --outSAMtype BAM SortedByCoordinate --genomeDir reference/ --outFileNamePrefix {sample} --readFilesCommand zcat \ outFilterMultimapNmax 500 winAnchorMultimapNmax 500 --soloMultiMappers EM \ --readFilesIn {','.join(files2)} {','.join(files1)}
Considering the sequence homology, I have used --soloMultiMappers EM option to get Multi-mapped reads counted.
My query is reg. the metrics summary.csv.
Reads Mapped to Genome: Unique+Multiple,0.0122193 Reads Mapped to Genome: Unique,0.00873406 Reads Mapped to Gene: Unique+Multiple Gene,0.00548684 Reads Mapped to Gene: Unique Gene,0.00441223
In this scenario, what is it difference b/w "Mapped to Genome and Mapped to Gene"
Any details on this topic would helpful,
Thanks, Aswathy