alexdobin / STAR

RNA-seq aligner
MIT License

Velocyto mapping summary indicating no unique reads #2148

Status: Open. AnnaMaguza opened this issue 6 months ago.

AnnaMaguza commented 6 months ago

Hi!

Thank you for creating such a great tool!

I have been mapping my single-cell gene expression (GEX) data, generated with 10x Genomics v2 chemistry, using STAR 2.7.11a. My goal is to obtain spliced and unspliced count matrices for RNA velocity analysis. The mapping completed without any errors and produced all the expected files, including the ambiguous, spliced, and unspliced matrices. I verified that these matrices are not empty. After creating AnnData objects from these matrices, I checked the proportions with scv.pl.proportions(adata), and they look reasonable: 40% spliced, 51% unspliced, and 8% ambiguous.
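One rough way to cross-check those proportions outside scVelo is to sum the counts stored in each of the three Velocyto matrices. The Python sketch below assumes the files are named spliced.mtx, unspliced.mtx, and ambiguous.mtx (the names reported further down in this thread) and that they sit together in one directory. `mtx_total` and `layer_fractions` are hypothetical helper names. The sketch pools all counts across cells, so its numbers are not guaranteed to match what scv.pl.proportions reports.

```python
from pathlib import Path


def mtx_total(path):
    """Sum the value column of an integer MatrixMarket coordinate file."""
    total = 0
    size_line_seen = False
    with open(path) as fh:
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # banner, comments, blank lines
            if not size_line_seen:
                size_line_seen = True  # "rows cols entries" line, not a count
                continue
            total += int(line.split()[2])  # columns: row, col, value
    return total


def layer_fractions(matrix_dir):
    """Share of all counts held by each of the three Velocyto matrices."""
    names = ("spliced", "unspliced", "ambiguous")
    totals = {name: mtx_total(Path(matrix_dir) / f"{name}.mtx") for name in names}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in names}  # all matrices empty
    return {name: totals[name] / grand_total for name in names}


# Hypothetical path; point it at the directory holding the three .mtx files.
# print(layer_fractions("Solo.out/Velocyto/raw"))
```

If all three totals come out as zero, the matrices are empty regardless of what any downstream plot shows.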

However, the Summary.csv file in the Velocyto output folder reports:

Reads Mapped to Velocyto: Unique+Multiple Velocyto,NoMulti
Reads Mapped to Velocyto: Unique Velocyto,0

Does this mean that the mapping didn't work properly? Can I use the matrices I generated?

Here are the parameters I used:

STAR \
    --runThreadN 56 \
    --genomeDir "$INDEX_FILE_DIR" \
    --readFilesIn "$FILE"_S1_L001_R2_001.fastq.gz "$FILE"_S1_L001_R1_001.fastq.gz \
    --runDirPerm All_RWX \
    --soloCBwhitelist "$WHITE_LIST_DIR/737K-august-2016.txt" \
    --soloFeatures Gene GeneFull Velocyto \
    --readFilesCommand zcat \
    --soloOutFileNames "$SRA_OUTPUT_DIR"/ features.tsv barcodes.tsv matrix.mtx \
    --soloType CB_UMI_Simple \
    --soloCBstart 1 \
    --soloCBlen 16 \
    --soloUMIstart 17 \
    --soloUMIlen 10 \
    --soloStrand Forward \
    --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
    --soloUMIfiltering MultiGeneUMI_CR \
    --soloUMIdedup 1MM_CR \
    --outFilterScoreMin 30 \
    --outSAMtype BAM SortedByCoordinate \
    --clip5pAdapterSeq - - \
    --clip5pAdapterMMp 0.1 0.1 \
    --soloBarcodeReadLength 101 \
    --outFileNamePrefix "$SRA_OUTPUT_DIR/"
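With --soloType CB_UMI_Simple, the barcode read is the last file passed to --readFilesIn (here R1). The CB and UMI are taken from fixed 1-based positions in that read. The sketch below only illustrates that slicing with the values from the command above: CB at bases 1-16 and UMI at bases 17-26. `split_barcode_read` is a hypothetical helper, not STAR code.

```python
def split_barcode_read(seq, cb_start=1, cb_len=16, umi_start=17, umi_len=10):
    """Return (CB, UMI) sliced from a barcode-read sequence.

    Start positions are 1-based, matching the --soloCBstart/--soloUMIstart
    convention; defaults mirror the 10x v2 values in the command above.
    """
    needed = max(cb_start - 1 + cb_len, umi_start - 1 + umi_len)
    if len(seq) < needed:
        raise ValueError(f"barcode read has {len(seq)} bases, need at least {needed}")
    cb = seq[cb_start - 1 : cb_start - 1 + cb_len]
    umi = seq[umi_start - 1 : umi_start - 1 + umi_len]
    return cb, umi


# Example: a 101-base barcode read, as implied by --soloBarcodeReadLength 101.
read = "ACGTACGTACGTACGT" + "TTTTTCCCCC" + "N" * 75
cb, umi = split_barcode_read(read)
print(cb, umi)  # ACGTACGTACGTACGT TTTTTCCCCC
```

With these settings, bases after position 26 of the barcode read are not used for either the CB or the UMI.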

Here is the full content of the Velocyto Summary.csv:

Number of Reads,294990278
Reads With Valid Barcodes,0.933189
Sequencing Saturation,-inf
Q30 Bases in CB+UMI,0.976501
Q30 Bases in RNA read,0.894207
Reads Mapped to Genome: Unique+Multiple,0.808504
Reads Mapped to Genome: Unique,0.620197
Reads Mapped to Velocyto: Unique+Multiple Velocyto,NoMulti
Reads Mapped to Velocyto: Unique Velocyto,0
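A small Python sketch can make the unusual entries in a Summary.csv like this one easy to spot. It splits each line on its last comma and flags values that are non-numeric (NoMulti), non-finite (-inf), or zero. `parse_summary` and `flag_unusual` are hypothetical names. The sketch only highlights these fields and does not explain why STAR reports them.

```python
import math


def parse_summary(text):
    """Parse 'metric,value' lines; values that are not numbers stay strings."""
    parsed = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, raw = line.rpartition(",")  # split on the last comma only
        try:
            parsed[key] = float(raw)  # float() also accepts "-inf"
        except ValueError:
            parsed[key] = raw  # e.g. "NoMulti"
    return parsed


def flag_unusual(parsed):
    """Return (metric, reason) pairs for non-numeric, non-finite, or zero values."""
    flags = []
    for key, value in parsed.items():
        if isinstance(value, str):
            flags.append((key, "non-numeric"))
        elif not math.isfinite(value):
            flags.append((key, "non-finite"))
        elif value == 0:
            flags.append((key, "zero"))
    return flags


summary = """\
Sequencing Saturation,-inf
Reads Mapped to Genome: Unique,0.620197
Reads Mapped to Velocyto: Unique+Multiple Velocyto,NoMulti
Reads Mapped to Velocyto: Unique Velocyto,0
"""
for metric, reason in flag_unusual(parse_summary(summary)):
    print(f"{reason:12} {metric}")
```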

Thank you in advance!

Anna Maguza

JDOUBLE-U commented 4 months ago

Same issue here! I would really appreciate some help 😊

Summary.csv:

Reads Mapped to Velocyto: Unique+Multiple Velocyto,0
Reads Mapped to Velocyto: Unique Velocyto,0

All three .mtx files (ambiguous.mtx, spliced.mtx, unspliced.mtx) look the same:

%%MatrixMarket matrix coordinate integer general
%
60685 384 0
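In the MatrixMarket coordinate format, the first non-comment line gives the number of rows, the number of columns, and the number of stored entries. So `60685 384 0` describes a 60,685 × 384 matrix with no stored entries at all. In STARsolo output, the rows are features and the columns are barcodes. Below is a minimal stdlib-only Python sketch that reads just that line. `mtx_dimensions` and the script name check_mtx.py are hypothetical.

```python
import sys


def mtx_dimensions(path):
    """Return (rows, cols, entries) from the size line of a MatrixMarket coordinate file."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # skip the %%MatrixMarket banner, comments, blank lines
            rows, cols, entries = (int(x) for x in line.split()[:3])
            return rows, cols, entries
    raise ValueError(f"no size line found in {path}")


if __name__ == "__main__":
    # Usage: python check_mtx.py spliced.mtx unspliced.mtx ambiguous.mtx
    for path in sys.argv[1:]:
        rows, cols, entries = mtx_dimensions(path)
        status = "EMPTY" if entries == 0 else "ok"
        print(f"{path}: {rows} x {cols}, {entries} entries [{status}]")
```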
JDOUBLE-U commented 4 months ago

Have you found a fix for this yet, @AnnaMaguza?