Open fabotao opened 1 year ago
Hi @fabotao
The matrix.mtx is a sparse matrix and may not contain all junctions listed in features.tsv. Specifically, it omits junctions detected only in reads without correct barcodes/UMIs, and junctions detected only with multimapping reads.
Thanks for your reply. It seems that the sparse matrix loses some rows. Do you have any suggestions on how to read this sparse matrix in combination with features.tsv and barcodes.tsv? Thanks a lot!
You can use standard tools, but you may need to modify the features.tsv file for splice junctions: combine the first 3 columns (chr, start, end), separated by underscores, to create unique splice junction IDs.
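A minimal sketch of that suggestion in plain Python, in case it helps. It assumes that the 1-based row index in matrix.mtx points at the corresponding line of features.tsv and the column index at the corresponding line of barcodes.tsv; that mapping is not confirmed in this thread. The function names are hypothetical, not part of STAR.

```python
import csv
import os
import tempfile


def read_junction_ids(features_path):
    """Build chr_start_end ids from the first three columns of features.tsv."""
    with open(features_path, newline="") as fh:
        return ["_".join(row[:3]) for row in csv.reader(fh, delimiter="\t") if row]


def read_barcodes(barcodes_path):
    """Read one cell barcode per line."""
    with open(barcodes_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def read_mtx_triplets(mtx_path):
    """Yield (row, col, value) from a MatrixMarket coordinate file (1-based indices)."""
    with open(mtx_path) as fh:
        size_line_seen = False
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # skip header/comment lines
            if not size_line_seen:
                size_line_seen = True  # first non-comment line holds the dimensions
                continue
            row, col, value = line.split()[:3]
            # Assumption: counts are stored as integers.
            yield int(row), int(col), int(value)


def load_sj_counts(solo_dir):
    """Return {(junction_id, barcode): count} for the non-zero matrix entries."""
    ids = read_junction_ids(os.path.join(solo_dir, "features.tsv"))
    barcodes = read_barcodes(os.path.join(solo_dir, "barcodes.tsv"))
    counts = {}
    for row, col, value in read_mtx_triplets(os.path.join(solo_dir, "matrix.mtx")):
        # Assumption: matrix row i refers to line i of features.tsv.
        counts[(ids[row - 1], barcodes[col - 1])] = value
    return counts
```

Calling `load_sj_counts("Solo.out.1/SJ/raw")` would then give counts keyed by junction ID and barcode. Junctions that appear in features.tsv but have no entries in the matrix are simply absent from the result. With a 737280-barcode whitelist, a dictionary of triplets is only practical because the matrix is sparse; a sparse-matrix reader such as `scipy.io.mmread` is an alternative if SciPy is available.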
We processed the GSE115469 dataset using STAR with the following command. However, Solo.out.1/SJ/raw/matrix.mtx has fewer rows (features) than features.tsv, so we cannot annotate the SJ matrix either by hand or with the MARVEL program.
STAR --runThreadN 16 \
  --genomeDir refdata-cellranger-GRCh38-3.0.0 \
  --soloType CB_UMI_Simple \
  --readFilesIn SRR9008752_possorted_genome_bam.bam \
  --readFilesCommand samtools view -F 0x100 \
  --readFilesType SAM SE \
  --soloInputSAMattrBarcodeSeq CR UR \
  --soloInputSAMattrBarcodeQual CY UY \
  --soloCBwhitelist 737K-august-2016.txt \
  --soloFeatures Gene SJ
The dimensions of Solo.out.1/SJ/raw/matrix.mtx are 159339 x 737280 (159339 features), whereas features.tsv is 188538 x 9 (188538 features).
Looking forward to your reply!