alexdobin / STAR

RNA-seq aligner

Transcriptome aligned BAM is empty when using setting for 10X 5prime #2024

Open edg1983 opened 11 months ago

edg1983 commented 11 months ago

Hi,

I'm using STARsolo v2.7.10b to align reads from 10X 3prime and 5prime libraries. I also want a BAM file aligned to the transcriptome, so I'm using the --quantMode TranscriptomeSAM option.

For 3prime libraries, all worked perfectly with the following command line.

STAR --runThreadN 12 \
    --soloType CB_UMI_Simple \
    --soloUMIlen 12 \
    --soloCellFilter EmptyDrops_CR \
    --soloFeatures GeneFull SJ \
    --genomeDir ${genome_ref} \
    --soloCBwhitelist ${white_list} \
    --outFileNamePrefix ${out_prefix}. \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --quantMode TranscriptomeSAM \
    --clipAdapterType CellRanger4 \
    --outSAMattributes NH HI AS GX GN CB UB \
    --soloMultiMappers EM \
    --readFilesIn ${R2_fastqs} ${R1_fastqs}

However, when processing 5prime libraries, the generated .toTranscriptome.out.bam contains only a header with SQ definitions and no reads.
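A quick way to confirm the "header only" symptom is to count the non-header records in the file. The sketch below is a minimal helper, not part of STAR: `count_sam_records` is a hypothetical name, and it assumes `samtools` is installed to convert the BAM to SAM text. (`samtools view -c` on the BAM gives the same count directly.)

```shell
# Count alignment records (lines that are not @-prefixed header lines)
# in SAM text read from stdin. Prints 0 for a header-only file.
count_sam_records() {
    awk '!/^@/ && NF > 0 { n++ } END { print n + 0 }'
}

# Hypothetical usage, assuming samtools is on PATH and out_prefix matches
# the --outFileNamePrefix value used in the STAR run:
#   samtools view -h "${out_prefix}.Aligned.toTranscriptome.out.bam" | count_sam_records
```

A result of 0 alongside a non-empty sorted genome BAM would match the behavior described here.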

I'm using the following command to process 5prime data, following the suggestions in the documentation.

STAR --runThreadN 12 \
    --soloType CB_UMI_Simple \
    --soloCellFilter EmptyDrops_CR \
    --soloFeatures GeneFull SJ \
    --genomeDir ${genome_ref} \
    --soloCBwhitelist ${white_list} \
    --outFileNamePrefix ${out_prefix}. \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --quantMode TranscriptomeSAM \
    --outSAMattributes NH HI AS GX GN CB UB \
    --soloMultiMappers EM \
    --readFilesIn ${R1_fastqs} ${R2_fastqs} \
    --soloBarcodeMate 1 \
    --soloStrand Forward \
    --clip5pNbases 39 0 \
    --soloCBstart 1   --soloCBlen 16   --soloUMIstart 17   --soloUMIlen 10

Is there something missing in my command to be able to generate the transcriptome-aligned BAM correctly?

Thanks!

alexdobin commented 11 months ago

Hi Edoardo,

Your parameters look fine, so this is potentially a bug.

edg1983 commented 9 months ago

I'd like to add another odd behavior that occurs when I use the 5prime command above.

In the resulting sorted BAM file (5prime.Aligned.sortedByCoord.out.bam), the bitwise FLAG of the reads includes read-paired + mate-unmapped (values like 137, 153, 393, 409). This causes problems downstream: many tools that process the BAM file see this FLAG and either skip the reads for lacking a proper pair or refuse to process the file because the mate reads are not found.

Given this is single-cell data, reads are not expected to be paired, so these FLAGs are incorrectly set here in my opinion.

Indeed, when running the 3prime command above on 3prime data, the generated BAM file contains the expected FLAGs, which do not assume paired reads (values like 16, 256, 272).
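For reference, the FLAG values quoted above can be decoded bit by bit. The sketch below uses the bit meanings from the SAM specification; the helper name `decode_flag` and the short bit labels are illustrative choices, not samtools output.

```shell
# Decode a SAM bitwise FLAG into the names of the bits that are set.
# Bit order follows the SAM specification: 0x1, 0x2, 0x4, ... 0x800.
decode_flag() {
    flag=$1
    out=""
    bit=1
    for name in paired proper_pair unmapped mate_unmapped reverse \
                mate_reverse read1 read2 secondary qc_fail duplicate supplementary; do
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "${out:-none}"
}

# FLAG values reported for the 5prime run, then for the 3prime run.
for f in 137 153 393 409 16 256 272; do
    printf '%s\t%s\n' "$f" "$(decode_flag "$f")"
done
```

All four 5prime values include `paired`, `mate_unmapped`, and `read2`, while the 3prime values contain only `reverse` and/or `secondary`.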

alexdobin commented 9 months ago

--soloBarcodeMate 1 expects both mates to have cDNA sequence, which is typical for 5' libraries. If your library does not have cDNA sequence on the barcode read, you can run it the standard 3' way, possibly with --soloStrand Reverse.
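One possible reading of "the standard 3' way" is sketched below: the cDNA read (R2) goes first and the barcode read (R1) second, with `--soloBarcodeMate` and `--clip5pNbases` dropped and `--soloStrand Reverse` set. This is an untested sketch, not a confirmed fix for the empty transcriptome BAM. The `--soloBarcodeReadLength 0` option is an added assumption (it disables the barcode-read length check, which may be needed if R1 is longer than the 26 bases of CB + UMI), the paths are hypothetical placeholders, and `run` / `star_5p_single_end` are illustrative helper names. By default the script only prints the command.

```shell
# Hypothetical placeholder paths; replace with real ones before running.
genome_ref=${genome_ref:-/path/to/genome_index}
white_list=${white_list:-/path/to/whitelist.txt}
out_prefix=${out_prefix:-5prime_single_end}
R1_fastqs=${R1_fastqs:-sample_R1.fastq.gz}
R2_fastqs=${R2_fastqs:-sample_R2.fastq.gz}

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# 5prime data run single-end: cDNA read first, barcode read last.
# --soloBarcodeReadLength 0 is an assumption (skip the barcode read length check).
star_5p_single_end() {
    run STAR --runThreadN 12 \
        --soloType CB_UMI_Simple \
        --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 \
        --soloBarcodeReadLength 0 \
        --soloStrand Reverse \
        --soloCellFilter EmptyDrops_CR \
        --soloFeatures GeneFull SJ \
        --soloMultiMappers EM \
        --genomeDir "${genome_ref}" \
        --soloCBwhitelist "${white_list}" \
        --outFileNamePrefix "${out_prefix}." \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode TranscriptomeSAM \
        --outSAMattributes NH HI AS GX GN CB UB \
        --readFilesIn "${R2_fastqs}" "${R1_fastqs}"
}

star_5p_single_end
```

In this mode only R2 is aligned, so any cDNA sequence present on R1 is not used.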