Closed (BW15061999 closed this issue 2 years ago)
Hi @BW15061999,
I’m not aware of any tagged-end single-cell protocol that uses only 1 read. The most common data types place the UMI and barcode on one of the reads, while the other “biological” reads are drawn from the transcriptome. This is the case with the Chromium protocol. The reason you are seeing 0 assigned reads is that no barcodes can be extracted, because the second read is missing. Therefore, no reads can be assigned to any cell. What specific protocol are you using? Do you not have the full read pairs for each sample? Cc @k3yavi as the resident protocol guru.
Best, Rob
Hi @rob-p ,
The data were downloaded from the SRA database, and using fastq-dump
to split them generates only one FASTQ file; the EBI database also shows only one FASTQ file per sample. I am not sure whether I processed the file correctly.
Here is part of the description of the file in the SRA database, and the link to one of the files:
SRR8453531
Instrument: Illumina HiSeq 3000
Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: SINGLE
Construction protocol: The scRNA-seq libraries were generated using Chromium Single Cell 3' Library & Gel Bead Kit v2 (10X Genomic) according to manufacturer's protocol. Briefly, 10,000-15,000 live cells were FACS-sorted and used to generate single-cell gel-bead in emulsion (GEM). After reverse transcription, GEMs were disrupted. Barcoded cDNA was isolated and amplified by PCR (12 cycles). Following fragmentation, end repair, and A-tailing, sample indexes were added during index PCR (8 cycles). Indexed libraries were multiplexed and sequenced on Illumina HiSeq 3000 instruments according to the manufacturer's instructions (26 cycles of Read 1, 8 cycles of i7 Index, and 98 cycles of Read2).
Best
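One quick way to see which read a single downloaded FASTQ file actually contains is to tabulate its read lengths. The sketch below assumes standard 4-line FASTQ records arriving on standard input; the helper name `fastq_read_lengths` is made up for illustration. Going by the library description above (26 cycles of Read 1, 98 cycles of Read 2), reads of about 98 bp would suggest that only the cDNA read is present, while 26 bp reads would suggest the barcode/UMI read.

```shell
# Hypothetical helper: tabulate read lengths over the first 1000 FASTQ records.
# Assumes 4-line FASTQ records on stdin (no wrapped sequence lines).
fastq_read_lengths() {
  # In 4-line FASTQ, the sequence is line 2 of every record.
  head -n 4000 |
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len, n[len] }' |
    sort -n
}

# Usage (file name is a placeholder):
#   gzip -dc 1.fastq.gz | fastq_read_lengths
# Each output line is "<read length> <number of reads with that length>".
```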
Hi @BW15061999, yes, this is a known problem for single-cell data uploaded to NCBI. The idea is to download the BAM files of the data (yours should be here under the data access section) and then use tools like these to generate paired-end FASTQ files from the BAM file before running alevin. The files downloaded directly from NCBI/EBI don't have the CB/UMI component of the read pairs.
Hope it helps !
@k3yavi beat me to it! It is, unfortunately, a recurring problem. The SRA file itself only contains one of the reads and is therefore essentially useless in analyzing the single-cell data. This is an ongoing problem that I've mentioned several times, but I don't know if the SRA has a plan in place to address it. The proper solution at this point is exactly as Avi suggests; download the bam file (what the SRA calls the original TenX format data), and run it through 10x's bamtofastq to get back the original fastq files (this time paired-end) that you can process. Let us know if you have success with this.
Best, Rob
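The workflow described above could look roughly like the following sketch. The file names (`sample.bam`, `fastq_out`, the R1/R2 paths) are placeholders, not real names: bamtofastq writes its FASTQ files into subdirectories whose names depend on the BAM, so check its output folder for the actual R1/R2 files. The alevin flags are taken from the command posted in this thread, with `-l SR -r` swapped for `-l ISR` and `-1`/`-2`; that swap is an assumption based on the usual Chromium paired-end setup and has not been confirmed here. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, not executed.

```shell
# Sketch only: all file names below are hypothetical placeholders.
# DRY_RUN=1 (the default here) prints each command instead of running it;
# set DRY_RUN=0 to actually execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: recover the paired-end FASTQ files from the original 10x BAM.
bam_to_fastq() {  # usage: bam_to_fastq <in.bam> <out_dir>
  run bamtofastq "$1" "$2"
}

# Step 2: run alevin on the pair.
# -1 takes the barcode/UMI read (R1), -2 takes the cDNA read (R2).
alevin_paired() {  # usage: alevin_paired <index> <R1.fastq.gz> <R2.fastq.gz> <out_dir>
  run salmon alevin -i "$1" -p 4 -l ISR --chromium --sketch -1 "$2" -2 "$3" -o "$4"
}

bam_to_fastq sample.bam fastq_out
alevin_paired index fastq_out/R1.fastq.gz fastq_out/R2.fastq.gz ./output
```

Keeping the two steps as separate functions makes it easy to inspect the bamtofastq output and fill in the real R1/R2 paths before running alevin.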
Hi, I ran the command
salmon alevin -i index -p 4 -l SR --chromium --sketch -r 1.fastq.gz -o ./output
with single-end data as input. Although it didn't generate an error, it didn't map anything. Can I use two single-end files from different samples as paired-end data to run salmon alevin?
Thank you !
Here is part of the output: