Closed (eltonjrv closed this issue 2 years ago)
CR is the "raw" cell-barcode, CB is the "cleaned-up" cell-barcode in Cell-Ranger. See: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/bam
The Z just indicates a string value.
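To make the tag layout concrete, here is a minimal sketch of how the optional fields of a SAM record break down into tag, type, and value. It relies only on the SAM convention that columns 12 and onward have the form TAG:TYPE:VALUE. The function name `parse_sam_tags` and the example record (read name, sequence, barcodes) are made up for illustration and are not taken from the BAM file in this issue.

```python
def parse_sam_tags(sam_line):
    """Return {tag: (type, value)} for the optional fields of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns of a SAM record are mandatory
        tag, typ, value = field.split(":", 2)  # e.g. "CR:Z:ACGT" -> CR, Z, ACGT
        tags[tag] = (typ, value)
    return tags


# Hypothetical unaligned record carrying both a raw (CR) and a corrected (CB) barcode.
record = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGTACGT", "FFFFFFFF",
    "CR:Z:AAACCTGAGAAACCAT",
    "CB:Z:AAACCTGAGAAACCAT-1",
])
tags = parse_sam_tags(record)
print(tags["CR"])  # type "Z" means the value is a string
```

On real data, the same information can be inspected by eye in the output of "samtools view".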
If you really want the CR tag, you can define a new CellRanger cell-barcode strategy in the file groups.ini in the current working directory:
[CellRangerCR_CB]
Name: CellRanger-CR
Description: Cell barcodes from the CR tag of aligned read - reads without a CR tag or with CR tag not in the accept list (default: file "barcodes.tsv" in the current directory) dropped.
Type: CellBarcode
ReadTagValue: tag='CR' acceptlist='barcodes.tsv'
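The rule described by that strategy (take the barcode from a tag, drop the read if the tag is missing or its value is not in the accept list) can be sketched in a few lines. This only illustrates the logic as the Description states it; it is not SCExecute's actual code, and `load_acceptlist`, `barcode_for_read`, and the example barcodes are hypothetical.

```python
def load_acceptlist(lines):
    """Build a set of accepted barcodes from an iterable of text lines."""
    return {line.strip() for line in lines if line.strip()}


def barcode_for_read(tags, tag="CR", acceptlist=frozenset()):
    """Return the read's barcode, or None if the read should be dropped.

    `tags` maps tag names to string values, e.g. {"CR": "AAAC..."}.
    """
    value = tags.get(tag)
    if value is None or value not in acceptlist:
        return None  # no tag, or barcode not in the accept list: drop the read
    return value


# Made-up accept list standing in for the contents of barcodes.tsv.
accepted = load_acceptlist(["AAACCTGAGAAACCAT\n", "AAACCTGAGAAACCGC\n", "\n"])

kept = barcode_for_read({"CR": "AAACCTGAGAAACCAT"}, acceptlist=accepted)
dropped_unknown = barcode_for_read({"CR": "TTTTTTTTTTTTTTTT"}, acceptlist=accepted)
dropped_untagged = barcode_for_read({"CB": "AAACCTGAGAAACCAT"}, acceptlist=accepted)
```

In practice the accept list would come from a file, for example by passing an open file handle for barcodes.tsv to `load_acceptlist`.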
However, all that said, it looks like you do not have a valid BAM file - the pysam module can't open it. Can you point me to where you got it from and how you prepared it for SCExecute?
Cheers!
Here is the documentation for how scExecute handles barcodes...
https://horvathlab.github.io/NGS/SCExecute/docs/Barcodes.html
I was able to find the BAM file online based on the filename you included in your issue. This BAM file has been processed by the tool fastq_pre_barcodes from the fastq_utils suite according to the BAM file header. I was also able to reproduce your error, so I should be able to offer more concrete advice shortly.
Hi Edward, Thanks for your prompt support. Yes, it was indeed generated by fastq_pre_barcodes, as "samtools view -H" shows. Apologies for my naivety; I've only just started handling public scRNA-seq datasets. My previous scRNA-seq work used only FlyCellAtlas h5ad data, which can be easily loaded into Seurat for downstream analyses starting from the read-count matrix. I'll be eagerly awaiting your next advice for this particular BAM type. Thanks again, Best, Elton
OK, what I've determined is that this BAM file is alignment-free; it is just a convenient way to store the reads (and the barcodes). I can tweak the way that scExecute works to avoid the need for BAM files with alignments in them, but realistically, the tasks you want to run on the single-cell partitioned BAMs will probably require alignments. In short, get these reads (whether from the BAM or FASTQ files) aligned to a reference genome before using scExecute.
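A quick way to spot this kind of file up front is to look at the header text printed by "samtools view -H": an alignment-free BAM has no @SQ (reference sequence) lines, which is what the pysam error "file has no sequences defined" is complaining about. The sketch below counts @SQ lines and lists the program IDs from @PG lines. The function name `summarize_header` and the example header are made up for illustration; the real header of the file in this issue is not reproduced here.

```python
def summarize_header(header_text):
    """Return (number of @SQ lines, list of @PG program IDs) for a SAM header."""
    n_sq = 0
    programs = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@SQ":
            n_sq += 1  # one @SQ line per reference sequence
        elif fields[0] == "@PG":
            for field in fields[1:]:
                if field.startswith("ID:"):
                    programs.append(field[3:])
    return n_sq, programs


# Hypothetical header of an unaligned BAM: no @SQ lines at all.
header = "@HD\tVN:1.4\tSO:unsorted\n@PG\tID:fastq_pre_barcodes\n"
n_sq, programs = summarize_header(header)
if n_sq == 0:
    print("No reference sequences in header; reads are probably unaligned.")
```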
Hey Nathan, Thanks a lot for clarifying. I've converted those BAM files to FASTQ (maintaining the barcode info), and then successfully ran STARsolo. Cheers, Elton
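For readers wondering what such a conversion might look like, here is a rough sketch that turns one unaligned SAM record into a FASTQ entry and carries the barcode tag along in the header line. This is not necessarily how the conversion above was done: the function name `sam_to_fastq` and the example record are hypothetical, it assumes unaligned reads (so no reverse-complementing is needed), and the barcode layout a downstream aligner expects may well differ.

```python
def sam_to_fastq(sam_line, tag="CR"):
    """Convert one unaligned SAM record to a 4-line FASTQ entry.

    If the record carries a string-valued `tag`, it is appended to the
    FASTQ header as a comment, e.g. "@read1 CR:Z:ACGT".
    """
    fields = sam_line.rstrip("\n").split("\t")
    name, seq, qual = fields[0], fields[9], fields[10]
    prefix = tag + ":Z:"
    barcode = None
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields
        if field.startswith(prefix):
            barcode = field[len(prefix):]
    header = "@" + name
    if barcode is not None:
        header += " " + prefix + barcode
    return "\n".join([header, seq, "+", qual]) + "\n"


# Made-up unaligned record with a raw cell barcode.
record = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0",
    "ACGT", "FFFF", "CR:Z:AAACCTGAGAAACCAT",
])
fastq_entry = sam_to_fastq(record)
```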
I'm going to close this issue, which was really an issue with the BAM file. Nevertheless, I will make two changes to scExecute: first, I will tweak the code to permit unaligned BAM files as input; second, I will add a rule for fastq_pre_barcodes cell-barcode tags (CR) to the distribution.
Permitting unaligned BAM files is, most likely, not that useful, since the downstream analyses likely need aligned reads. But scExecute doesn't need them to be aligned itself, so it should not impose this restriction.
If you think there is more to discuss on this issue, please re-open it.
Cheers!
Dear SCExecute team,
Thanks for developing such a tool, which appears to be quite useful for my current purpose (assessing several public scRNA-seq datasets). However, I'm encountering the following error message:
###############################################
[193647] Failed to execute script scExecute
Traceback (most recent call last):
  File "scExecute.py", line 514, in
  File "split.py", line 37, in iterator
  File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
###############################################
I've also noticed that the barcode tag in the BAM file is actually CR:Z rather than CB, as the program seems to expect:
scExecute Options:
Read Files (-r): T06_TH_TOT_5GEX_1_S9.bam
Read Groups (-G): CellRanger
Description: Cell barcodes from the CB tag of aligned read - reads without a CB tag or with CB tag not in the accept list (default: file "barcodes.tsv" in the current directory) dropped.
Specification: tag=CB acceptlist=barcodes.tsv
I hope you guys can shed some light on how to work around this. Thanks, Best, Elton