Closed marcjwilliams1 closed 5 years ago
Hi @marcjwilliams1: Where did this BAM file come from? Some archives 're-process' the BAM file and remove all but a few tags. This BAM appears to be missing the tags that Cell Ranger puts on to every read, which bamtofastq needs to reconstruct the full original sequence. In this case bamtofastq is looking for BC, QT, CR, CY, UR, UY, TR, TQ.
You'll need to get hold of the original BAM file that Cell Ranger produced in order to run bamtofastq.
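A quick way to see which of these tags a given BAM actually carries is to scan a sample of its records. The following is only a minimal sketch, assuming samtools and awk are available; `missing_tags` is a hypothetical helper name, and the tag list is simply the one quoted above, not an official specification. It reads SAM records on stdin and prints each required tag that is absent from at least one record.

```shell
# Hypothetical helper: reads SAM records on stdin and prints, one per line,
# each required tag that is missing from at least one record.
# The tag list is the one quoted in this thread (an assumption, not a spec).
missing_tags() {
  awk -F '\t' -v req="BC,QT,CR,CY,UR,UY,TR,TQ" '
    BEGIN { n = split(req, tags, ",") }
    /^@/ { next }                 # skip SAM header lines
    {
      split("", seen)             # portable way to clear the array
      # optional fields start at column 12 and look like TAG:TYPE:VALUE
      for (i = 12; i <= NF; i++) seen[substr($i, 1, 2)] = 1
      for (j = 1; j <= n; j++) if (!(tags[j] in seen)) miss[tags[j]] = 1
    }
    END { for (j = 1; j <= n; j++) if (tags[j] in miss) print tags[j] }
  '
}

# Usage (the file name is a placeholder):
# samtools view your.bam | head -n 1000 | missing_tags
```

Empty output means every sampled record carried all of the listed tags. The sketch only reports which tags are absent somewhere in the sample; it does not check how bamtofastq treats each case.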
OK, thanks. I got it from SRA. The fastq files also seem to be missing information: only one fastq file is dumped rather than two. I was hoping the bam files would have all the information, but it doesn't look like that's the case.
@marcjwilliams1 can you share the SRA accession you're looking at? In theory SRA is not supposed to munge 10x BAM files for exactly this reason, but maybe this data isn't properly tagged as 10x? I'd like to look into this with SRA.
Sure, the accession for one of the bams is SRR7420402 (the whole project has GEO accession GSE116222).
I used the following command to download it:
sam-dump SRR7420402 | samtools view -bS - > SRR7420402
Also, if I try to download the fastq files I only get one fastq file rather than the two that I would have expected:
fasterq-dump --split-files SRR7420402
Do let me know if you find anything out. Thanks.
I have exactly the same issue: only one fastq file, which isn't suitable for the Cell Ranger pipeline. So I also thought about using the bam file from SRA to get compatible fastq files, but ran into the same problem.
@marcjwilliams1 @qingnanl I was poking around on SRA trying to understand what happened. I found the 'Original format' section in the Run Browser. In that section there's a BAM file link which appears to be the original Cell Ranger BAM for this library and works with bamtofastq, so I think that's your path forward.
I am sorry for 'necrobumping' this post, but I am stuck on GSE116222 too. Before I try the bam file, I would like to know whether it works. At first I also thought the fastq file was concatenated or interleaved, or had some naming issue. But it seems to have the UMI and barcode trimmed from both sides, keeping only the overlapping sequence, if I haven't got it wrong?
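One way to get a first impression of what the single fastq file contains is to tally its read lengths. This is only a rough sketch: `fastq_length_counts` is a hypothetical helper name, it assumes plain four-line fastq records on stdin, and whether a given length distribution indicates trimmed barcodes or UMIs depends on the library chemistry, which this thread does not state.

```shell
# Hypothetical helper: reads a four-line-per-record fastq on stdin and
# prints "length count" pairs, sorted by read length.
fastq_length_counts() {
  awk 'NR % 4 == 2 { count[length($0)]++ }   # line 2 of each record is the sequence
       END { for (len in count) print len, count[len] }' | sort -n
}

# Usage (the file name is a placeholder):
# head -n 400000 your_reads.fastq | fastq_length_counts
```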
I get the following error when trying to run bamtofastq:
thread 'main' panicked at 'Invalid BAM record: read: "1" is missing tag: "CR"', src/main.rs:509:25
Here are the comment tags in the bam header:
and the first read in the bamfile
Wondering if something is formatted incorrectly. Any help much appreciated, thanks!