10XGenomics / bamtofastq

Convert 10x BAM files to the original FASTQs compatible with 10x pipelines
MIT License

'Invalid BAM record: read: "1" is missing tag: "CR"' #12

Closed: marcjwilliams1 closed this issue 5 years ago

marcjwilliams1 commented 5 years ago

I get the following error when trying to run bamtofastq:

thread 'main' panicked at 'Invalid BAM record: read: "1" is missing tag: "CR"', src/main.rs:509:25

Here are the comment tags in the bam header:

@CO 10x_bam_to_fastq:I1(BC:QT)
@CO 10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)
@CO 10x_bam_to_fastq:R2(SEQ:QUAL)

and here is the first read in the BAM file:

52080637    256 chr10   13047   3   25S125M *   0   0   GTGGTATCAACGCAGAGTACATGGGGGCTCCAACCCTCGGGATGCCTCATGCTCACCCTTTGGCACCCACCTGACAGCTCAGCATGTCTGCTCTCTGCCATCCTCAATGCCTGCTCTAGACAAGCCCAAGTCCGCCAGGAGTGGCAGAGG  FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FF:FFFFFFFFFFFFF:FFFFFFFFFF:FFFFF:FFFF  RG:Z:A3_inflamed:MissingLibrary:1:H3TWHDMXX:1   NH:i:2  NM:i:2

Wondering if something is formatted incorrectly. Any help much appreciated, thanks!
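
For reference, the @CO header lines above can be read programmatically to see which tags each FASTQ read is expected to be rebuilt from. This is only a rough Python sketch, assuming each entry inside the parentheses is a sequence-tag:quality-tag pair; `SPEC_RE` and `parse_read_specs` are hypothetical names for illustration and are not part of bamtofastq.

```python
import re

# Matches header comments such as "10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)".
SPEC_RE = re.compile(r"10x_bam_to_fastq:(\w+)\((.*)\)")


def parse_read_specs(comment_lines):
    """Map each read name (I1, R1, R2) to a list of (first, second) tag pairs."""
    specs = {}
    for line in comment_lines:
        match = SPEC_RE.search(line)
        if match is None:
            continue  # not a 10x_bam_to_fastq comment
        read_name, body = match.groups()
        # Assumption: each comma-separated entry is "SEQ_TAG:QUAL_TAG".
        specs[read_name] = [tuple(pair.split(":")) for pair in body.split(",")]
    return specs


header = [
    "@CO 10x_bam_to_fastq:I1(BC:QT)",
    "@CO 10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)",
    "@CO 10x_bam_to_fastq:R2(SEQ:QUAL)",
]
specs = parse_read_specs(header)
for read_name, pairs in specs.items():
    print(read_name, pairs)
```

Under that reading, R1 would need the CR/CY, UR/UY and TR/TQ tags on every record, while R2 comes from the record's own SEQ and QUAL fields.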

pmarks commented 5 years ago

Hi @marcjwilliams1: Where did this BAM file come from? Some archives 're-process' the BAM file and remove all but a few tags. This BAM appears to be missing the tags that Cell Ranger puts on every read, which bamtofastq needs to reconstruct the full original sequence. In this case bamtofastq is looking for BC, QT, CR, CY, UR, UY, TR, and TQ.

You'll need to get hold of the original BAM file that Cell Ranger produced in order to run bamtofastq.
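
A quick way to tell whether a BAM has been stripped is to look at the optional fields of a record (for example, one line of `samtools view` output) and see which of those tags are absent. The Python sketch below is an illustration only and is not how bamtofastq itself performs the check; `REQUIRED_TAGS` and `missing_tags` are hypothetical names, and the example record is abbreviated, with SEQ and QUAL replaced by `*`.

```python
# Tags bamtofastq is looking for in this case, as listed above.
REQUIRED_TAGS = ["BC", "QT", "CR", "CY", "UR", "UY", "TR", "TQ"]


def missing_tags(sam_line, required=REQUIRED_TAGS):
    """Return the required tags that do not appear on a tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM fields are mandatory; the rest are TAG:TYPE:VALUE.
    present = {field.split(":", 1)[0] for field in fields[11:]}
    return [tag for tag in required if tag not in present]


# Abbreviated version of the record from the report (SEQ and QUAL set to "*").
record = "\t".join([
    "52080637", "256", "chr10", "13047", "3", "25S125M", "*", "0", "0",
    "*", "*",
    "RG:Z:A3_inflamed:MissingLibrary:1:H3TWHDMXX:1", "NH:i:2", "NM:i:2",
])
print(missing_tags(record))
```

For the record in this report, every one of the eight tags comes back as missing, since only RG, NH and NM are present.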

marcjwilliams1 commented 5 years ago

OK, thanks. I got it from SRA. The FASTQ files also seem to be missing information: only one FASTQ file is dumped rather than two. I was hoping the BAM files would have all the information, but it doesn't look like that's the case.

pmarks commented 5 years ago

@marcjwilliams1 can you share the SRA accession you're looking at? In theory SRA is not supposed to munge 10x BAM files for exactly this reason, but maybe this data isn't properly tagged as 10x? I'd like to look into this with SRA.

marcjwilliams1 commented 5 years ago

Sure, the accession for one of the bams is SRR7420402 (the whole project has GEO accession GSE116222).

I used the following command to download it:

sam-dump SRR7420402 | samtools view -bS - > SRR7420402

Also, when I try to download the FASTQ files, I get only 1 FASTQ file rather than the 2 I would have expected:

fasterq-dump --split-files SRR7420402

Do let me know if you find anything out. Thanks.

qingnanl commented 5 years ago

I have exactly the same issue: only 1 FASTQ file, which can't be used with the Cell Ranger pipeline. So I also thought about using the BAM file from SRA to get compatible FASTQs, but ran into the same problem.

pmarks commented 5 years ago

@marcjwilliams1 @qingnanl I was poking around on SRA trying to understand what happened. I found the 'Original format' section in the Run Browser. In that section there's a BAM file link that appears to be the original Cell Ranger BAM for the library, and it works with bamtofastq, so I think that's your path forward.

KforKuma commented 1 year ago

I am sorry for 'necrobumping' this post, but I am stuck on GSE116222 too. Before trying the BAM file, I would like to know whether it works. At first I too thought the FASTQ file was concatenated or interleaved, or had some naming issue. But it seems that the UMI and barcode on both sides have been trimmed off so that only the overlapping sequence is kept, if I haven't got it wrong?