10XGenomics / bamtofastq

Convert 10x BAM files to the original FASTQs compatible with 10x pipelines
MIT License

Running bamtofastq on a subset of CellRanger bam #27

Closed: jmintch closed this 3 months ago

jmintch commented 3 years ago

I would like to run bamtofastq on a subset of the CellRanger bam.

When I run bamtofastq on the full CellRanger possorted_genome_bam.bam, it works smoothly, but when I pass it the BAM subset, I get the error message "Error opening BAM file".

I would appreciate any help!

Below are the commands I ran to create the BAM subset:

  1. Get the read IDs of interest from a FASTQ file. (This FASTQ was created downstream of the CellRanger output: it is a subset of the original BAM with some additional tags, but the read IDs are unmodified.)

awk 'NR % 4 == 1' possorted_genome_bam_modified_R1.fastq | cut -c 2- > readID_labelled.txt

  2. Subset the CellRanger BAM by the read IDs of interest:

samtools view possorted_genome_bam.bam | fgrep -w -f readID_labelled.txt > possorted_genome_bam_labelled.bam
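A side note on step 1, as a hedged sketch: if the additional tags were written into the FASTQ header line after a space (an assumption; the post does not say where they are stored), cut -c 2- keeps them, so each pattern handed to the whole-word grep in step 2 is the entire header rather than the bare read ID. Keeping only the first whitespace-delimited field of each header avoids that. The function name fastq_read_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of bamtofastq or samtools): print the unique
# read names in a FASTQ file. Every 4th line starting at line 1 is a header;
# $1 is its first whitespace-delimited field, and substr(..., 2) drops the '@'.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines).
fastq_read_names() {
  awk 'NR % 4 == 1 { print substr($1, 2) }' "$1" | sort -u
}

# Usage (file names from the post):
# fastq_read_names possorted_genome_bam_modified_R1.fastq > readID_labelled.txt
```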

The subsetted BAM does not contain a header, so I add it manually:

samtools view -H possorted_genome_bam.bam > bam_subset/header.txt

cat bam_subset/header.txt possorted_genome_bam_labelled.bam > possorted_genome_bam_labelled_h.bam

Manual inspection confirms that the CellRanger BAM and the subsetted BAM have the same layout.

Command used to run bamtofastq:

/local/users/bin/bamtofastq-1.3.2 possorted_genome_bam_labelled_h.bam path/to/file/fastq
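One quick way to check whether a file really is BAM before handing it to bamtofastq: a BAM file is BGZF-compressed (a gzip-compatible format) and its decompressed contents begin with the magic string BAM followed by byte 1, whereas a SAM file is plain text. The helper below is a hypothetical sketch that assumes gzip and head -c are available; it only reads the first few decompressed bytes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE decompresses (as gzip/BGZF) to data
# starting with "BAM", the magic string at the start of every BAM file.
# A plain-text SAM file is not gzip data, so gzip fails, prints nothing,
# and the comparison is false.
looks_like_bam() {
  [ "$(gzip -dc < "$1" 2>/dev/null | head -c 3)" = "BAM" ]
}

# Usage (file name from the post):
# if looks_like_bam possorted_genome_bam_labelled_h.bam; then
#   echo "BAM"
# else
#   echo "not BAM (possibly SAM text)"
# fi
```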

evolvedmicrobe commented 3 months ago

You appear to be producing a SAM file instead of a BAM file: samtools view writes text (SAM) by default. Try converting the file to BAM (for example with samtools view -b) before running the tool. Please reach out to support@10xgenomics.com if you need further help.
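Following that suggestion, here is a minimal sketch of the subsetting step that writes BAM directly, so no separate header or conversion step is needed. It assumes samtools is on PATH and that readID_labelled.txt holds one read name per line; the function name subset_bam_by_names is hypothetical. samtools view -h keeps the header, awk keeps header lines plus records whose first field (QNAME) is in the list, and samtools view -b re-encodes the stream as BAM.

```shell
#!/bin/sh
# Hypothetical helper: subset_bam_by_names IN.bam NAMES.txt OUT.bam
# Keeps the full header plus every alignment whose read name (QNAME, the
# first SAM column) appears in NAMES.txt, and writes the result as BAM.
subset_bam_by_names() {
  in_bam=$1 names=$2 out_bam=$3
  samtools view -h "$in_bam" |
    awk -v names="$names" '
      BEGIN { while ((getline line < names) > 0) keep[line] = 1 }
      # Header lines start with "@"; a SAM QNAME cannot start with "@".
      /^@/ || ($1 in keep)
    ' |
    samtools view -b -o "$out_bam" -
}

# Usage with the file names from the post:
# subset_bam_by_names possorted_genome_bam.bam readID_labelled.txt possorted_genome_bam_labelled.bam
# bamtofastq possorted_genome_bam_labelled.bam path/to/file/fastq
```

Recent samtools releases can also filter by read name directly with samtools view -N FILE (--qname-file); check the samtools view usage for your installed version before relying on it.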