Psy-Fer / buttery-eel

The buttery eel - a slow5 guppy/dorado basecaller wrapper
MIT License

demultiplexing remora SAMs #14

Closed: hasindu2008 closed this issue 3 weeks ago

hasindu2008 commented 1 year ago

Please add the following instructions somewhere:

For FASTQ output from buttery-eel, we call guppy_barcoder (which comes with the ONT Guppy package) for demultiplexing. However, guppy_barcoder does not seem to accept uSAM (unaligned SAM) as input. As a workaround, convert the uSAM to FASTQ, keeping the methylation information (the MM and ML tags) in the read header, and then demultiplex:

#convert
samtools fastq -TMM,ML remora.mods.sam > remora.mods.fastq
#demux
guppy_barcoder <kit options> -i /dir/containing/remora.mods.fastq -s demuxed_out/ -x cuda:all
#Then align these FASTQs with minimap2; -y copies the FASTQ header comment (including the MM/ML tags) into the SAM output
minimap2 -ax map-ont -y ref.fa demuxed_out/barcodex.mods.fastq | samtools sort - > barcodex.mods.bam
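
Before demultiplexing, it can be worth checking that the conversion actually kept the modification tags. The helper below is a minimal sketch, not part of buttery-eel; the function name is hypothetical, and it assumes that `samtools fastq -TMM,ML` wrote the tags into the header line as tab-separated `MM:Z:...` and `ML:B:C,...` fields and that each FASTQ record is exactly 4 lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the uSAM -> FASTQ conversion.
# Assumes 4-line FASTQ records and tab-separated MM/ML tags in the header,
# as written by `samtools fastq -TMM,ML`.

# Prints "<total reads> <reads whose header has both MM and ML tags>".
count_mod_tagged_reads() {
    local fastq="$1"
    awk 'NR % 4 == 1 {
             total++
             if ($0 ~ /\tMM:Z:/ && $0 ~ /\tML:B:C/) tagged++
         }
         END { printf "%d %d\n", total + 0, tagged + 0 }' "$fastq"
}
```

For example, `count_mod_tagged_reads remora.mods.fastq` prints two numbers; if the second is much smaller than the first, the tags were probably not carried over (for instance, because the tag names given to `-T` do not match those in the SAM file).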

Note that if remora.mods.fastq is large and your RAM is smaller than the FASTQ file, guppy_barcoder will run out of memory, as it seems to load the whole FASTQ file into memory. To avoid this, split the large FASTQ into smaller files as below:

#split the large fastq to smaller fastq containing 4000 reads in each
mkdir split_fastq/
split -l 16000 remora.mods.fastq --additional-suffix=.fastq split_fastq/
#call barcoder on that split fastq dir so it does not run out of RAM
guppy_barcoder <kit options> -i split_fastq/ -s demuxed_out/ -x cuda:all
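
Because a FASTQ record is 4 lines, the `split -l` value has to be a multiple of 4 (hence 16000 lines for 4000 reads), otherwise a record gets cut across two files. The sketch below, with hypothetical function names, derives the line count from a reads-per-file target and checks that the split files hold whole records and, together, every line of the original. It assumes `split_fastq/` contains only the split output and that the original file ends with a newline.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the split step.

# Convert a reads-per-file target into the line count for `split -l`
# (4 lines per FASTQ record).
reads_to_lines() {
    echo $(( $1 * 4 ))
}

# Check that every *.fastq file in DIR holds whole records and that the
# line totals match the original FASTQ. Returns non-zero on a mismatch.
check_split() {
    local original="$1" dir="$2" f n total=0
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue          # no split files found
        n=$(wc -l < "$f")
        if (( n % 4 != 0 )); then
            echo "partial record in $f" >&2
            return 1
        fi
        total=$(( total + n ))
    done
    (( total == $(wc -l < "$original") ))
}
```

For example, `split -l "$(reads_to_lines 4000)" --additional-suffix=.fastq remora.mods.fastq split_fastq/` followed by `check_split remora.mods.fastq split_fastq && echo ok` should print `ok` before you run guppy_barcoder on the directory.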

Some versions of guppy_barcoder seem to incorrectly use a space instead of a tab to separate the runid and barcode tags, which causes issues in downstream processing. So please fix your barcoded FASTQs as below before using them with tools such as Minimap2 (the \t in the replacement requires GNU sed):

sed -e 's/ runid/\trunid/g' -e 's/ barcode/\tbarcode/g' barcode04.test.fastq > barcode04_fixed.test.fastq
minimap2 -ax map-ont /mnt/d/genome/hg38noAlt/hg38noAlt.idx barcode04_fixed.test.fastq -y | samtools sort - > barcode04.bam
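
To confirm the fix took effect before aligning, a quick check like the following may help. It is a sketch with a hypothetical function name; it assumes 4-line FASTQ records and simply counts header lines in which `runid` or `barcode` is still preceded by a space, which is exactly the pattern the sed command above rewrites.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ header lines (assuming 4-line records)
# where "runid" or "barcode" is still preceded by a space instead of a tab.
# 0 means the sed fix has been applied to every header.
count_space_separated_tags() {
    awk 'NR % 4 == 1 && / (runid|barcode)/ { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_space_separated_tags barcode04_fixed.test.fastq` should print 0, while running it on the unfixed barcode04.test.fastq shows how many headers were affected.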

Psy-Fer commented 1 year ago

The latest version of guppy_barcoder in the dorado_server release apparently fixes the memory leak issue, and may have some other features. We should check that out before I do something in buttery-eel.

Psy-Fer commented 1 year ago

Yeah, the latest ont_barcoder takes BAM as input.

hasindu2008 commented 1 year ago

That is good!