GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

Barcode demultiplexing on the long-reads alignment? #417

Open ShaowenJ opened 3 months ago

ShaowenJ commented 3 months ago

Hi, I was wondering whether I could use bambu to quantify my Nanopore library, which contains barcode sequences. I have already demultiplexed the ONT library by barcode using other tools and generated a meta table that matches each barcode to its read IDs. It would be great if you have any ideas on how I can generate a barcode-gene count matrix from it. My current workflow is to align with minimap2, subset the BAM file per barcode by matching read IDs, quantify each subset with Salmon or other tools, and compile the results into one matrix. But this takes a very long time and a lot of memory. Maybe there is a more efficient way.

Thanks! Shaowen
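
For illustration, the "compile the matrix" step can be done in a single pass over the reads instead of once per barcode subset. The sketch below is not bambu's or Salmon's method: it assumes a hypothetical list of (read ID, gene) assignments is already available from some upstream step, and it does plain counting with no handling of ambiguous or multi-mapping reads. The function names and the two-column TSV layout of the meta table are assumptions as well.

```python
import csv
import io
from collections import Counter, defaultdict


def load_read_barcodes(handle):
    """Read a two-column TSV (read ID, barcode) into a dict. Layout is assumed."""
    return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


def count_by_barcode(assignments, read_to_barcode):
    """Tally reads per (barcode, gene) from an iterable of (read_id, gene) pairs.

    Reads whose ID is missing from the lookup table are skipped.
    Returns {barcode: Counter({gene: count})}, i.e. a sparse count matrix.
    """
    counts = defaultdict(Counter)
    for read_id, gene in assignments:
        barcode = read_to_barcode.get(read_id)
        if barcode is not None:
            counts[barcode][gene] += 1
    return counts


# Toy data standing in for the real meta table and read-to-gene assignments.
meta = io.StringIO("read1\tAAAC\nread2\tAAAC\nread3\tGGTT\n")
assignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read3", "geneB"),
    ("read4", "geneA"),  # no barcode in the meta table, so it is skipped
]

matrix = count_by_barcode(assignments, load_read_barcodes(meta))
for barcode in sorted(matrix):
    for gene, n in sorted(matrix[barcode].items()):
        print(barcode, gene, n, sep="\t")
```

Keeping only the non-zero entries matters at this scale: with roughly 200,000 barcodes, a dense barcode-by-gene table would mostly hold zeros.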

andredsim commented 3 months ago

Hi Shaowen,

How many barcodes do you have?

Kind Regards, Andre Sim

ShaowenJ commented 3 months ago

Hi Andre,

I have almost 200,000 barcodes. It's similar to single-cell data, but the sequence structure is not the same as 10X.

Thanks, Shaowen

andredsim commented 3 months ago

Hi Shaowen,

With that number of barcodes/BAM files, you will encounter computational resource issues using the current version of Bambu. We are currently working on a way to handle this, preferably one where you do not need to subset the BAM file. I hope to be able to update you on its progress in the coming month. Are your barcodes stored in the read name or the BC tag in the BAM file, or only in a separate metadata file?

Kind Regards, Andre Sim

ShaowenJ commented 3 months ago

Hi Andre,

Thanks, I look forward to having tools that can do this more efficiently. Unfortunately, the BC is stored in a separate meta table that can be matched to the read IDs in the BAM file. I would appreciate any suggestions you have.

Best, Shaowen Jiang
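
One way to move the barcodes from a separate table into the alignments themselves is to append an optional `BC:Z:` field to each SAM record. The sketch below only illustrates that idea on SAM text lines. Whether bambu will read barcodes from this tag is not confirmed anywhere in this thread, and the function name and toy records are hypothetical.

```python
def tag_sam_with_barcodes(sam_lines, read_to_barcode, tag="BC"):
    """Yield SAM lines with TAG:Z:<barcode> appended where the read ID is known.

    Header lines (starting with '@') and reads missing from the table
    are passed through unchanged.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line
            continue
        read_id = line.split("\t", 1)[0]  # QNAME is the first SAM column
        barcode = read_to_barcode.get(read_id)
        yield f"{line}\t{tag}:Z:{barcode}" if barcode else line


# Toy SAM input: one header line, one read with a barcode, one without.
sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read9\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
tagged = list(tag_sam_with_barcodes(sam, {"read1": "AAAC"}))
for record in tagged:
    print(record)
```

On real data the same generator could be fed from `samtools view -h` and its output piped back into `samtools view -b`, so the BAM file is streamed once rather than subset per barcode.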
