ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

multiple single cell samples #211

Open wanghlv opened 2 months ago

wanghlv commented 2 months ago

Hi, thanks for writing such a complete man page! I have a quick question. I have 6 samples in total, all of them single-cell Nanopore libraries. I'd like both transcript and gene quantification to be per cell (from the CB tag) and per sample. Could I use --read_group file_name:tag:CB, or should I supply a file, like --read_group file:READ_TO_BARCODE_Samples.TSV:0:2?

READ_TO_BARCODE_Samples.TSV would look like the example below, where the first column is the read ID, the second is the cell barcode, and the third is the sample. However, I'm not sure whether read IDs are unique across all 6 of my samples.

12a5c9c3-2b73-49c0-a3fd-22d2c10832e2_0 AATCAGGAGTGAACGA Sample1
b6e8c102-e1e2-4155-bc28-7dbb5a34c857_0 CCAGCTGCATGAGCAG Sample2
...

I'm currently running it as follows:

isoquant.py -d ont -r ${FA} --complete_genedb --genedb ${GTF} \
  --bam ${s1bam} ${s2bam} ${s3bam} ${s4bam} ${s5bam} ${s6bam} \
  -o IQ_all --prefix IQ_all -l s1 s2 s3 s4 s5 s6 \
  --sqanti_output --check_canonical --count_exons --bam_tags \
  -t 24 --genedb_output \
  --model_construction_strategy default_ont \
  --report_canonical auto --read_group tag:CB

Alternatively, I was thinking of adding a new tag to my BAM files that combines the cell barcode with an appended sample ID, like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that, because I have a lot of reads in my entire experiment. Thank you so much for your suggestions.

Best, Hsiao-Lin

andrewprzh commented 2 months ago

Dear @wanghlv

Thanks for the feedback!

> Could I use --read_group file_name:tag:CB ? or I should supply the file like --read_group file:READ_TO_BARCODE_Samples.TSV:0:2

I think both ways give identical results, although using read tags may save memory, since in that case IsoQuant won't load the entire barcode table into memory.

Unfortunately, the current version of IsoQuant can only group counts by one factor at a time: either the barcode or the sample. So if you want both, I guess you'll need to perform two runs.

> However, I'm not sure if the read ID is unique across all 6 samples I have.

I highly doubt ONT reads can have identical IDs.

> Or I was thinking to add a new tag into my bam file including both the cellbarcode pending with a sample ID like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that because I have a lot of reads in my entire experiment. Thank you so much for your suggestions

Adding a new tag would require creating a new BAM file, so it's probably easier to create a new TSV table.
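As a rough illustration of such a table, here is a minimal Python sketch. It assumes each sample already has a two-column, tab-separated read-to-barcode listing (read ID, cell barcode); the function name `merge_read_tables`, the `_` separator, and the sample labels are hypothetical and not part of IsoQuant. It writes one combined table with a barcode-plus-sample column, and it also fails loudly if a read ID turns up twice, which addresses the uniqueness concern above.

```python
import csv
import io


def merge_read_tables(sample_tables, out_handle, sep="_"):
    """Merge per-sample read-to-barcode listings into one grouping table.

    sample_tables: dict mapping a sample label to an iterable of text lines,
                   each line being "read_id<TAB>barcode" (assumed layout).
    out_handle:    writable text handle for the combined table.
    Output columns: read_id, barcode+sample, barcode, sample.
    Returns the number of rows written; raises ValueError on a repeated read ID.
    """
    # Note: the set holds every read ID in memory; for very large experiments
    # a sort-based duplicate check on disk may be preferable.
    seen = set()
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    rows = 0
    for sample, lines in sample_tables.items():
        for row in csv.reader(lines, delimiter="\t"):
            if not row:
                continue  # skip empty lines
            read_id, barcode = row[0], row[1]
            if read_id in seen:
                raise ValueError(f"duplicate read ID: {read_id}")
            seen.add(read_id)
            writer.writerow([read_id, f"{barcode}{sep}{sample}", barcode, sample])
            rows += 1
    return rows


if __name__ == "__main__":
    demo = {
        "s1": ["12a5c9c3-2b73-49c0-a3fd-22d2c10832e2_0\tAATCAGGAGTGAACGA"],
        "s2": ["b6e8c102-e1e2-4155-bc28-7dbb5a34c857_0\tCCAGCTGCATGAGCAG"],
    }
    out = io.StringIO()
    merge_read_tables(demo, out)
    print(out.getvalue(), end="")
```

With a table like this, column 1 holds the combined barcode-and-sample group, so following the file:FILE:READ_COLUMN:GROUP_COLUMN form used earlier in this thread, a single run with --read_group file:combined.tsv:0:1 would group by cell within sample (combined.tsv is a placeholder name; please check the column indices against the IsoQuant documentation).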

P.S. The new version 3.4.2 should be more efficient in terms of RAM consumption, so it's better to update if possible.

Best Andrey

wanghlv commented 2 months ago

Thank you for all the info and suggestions, and yes, 3.4.2 is so much better at using RAM! Would you recommend any efficient cell barcode and UMI processing tools to run before using IsoQuant for mapping single-cell Nanopore data? Also, since my single-cell data includes UMIs, how would you factor them into the quantification to avoid double-counting PCR duplicates? Thanks so much again, Hsiao-Lin

andrewprzh commented 2 months ago

@wanghlv

Currently, I'm using barcode calling and PCR de-duplication tools of my own (https://github.com/ablab/IsoQuant/tree/sc_v3). They are not released yet, but at some point they will become part of IsoQuant too. If you are eager to test them, please contact me via email :)

There are also some pipelines available, such as:
https://github.com/nf-core/scnanoseq (also uses IsoQuant)
https://github.com/epi2me-labs/wf-single-cell
They also have a list of tools they use for barcode calling / PCR de-duplication. However, I have not tried any of those yet.
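For orientation only, the general idea behind UMI-based de-duplication can be sketched in a few lines of Python: reads that share the same cell barcode, UMI, and assigned gene or transcript are treated as PCR copies of one molecule and counted once. This is a simplified, exact-match illustration, not the method used by IsoQuant or by the pipelines above; real tools typically also correct sequencing errors in barcodes and UMIs. The function name `umi_counts` and the input layout are assumptions.

```python
from collections import defaultdict


def umi_counts(records):
    """Count distinct UMIs per (cell barcode, feature).

    records: iterable of (cell_barcode, umi, feature_id) tuples, one per read
             (assumed layout; feature_id is the gene or transcript the read
             was assigned to).
    Returns a dict {(cell_barcode, feature_id): number of distinct UMIs}.
    """
    umis = defaultdict(set)
    for cell, umi, feature in records:
        # Reads with an identical (cell, feature, UMI) triple collapse
        # into a single set entry, i.e. one counted molecule.
        umis[(cell, feature)].add(umi)
    return {key: len(found) for key, found in umis.items()}
```

In this sketch, three reads from the same cell and gene carrying only two distinct UMIs would contribute a count of 2 rather than 3.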

Hope that helps.

Best Andrey

vasikara17 commented 1 week ago

Hello, I have a similar issue, which I posted yesterday! In my case, I have one BAM file that contains all the conditions. Could you elaborate on running IsoQuant twice with different tags? How can I keep both the barcode and the condition information? Best, VK

andrewprzh commented 4 days ago

Replied in https://github.com/ablab/IsoQuant/issues/234