ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/
Other
153 stars 13 forks source link

isoquant - strandness - clarifications #113

Closed Radek91 closed 1 year ago

Radek91 commented 1 year ago

Hello!

First of all thank you for designing this tool. So far it has very useful / accurate on my datasets.

I would like to get some clarifications regarding the way isoquant count reads based on their orientation. What is the exact function of the --stranded forward vs --stranded none ?

Screenshot 2023-09-19 at 13 10 29

Looking at the code, it does not appear that isoquant is attempting to map the reads using the -uf flag from minimap2 which would force the alignment to be oriented.

In the presented case, 3 reads are mapping to the ACTB gene. When considering the gene-level, the final gene count is reported as 3, regardless of the --strand forward or none. At the transcript-level 2 counts are reported regardless of the --stranded option. In this case this is likely because I allow only the quantification with "unique_only".

Could you clarify what is the exact function / behaviour of the --stranded flag ?

Thank you in advance!

Additional information:

--read_group file_name --count_exons --clean_start --check_canonical --complete_genedb --transcript_quantification unique_only --gene_quantification with_inconsistent --data_type nanopore --stranded forward

andrewprzh commented 1 year ago

Dear @Radek91

In fact --stranded option affect only minimap2 options in IsoQuant, i.e. from src/read_mapper.py:

        if args.stranded == 'forward':
            additional_options.append('-uf')

Everything else is performed identically for stranded and unstranded reads (at least for now).

At the transcript-level 2 counts are reported regardless of the --stranded option. In this case this is likely because I allow only the quantification with "unique_only".

You are right, it seems like the lower read has fewer exons than the upper ones, which does not allow to identify the original isoform precisely.

Let me know if you have further questions.

Best Andrey

Radek91 commented 1 year ago

Thank you for the swift answer. Can't the pipeline take into account the strand orientation to assign reads when features are overlapping ? Some genes overlap with antisense lncRNA or retroviruses are expressed from both strand and in some case only the transcript orientation can differentiate two features.

Regarding minimap2, this is quite odd as I ran the pipeline with --strand forward option but minimap2 command displayed -ub. I will double check again to make sure and come back to you if this is confirmed.

andrewprzh commented 1 year ago

Can't the pipeline take into account the strand orientation to assign reads when features are overlapping ?

Technically, if the sequencing is strand-specific, it makes sense to include that in feature assignment. I think we had it our TODO list at some point. We will consider adding it to one of the future releases, thanks for suggesting!

Regarding minimap2, this is quite odd as I ran the pipeline with --strand forward option but minimap2 command displayed -ub. I will double check again to make sure and come back to you if this is confirmed.

Sounds odd. Please, send me the log file if you discover something is off there.

Best Andrey

Radek91 commented 1 year ago

My bad, minimap2 is working fine with the stranded option!

I would definitely appreciate a stranded option for the feature assignment. Long-reads can solve a lot of overlaps when dealing with --transcript_quantification unique_only but it will miss some which could be rescued when using the read orientation. Good luck with it! (If you need such stranded data please reopen this thread :-) I can share some with you).

andrewprzh commented 1 year ago

@Radek91

Sorry, was out of the office for a while.

Thanks for the offer, I think I have some stranded data, but I will ping you if I need more once I get my hands on it :)

Best Andrey

francicco commented 7 months ago

Hi,

I'm having the same problem. I'm using IsoQuant version 3.3.0. Did you fix this "problem"? Cheers F

andrewprzh commented 7 months ago

@francicco do you mean strand-specific data support? Unfortunately, this one is not coming in the next release.

Best Andrey

francicco commented 7 months ago

Yeah, that's what I meant. Any tips on how to achieve it? I was thinking on trying to asses the strand of the novel genes based on the bed files of the corrected reads. Any advice?

Thanks a lot F

andrewprzh commented 7 months ago

Just to clarify, what exactly would you like to achieve with stranded-specific reads?

andrewprzh commented 7 months ago

By the way, maybe opening a new issue might be informative for other users too.

francicco commented 7 months ago

In the annotation that isoquant returns extended_annotation.gtf there are novel genes. Like in this case:

Screenshot 2024-04-24 at 5 51 03 PM

in the extended_annotation.gtf these transcripts have no strand. I need to assign the strand because I'd want to run quanti3 afterward.

F

francicco commented 7 months ago

Ok, I will, thank you! F

andrewprzh commented 7 months ago

Transcripts with no strand will not be printed in the output in the next release - some fixes and improvements were made (see also https://github.com/ablab/IsoQuant/issues/107). Strand for novel transcripts will be identified using canonical splice sites.

However, it will be still possible to output such questionable transcripts for which canonical splice sites did not provide an unambiguous strand (--report_canonical all option).

To set the strand for these transcripts with . strand you may use *.transcript_model_reads.tsv and see which reads support this particular transcripts. Further, you can derive strand of these reads from you BAM files and assign it to the transcript. Seems a bit of complex workaround, but I guess it's the only way at the moment.

Best Andrey

francicco commented 6 months ago

Thank you!