Closed Radek91 closed 1 year ago
Dear @Radek91
In fact --stranded
option affect only minimap2 options in IsoQuant, i.e. from src/read_mapper.py:
if args.stranded == 'forward':
additional_options.append('-uf')
Everything else is performed identically for stranded and unstranded reads (at least for now).
At the transcript-level 2 counts are reported regardless of the --stranded option. In this case this is likely because I allow only the quantification with "unique_only".
You are right, it seems like the lower read has fewer exons than the upper ones, which does not allow to identify the original isoform precisely.
Let me know if you have further questions.
Best Andrey
Thank you for the swift answer. Can't the pipeline take into account the strand orientation to assign reads when features are overlapping ? Some genes overlap with antisense lncRNA or retroviruses are expressed from both strand and in some case only the transcript orientation can differentiate two features.
Regarding minimap2, this is quite odd as I ran the pipeline with --strand forward option but minimap2 command displayed -ub. I will double check again to make sure and come back to you if this is confirmed.
Can't the pipeline take into account the strand orientation to assign reads when features are overlapping ?
Technically, if the sequencing is strand-specific, it makes sense to include that in feature assignment. I think we had it our TODO list at some point. We will consider adding it to one of the future releases, thanks for suggesting!
Regarding minimap2, this is quite odd as I ran the pipeline with --strand forward option but minimap2 command displayed -ub. I will double check again to make sure and come back to you if this is confirmed.
Sounds odd. Please, send me the log file if you discover something is off there.
Best Andrey
My bad, minimap2 is working fine with the stranded option!
I would definitely appreciate a stranded option for the feature assignment. Long-reads can solve a lot of overlaps when dealing with --transcript_quantification unique_only but it will miss some which could be rescued when using the read orientation. Good luck with it! (If you need such stranded data please reopen this thread :-) I can share some with you).
@Radek91
Sorry, was out of the office for a while.
Thanks for the offer, I think I have some stranded data, but I will ping you if I need more once I get my hands on it :)
Best Andrey
Hi,
I'm having the same problem. I'm using IsoQuant version 3.3.0. Did you fix this "problem"? Cheers F
@francicco do you mean strand-specific data support? Unfortunately, this one is not coming in the next release.
Best Andrey
Yeah, that's what I meant. Any tips on how to achieve it? I was thinking on trying to asses the strand of the novel genes based on the bed files of the corrected reads. Any advice?
Thanks a lot F
Just to clarify, what exactly would you like to achieve with stranded-specific reads?
By the way, maybe opening a new issue might be informative for other users too.
In the annotation that isoquant returns extended_annotation.gtf
there are novel genes. Like in this case:
in the extended_annotation.gtf
these transcripts have no strand. I need to assign the strand because I'd want to run quanti3
afterward.
F
Ok, I will, thank you! F
Transcripts with no strand will not be printed in the output in the next release - some fixes and improvements were made (see also https://github.com/ablab/IsoQuant/issues/107). Strand for novel transcripts will be identified using canonical splice sites.
However, it will be still possible to output such questionable transcripts for which canonical splice sites did not provide an unambiguous strand (--report_canonical all
option).
To set the strand for these transcripts with . strand you may use *.transcript_model_reads.tsv
and see which reads support this particular transcripts. Further, you can derive strand of these reads from you BAM files and assign it to the transcript. Seems a bit of complex workaround, but I guess it's the only way at the moment.
Best Andrey
Thank you!
Hello!
First of all thank you for designing this tool. So far it has very useful / accurate on my datasets.
I would like to get some clarifications regarding the way isoquant count reads based on their orientation. What is the exact function of the --stranded forward vs --stranded none ?
Looking at the code, it does not appear that isoquant is attempting to map the reads using the -uf flag from minimap2 which would force the alignment to be oriented.
In the presented case, 3 reads are mapping to the ACTB gene. When considering the gene-level, the final gene count is reported as 3, regardless of the --strand forward or none. At the transcript-level 2 counts are reported regardless of the --stranded option. In this case this is likely because I allow only the quantification with "unique_only".
Could you clarify what is the exact function / behaviour of the --stranded flag ?
Thank you in advance!
Additional information:
--read_group file_name --count_exons --clean_start --check_canonical --complete_genedb --transcript_quantification unique_only --gene_quantification with_inconsistent --data_type nanopore --stranded forward