genome-wide sRNA target prediction

StephenLi55 commented 4 years ago

Hi,

I am trying to do a genome-wide target prediction for my sRNA sequence by using the whole genome sequence as my target sequence. But I feel this may not be the best approach: it takes a long time to run, and using the whole genome sequence is also likely to perturb the predicted secondary structure of individual RNAs. Do you have other suggestions for running a genome-wide target prediction?

The command I use is as follows:

~/miniconda3/bin/IntaRNA -q [the fasta file of my sRNA] -t [the fasta file of the whole genome sequence] --personality=IntaRNAsTar --threads 4 -n 100 --outMode C --out [the name of my output file]

best wishes,

Stephen

martin-raden commented 4 years ago

Hi Stephen,

Generally, if you have very long sequences, you might want to resort to window-based computation (https://github.com/BackofenLab/IntaRNA#predWindowBased), since its memory footprint is much lower.
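Conceptually, window-based computation cuts a long target into overlapping pieces, so each prediction only works on a short subsequence, which keeps memory usage bounded. The Python sketch below only illustrates that splitting step; it is not IntaRNA's internal code, and the `sliding_windows` helper and its width/overlap values are hypothetical. IntaRNA's own window options are described in the window-based computation section of the IntaRNA README.

```python
from typing import Iterator, Tuple


def sliding_windows(seq: str, width: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) windows of `width` nt overlapping by `overlap` nt.

    Hypothetical helper for illustration. Any site of at most `overlap` nt lies
    completely inside at least one window; the last window may be shorter.
    """
    if not 0 <= overlap < width:
        raise ValueError("require 0 <= overlap < width")
    step = width - overlap
    start = 0
    while True:
        yield start, seq[start:start + width]
        if start + width >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step


# Toy usage: a 20 nt "genome" split into 8 nt windows overlapping by 3 nt.
for start, window in sliding_windows("ACGU" * 5, width=8, overlap=3):
    print(start, window)
```

Choosing the overlap at least as long as the longest interaction you expect guarantees that such a site is fully contained in some window; sites that fall into an overlap region are then found twice and need to be deduplicated.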

Typically (afaik), sRNAs interact with mRNAs, so scanning against the whole genome sequence is most likely overkill. That's why, e.g., the IntaRNA webserver only runs predictions around the start codon of each gene, considering the genomic subsequence from, e.g., 200 nt upstream to 100 nt downstream of the annotated start codon.
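A minimal Python sketch of this kind of target extraction, assuming you already have the genome as a string and the CDS coordinates of each gene (1-based, inclusive, as in GFF annotations). The `Gene` record, the helper names and the exact window convention (200 nt before the first base of the start codon, 100 nt from that base on) are assumptions for illustration, not the webserver's code. Minus-strand regions are reverse-complemented so every target is written 5'->3', and regions running past the genome ends are simply clipped (no circular wrap-around).

```python
from dataclasses import dataclass
from typing import Iterable

# Complement table for DNA (upper and lower case; N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Gene:
    """Hypothetical CDS record with 1-based, inclusive genomic coordinates."""
    name: str
    start: int   # leftmost CDS position (start codon on the + strand)
    end: int     # rightmost CDS position (start codon on the - strand)
    strand: str  # "+" or "-"


def start_codon_region(genome: str, gene: Gene, up: int = 200, down: int = 100) -> str:
    """Return `up` nt upstream of the start codon plus `down` nt from its first base on, 5'->3'."""
    n = len(genome)
    if gene.strand == "+":
        first = gene.start - 1  # 0-based index of the start codon's first base
        lo, hi = first - up, first + down
        return genome[max(0, lo):min(n, hi)]
    if gene.strand == "-":
        first = gene.end - 1  # on the - strand the start codon sits at the right end
        lo, hi = first - down + 1, first + up + 1
        return reverse_complement(genome[max(0, lo):min(n, hi)])
    raise ValueError(f"unknown strand: {gene.strand!r}")


def targets_fasta(genome: str, genes: Iterable[Gene], up: int = 200, down: int = 100) -> str:
    """Build a multi-FASTA string with one start-codon region per gene."""
    return "".join(f">{g.name}\n{start_codon_region(genome, g, up, down)}\n" for g in genes)
```

The resulting FASTA text can be written to a file and passed to IntaRNA with -t instead of the whole genome sequence, which gives one short, independently folded target per gene.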

If you can extract the correct mRNA transcripts (i.e., you know the transcription start and end of each gene), that would be the most valuable target data in my opinion. That way, you will not overestimate the 5' end of the mRNA, as can happen when you just take 200 nt upstream of the start codon.
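If transcription start and end sites are known (e.g. from TSS-mapping data), the targets can be cut out as whole transcripts instead. This Python sketch assumes a hypothetical `Transcript` record with 1-based, inclusive coordinates, where `tss` is the first and `tes` the last transcribed position (so `tss >= tes` on the minus strand); the 5' end of each target then starts exactly at the TSS rather than at a fixed distance upstream of the start codon.

```python
from dataclasses import dataclass

# Complement table for DNA (upper and lower case; N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Transcript:
    """Hypothetical transcript record; 1-based, inclusive coordinates.

    tss: first transcribed position, tes: last transcribed position,
    so tss <= tes on the + strand and tss >= tes on the - strand.
    """
    name: str
    tss: int
    tes: int
    strand: str  # "+" or "-"


def transcript_sequence(genome: str, t: Transcript) -> str:
    """Return the transcript sequence 5'->3' (DNA alphabet, as in the genome)."""
    if t.strand == "+":
        if t.tss > t.tes:
            raise ValueError(f"{t.name}: TSS after transcription end on + strand")
        return genome[t.tss - 1:t.tes]
    if t.strand == "-":
        if t.tss < t.tes:
            raise ValueError(f"{t.name}: TSS before transcription end on - strand")
        # Cut the genomic span [tes, tss] and read it on the opposite strand.
        return reverse_complement(genome[t.tes - 1:t.tss])
    raise ValueError(f"unknown strand: {t.strand!r}")
```

Writing these sequences as FASTA records and passing them to IntaRNA via -t gives targets whose 5' ends match the mapped TSS, so no untranscribed upstream sequence enters the structure prediction.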

Does this already help?

Best, Martin

StephenLi55 commented 4 years ago

Hi Martin,

That's really helpful. Thanks for the advice.

best wishes,

Stephen

From: Martin Raden notifications@github.com
Sent: Monday, June 15, 2020 9:13 AM
To: BackofenLab/IntaRNA IntaRNA@noreply.github.com
Cc: Li, Stephen S.Li.55@warwick.ac.uk; Author author@noreply.github.com
Subject: Re: [BackofenLab/IntaRNA] genome-wide sRNA target prediction (#187)


martin-raden commented 4 years ago

@StephenLi55 feel free to reopen the issue if you have further questions.
