FireLabSoftware / ScanRabbit

ScanRabbit-- Assembly-independent filtering of NGS datasets for a well-defined protein motif
MIT License

anticodon search in rnaseq #1

Open igortru opened 19 hours ago

igortru commented 19 hours ago

Just curious: is it possible to implement something similar for tRNA search? My interest is “missing” tRNA anticodons; see https://pmc.ncbi.nlm.nih.gov/articles/PMC8007984/

FireLabSoftware commented 11 hours ago

This is a really nice idea but would certainly take some knowledge of tRNA scanning to implement.

For background: ScanRabbit is not a particularly sophisticated tool, and its search and evaluation routine for protein-coding homology is simple (it just looks at coherence with a supplied multiple sequence alignment in a defined region). There are much more sophisticated programs such as HMMER (Sean Eddy)-- but we built our own so that simple Python tools could process (and in some cases reject) a broad set of input files, all searched and summarized in a single output document.
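To make "coherence with a supplied multiple sequence alignment" concrete, here is a minimal sketch of one way such a score could be computed. It is not ScanRabbit's actual code: the function names, the mean-column-frequency score, and the toy alignment are all assumptions chosen for illustration.

```python
from collections import Counter

def column_frequencies(msa):
    """Per-column residue frequencies for equal-length aligned sequences.
    Gap characters ('-') are ignored. Hypothetical helper, not ScanRabbit code."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs

def coherence(peptide, freqs):
    """Mean alignment frequency of the peptide's residues, column by column (0 to 1)."""
    if len(peptide) != len(freqs):
        raise ValueError("peptide must be as long as the alignment region")
    return sum(col.get(aa, 0.0) for aa, col in zip(peptide, freqs)) / len(freqs)

def best_window(protein, freqs):
    """Slide the alignment region along a translated read.
    Returns (best_score, offset); offset is -1 if no window scores above zero."""
    width = len(freqs)
    best = (0.0, -1)
    for i in range(len(protein) - width + 1):
        score = coherence(protein[i:i + width], freqs)
        if score > best[0]:
            best = (score, i)
    return best

# Toy alignment of a three-residue motif (made-up data).
toy_msa = ["GDD", "GDD", "GDE", "ADD"]
toy_freqs = column_frequencies(toy_msa)
score, offset = best_window("MKGDDLL", toy_freqs)
```

A read would then be kept or rejected by comparing `score` to a threshold; a real tool would also need six-frame translation and a better-calibrated scoring scheme.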

From what I have heard at many Bay Area RNA club meetings, searching for tRNAs is a much more arduous task than searching for proteins that match an HMM or MSA. If one had nice code for tRNA searching, though, the flexibility of Python should allow that code to be dropped into ScanRabbit to take advantage of ScanRabbit's ability to deal with large numbers of sometimes-imperfect input SRA files. The folks who have done things like this are Sean and Todd Lowe (UCSC)-- they would likely have some good ideas and might already have built this.

Best Regards, Andy Fire (aka the Scan Rabbit Rabbit Scanner)

On Oct 30, 2024, at 11:34, igortru wrote:

Just curious: is it possible to implement something similar for tRNA search? My interest is “missing” tRNA anticodons; see https://pmc.ncbi.nlm.nih.gov/articles/PMC8007984/


igortru commented 8 hours ago

Dear Andy,

I recalled my earlier work on the “missing anticodons” issue when I read your recent publication on Obelisks (rabbit tool). Back then, I gathered extensive statistics on tRNA anticodon distributions across all prokaryotic genomes available at the time. My findings showed that these anticodons aren’t entirely missing—they’re simply much less frequent, often by a factor of 100 or more (though I don’t recall the exact numbers now).

From what I observed, this scarcity likely involves more than just assembly errors; something more complex seems to be at play. Tools like tRNAscan-SE report anticodons directly from RNA sequences and don’t flag cases where inconsistencies in the tRNA structure might point to a discrepancy with the observed anticodon. Identifying these requires additional scrutiny.

Yuri Wolf suggested examining the neighboring regions of these tRNAs, as they might have conserved flanking regions that could offer insights into why some anticodons are so rare. I’m also considering using RNA-seq to assess their distribution after transcription. Do they maintain the same ratio? It’s possible that certain tRNAs are modified during transcription to a canonical form or that other “late” modifications contribute to this phenomenon. I probably need to contact the tRNAscan-SE authors directly.
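The "same ratio" question can be framed as a comparison of relative frequencies. The sketch below assumes anticodon counts have already been tabulated from genome annotations and from RNA-seq reads; the function names, the pseudocount, and the counts are hypothetical and only illustrate the arithmetic.

```python
def fractions(counts):
    """Convert raw counts to relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

def fold_changes(genomic, expressed, pseudocount=0.5):
    """Ratio of expressed to genomic relative frequency for each anticodon.
    A pseudocount (an assumption of this sketch) avoids division by zero
    for anticodons absent from one of the two tables."""
    keys = sorted(set(genomic) | set(expressed))
    g = fractions({k: genomic.get(k, 0) + pseudocount for k in keys})
    e = fractions({k: expressed.get(k, 0) + pseudocount for k in keys})
    return {k: e[k] / g[k] for k in keys}

# Made-up counts with placeholder labels, not real anticodon data.
genomic_counts = {"AAA": 10, "CCC": 10}
expressed_counts = {"AAA": 30, "CCC": 10}
ratios = fold_changes(genomic_counts, expressed_counts, pseudocount=0)
```

A ratio near 1 would mean the anticodon is as common among transcripts as among genes; values far from 1 would be candidates for closer inspection.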

I also spoke today with Alexander Souvorov (see his work here: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04174-9) about the potential to adapt his tool for more efficient detection of rare gene variants in RNA-seq data. Potentially, his tool can identify genes with a single-nucleotide mutation in raw reads, which could be highly valuable for this type of study, with tRNAs as a test case. Another obstacle: NCBI doesn't instantiate RNA sequences for prokaryotes (see the IPG reports for proteins), so a full GenBank scan is required; for now, that is more than a million assemblies.
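As a rough illustration of what "a single-nucleotide mutation in raw reads" means computationally, the sketch below scans a read for windows within one mismatch of a target sequence. It is a naive Hamming-distance scan, not the method used by the tool mentioned above, and the function names and sequences are made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def near_matches(read, target, max_mismatches=1):
    """Return (offset, mismatches) for every window of the read that matches
    the target with at most max_mismatches substitutions."""
    k = len(target)
    hits = []
    for i in range(len(read) - k + 1):
        d = hamming(read[i:i + k], target)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits

# Toy sequences (not real tRNA data).
target = "ACGTAC"
exact_hits = near_matches("TTACGTACGG", target)
variant_hits = near_matches("TTACGAACGG", target)
```

This scan costs time proportional to read length times target length per read, so a search over millions of assemblies or reads would need indexing (for example, k-mer seeds) rather than brute force.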

Best regards, Igor Tolstoy