Is it possible to obtain the cds sequence?

apcamargo / RNAsamba

A tool for computing the coding potential of RNA transcript sequences using a deep learning classification model

https://rnasamba.lge.ibi.unicamp.br/

GNU General Public License v3.0

Is it possible to obtain the cds sequence? #21

Open Tang-pro opened 8 months ago

Tang-pro commented 8 months ago

Hi, @apcamargo

The results from my run contain only protein sequences and no CDS sequences. Can this software also output the corresponding CDS (nucleotide) sequences?

Tang-pro commented 8 months ago

Hello, @apcamargo

I have the protein sequences produced by RNAsamba, but no CDS sequences. I used TransDecoder to predict ORFs, used the RNAsamba protein sequences as a BLASTP database, integrated the BLASTP results into the TransDecoder prediction, and selected the most likely CDS for each transcript. Is this method feasible?

Looking forward to your reply. Thank you!

apcamargo commented 8 months ago

Yes, that makes sense. RNAsamba just takes the longest CDS in the transcript. TransDecoder will give you good results.
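To illustrate what "takes the longest CDS in the transcript" can mean in practice, here is a minimal Python sketch of a longest-ORF search. It is not RNAsamba's actual code. It assumes ATG start codons, the standard stop codons, the forward strand only, and that an ORF running off the 3' end without a stop codon is kept as a truncated ORF. The function name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, orf) for the longest ATG-initiated ORF.

    Assumptions (not taken from RNAsamba): forward strand only,
    standard stop codons, and ORFs lacking a stop codon are kept
    up to the last complete codon.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best[2]):
                    best = (start, i + 3, seq[start:i + 3])
                start = None  # ORF closed; look for the next ATG
        if start is not None:
            # ORF still open at the transcript end: keep whole codons only
            end = start + ((len(seq) - start) // 3) * 3
            if end - start > len(best[2]):
                best = (start, end, seq[start:end])
    return best
```

For example, `longest_orf("CCATGAAATTTTAGGG")` finds the ORF `ATGAAATTTTAG` starting at 0-based position 2.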

Tang-pro commented 8 months ago

Hi, @apcamargo. Sorry to bother you again. With this method, RNAsamba originally predicted 174,084 protein sequences, but TransDecoder produced only 171,513 CDS sequences. How should this difference be understood? Is the approach still feasible?

apcamargo commented 8 months ago

This could be because RNAsamba is good at identifying truncated transcripts, which might not appear if you require complete ORFs in TransDecoder. Another possibility is that TransDecoder applies a few filters that remove some of the ORFs.
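The counts above differ by 174,084 − 171,513 = 2,571 sequences. To see which transcripts are affected, one option is to compare the sequence IDs in the two FASTA files. The sketch below assumes that the TransDecoder ORF IDs are the transcript ID followed by a `.p<N>` suffix and that the RNAsamba protein headers start with the transcript ID; check both against your own files. The function names are hypothetical.

```python
import re


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of each FASTA header."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def strip_orf_suffix(orf_id):
    """Drop a trailing '.p<N>' suffix (assumed TransDecoder ORF naming)."""
    return re.sub(r"\.p\d+$", "", orf_id)


def missing_transcripts(protein_lines, cds_lines):
    """Return transcript IDs that have a protein but no CDS."""
    protein_ids = fasta_ids(protein_lines)
    cds_ids = {strip_orf_suffix(i) for i in fasta_ids(cds_lines)}
    return sorted(protein_ids - cds_ids)
```

With real data, pass open file handles for the two FASTA files instead of lists of lines. Inspecting a few of the returned IDs should show whether the missing transcripts carry truncated ORFs or were dropped by a filter.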

If you just want ORFs for these transcripts, you can use an ORF extraction tool, such as OrfM or seqkit.
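If the protein sequences are already available, another way to recover each CDS is to locate the stretch of the transcript that translates to the protein. The following Python sketch does this under several assumptions: the standard genetic code, forward strand only, and an exact match between the protein and the translated transcript. The function names `translate` and `cds_for_protein` are hypothetical and are not part of RNAsamba, OrfM, or seqkit.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate whole codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def cds_for_protein(transcript, protein):
    """Return the forward-strand nucleotides encoding `protein`, or None."""
    seq = transcript.upper().replace("U", "T")
    protein = protein.rstrip("*")
    if not protein:
        return None
    for frame in range(3):
        pos = translate(seq[frame:]).find(protein)
        if pos != -1:
            start = frame + 3 * pos
            end = start + 3 * len(protein)
            # Include the stop codon when one directly follows the match
            if seq[end:end + 3] in ("TAA", "TAG", "TGA"):
                end += 3
            return seq[start:end]
    return None
```

For example, `cds_for_protein("CCATGAAATTTTAGGG", "MKF")` returns `ATGAAATTTTAG`. A truncated ORF without a stop codon is returned without one.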

Tang-pro commented 8 months ago

Hi, @apcamargo

I got it, thank you!