bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

questions about how to get genes from the output #53

Open alexyfyf opened 1 year ago

alexyfyf commented 1 year ago

Hi Ka Ming,

I'm using RNA-Bloom2 to assemble long-read cDNA RNA-seq data, and I have a question about the output. The transcripts.fa files contain the sequence of each transcript, but how can I tell which transcripts come from the same gene? I don't see that information in the headers. Some example headers are shown here:

>rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]
>rb_90720 l=407 c=0.21744472 s=103012

Also, I'm not sure why some headers show s while others show path. Is there a difference?

Thank you so much if you could help to explain it.

Cheers, Alex

kmnip commented 1 year ago

There is no inference about genes.

path indicates that the transcript was assembled from the listed sequences from the previous step of the assembly. s indicates that it originates from a single sequence.
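For anyone who wants to separate the two kinds of records programmatically, here is a minimal Python sketch that parses the header lines quoted above. It is not part of RNA-Bloom. The function name `parse_header` and the returned dictionary keys are hypothetical, and treating `l` as an integer and `c` as a float is an assumption based only on the two example headers in this thread.

```python
from typing import Any, Dict


def parse_header(line: str) -> Dict[str, Any]:
    """Parse one transcripts.fa header line into a dictionary.

    Assumes the space-separated key=value layout seen in the examples
    in this thread; other fields, if any exist, are kept as raw strings.
    """
    tokens = line.strip().lstrip(">").split()
    record: Dict[str, Any] = {"id": tokens[0], "path": None, "s": None}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        if key == "l":
            record["l"] = int(value)      # assumed to be an integer field
        elif key == "c":
            record["c"] = float(value)    # assumed to be a decimal field
        elif key == "path":
            # path=[94775+,95098+] -> ["94775+", "95098+"]
            record["path"] = [p for p in value.strip("[]").split(",") if p]
        else:
            record[key] = value           # includes s=<single sequence id>
    return record


if __name__ == "__main__":
    for header in (
        ">rb_90719 l=1982 c=0.25546062 path=[94775+,95098+]",
        ">rb_90720 l=407 c=0.21744472 s=103012",
    ):
        rec = parse_header(header)
        kind = "multi-sequence path" if rec["path"] else "single sequence"
        print(rec["id"], kind)
```

A record with a non-empty `path` was assembled from several sequences of the previous step, while a record with `s` set came from a single sequence. Neither field says anything about gene membership.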

alexyfyf commented 1 year ago

Thank you so much for your reply. From your experience, do you have any suggestions on how to infer genes from RNA-Bloom2 output?

Cheers, Alex

kmnip commented 1 year ago

You can possibly try this: http://arthropods.eugenes.org/EvidentialGene/other/sra2genes_testdrive/sra2genes4v_testdrive/

If you are interested in a crude gene grouping of the assembled transcripts, I can make it a feature request (but very low priority).

alexyfyf commented 1 year ago

Thank you so much. I would definitely like to have this feature in the future.