ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Can rnaSPAdes count the numbers of genes and transcripts in the RNA-seq analysis like the TrinityStats.py script in Trinity software? #1221

Open luwd401 opened 11 months ago

luwd401 commented 11 months ago

Is your feature request related to a problem? Please describe. For generic questions use Q&A section in the Discussions forum above.

Recently, I found that rnaSPAdes is superior to Trinity in some aspects of transcript assembly. However, I ran into another problem: I could not determine the number of genes and transcripts among the transcripts assembled by rnaSPAdes. Can rnaSPAdes count the numbers of genes and transcripts in an RNA-seq analysis, like the TrinityStats.py script in the Trinity software?


andrewprzh commented 11 months ago

Dear @luwd401

rnaSPAdes does not output these numbers explicitly, but it's possible to deduce them from the output FASTA. Each contig has the following id format: NODE_97_length_6237_cov_11.9819_g8_i2, where g8 means gene number 8, and i2 means isoform number 2 within this gene. You can find more information in the rnaSPAdes manual.
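For illustration, here is a minimal Python sketch of how these numbers could be deduced from the contig ids. It is not part of rnaSPAdes. It assumes that every header ends with the `_g<N>_i<M>` suffix described above, and the function name `count_genes_and_transcripts` is hypothetical. Headers that do not match the pattern are skipped rather than counted.

```python
import re
from collections import defaultdict

# Matches the trailing "_g<gene>_i<isoform>" part of a contig id,
# e.g. NODE_97_length_6237_cov_11.9819_g8_i2 (format taken from this thread).
ID_PATTERN = re.compile(r"_g(\d+)_i(\d+)$")


def count_genes_and_transcripts(fasta_lines):
    """Return (number of genes, number of transcripts) from FASTA lines.

    Hypothetical helper: `fasta_lines` is any iterable of strings,
    such as an open file handle or a list of lines.
    """
    isoforms_per_gene = defaultdict(int)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # empty header
        match = ID_PATTERN.search(fields[0])
        if match is None:
            continue  # header without a gene/isoform suffix
        isoforms_per_gene[match.group(1)] += 1
    n_genes = len(isoforms_per_gene)
    n_transcripts = sum(isoforms_per_gene.values())
    return n_genes, n_transcripts


# Small in-memory example; only the third id comes from this thread,
# the other two are made up for illustration.
example = [
    ">NODE_1_length_100_cov_1.0_g0_i0",
    "ACGT",
    ">NODE_2_length_90_cov_2.0_g0_i1",
    "ACGT",
    ">NODE_97_length_6237_cov_11.9819_g8_i2",
    "ACGT",
]
genes, transcripts = count_genes_and_transcripts(example)
print(f"genes: {genes}, transcripts: {transcripts}")
```

To run it on a real assembly, pass an open file handle for the output FASTA to the function instead of the example list.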

We will add explicit reporting of these numbers in the next release, thank you for the suggestion!

Best,
Andrey

luwd401 commented 11 months ago

Dear Andrey Prjibelski, Thank you for your kind reply. We are looking forward to the new release of rnaSPAdes. Good luck!

Best wishes, Wei-Dong Lu

-------------- Original message -------------- From: "Andrey Prjibelski @.>; Sent: Friday, December 8, 2023, 8:04 AM; To: "ablab/spades" @.>; Cc: "weidonglu @.>; "Mention @.>; Subject: Re: [ablab/spades] Can rnaSPAdes count the numbers of genes and transcripts in the RNA-seq analysis like the TrinityStats.py script in Trinity software? (Issue #1221)
