apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Feature request: GBK/GFF as input #60

Open alexweisberg opened 6 months ago

alexweisberg commented 6 months ago

Hi, geNomad is a great tool and is very useful for my research. Would it be possible to modify it to accept GBK or GFF3 files as input, such as those produced by Bakta or Prokka, rather than only nucleotide FASTA? This would save time when running it and would make it easier to interpret geNomad's output (gene IDs, etc.) relative to the previously annotated genome. Thanks!

apcamargo commented 6 months ago

Thanks, @alexweisberg!

I agree with you that making geNomad able to parse the outputs of Bakta/Prokka would be very useful. Unfortunately, geNomad requires some information that is present in the output of prodigal-gv but that I can't really get from Bakta/Prokka (e.g. the RBS motifs). A solution would be to train an additional model that does not use those features, but that would require a major rework of the software.

Do you think you could map geNomad's gene calls to Bakta's?
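To illustrate the kind of information referred to here: Prodigal writes per-gene attributes such as `rbs_motif` and `rbs_spacer` into the FASTA headers of its predicted genes. The sketch below parses one such header, assuming prodigal-gv keeps the standard Prodigal header layout. The header string and the function name `parse_prodigal_header` are made up for illustration and are not taken from geNomad's code.

```python
def parse_prodigal_header(header: str) -> dict:
    """Split a Prodigal-style FASTA header into coordinates and attributes."""
    # Fields are separated by " # ": gene ID, start, end, strand, attributes.
    gene_id, start, end, strand, attrs = header.lstrip(">").split(" # ")
    record = {
        "gene": gene_id,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),  # 1 for forward, -1 for reverse
    }
    # Attributes are "key=value" pairs separated by semicolons.
    for pair in attrs.strip().split(";"):
        if pair:
            key, value = pair.split("=", 1)
            record[key] = value
    return record


# Illustrative header only (hypothetical gene, not real data).
example = (
    ">contig_1_2 # 337 # 1299 # -1 # "
    "ID=1_2;partial=00;start_type=ATG;rbs_motif=AGGAG;"
    "rbs_spacer=5-10bp;gc_cont=0.520"
)
gene = parse_prodigal_header(example)
print(gene["rbs_motif"], gene["rbs_spacer"])
```

Annotation files in GBK or GFF3 format generally carry coordinates and product names but not these gene-caller-specific fields, which is why they cannot simply be substituted as input.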

alexweisberg commented 6 months ago

I see, that makes sense.

I could map it afterwards using something like bedtools. That would most likely work as long as the gene calls were similar.
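The coordinate-based mapping discussed in this thread could look roughly like the sketch below. It assumes both gene sets have already been loaded into simple records (contig, start, end, strand, ID); how those records are read from geNomad's or Bakta's output files is left out. The `Gene` type, the `map_genes` function, the 0.9 threshold, and the example IDs are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    gene_id: str


def map_genes(query, reference, min_fraction=0.9) -> dict[str, Optional[str]]:
    """Map each query gene to the reference gene with the best reciprocal overlap.

    A pair only counts if the overlap covers at least `min_fraction` of BOTH
    genes and they lie on the same contig and strand. Unmatched genes map to None.
    """
    by_location = defaultdict(list)
    for ref in reference:
        by_location[(ref.contig, ref.strand)].append(ref)

    mapping = {}
    for q in query:
        best_id, best_frac = None, 0.0
        for r in by_location.get((q.contig, q.strand), []):
            overlap = min(q.end, r.end) - max(q.start, r.start) + 1
            if overlap <= 0:
                continue
            q_len = q.end - q.start + 1
            r_len = r.end - r.start + 1
            frac = min(overlap / q_len, overlap / r_len)
            if frac >= min_fraction and frac > best_frac:
                best_id, best_frac = r.gene_id, frac
        mapping[q.gene_id] = best_id
    return mapping


# Toy coordinates, not real annotations.
genomad_genes = [
    Gene("contig_1", 100, 400, "+", "contig_1_1"),
    Gene("contig_1", 500, 900, "-", "contig_1_2"),
    Gene("contig_1", 1000, 1200, "+", "contig_1_3"),
]
bakta_genes = [
    Gene("contig_1", 100, 400, "+", "BAKTA_00001"),
    Gene("contig_1", 500, 930, "-", "BAKTA_00002"),  # different start codon
    Gene("contig_1", 1500, 1800, "+", "BAKTA_00003"),
]
mapping = map_genes(genomad_genes, bakta_genes)
print(mapping)
```

This is similar in spirit to running `bedtools intersect` with a reciprocal overlap fraction (`-f 0.9 -r`) and strand matching (`-s`). As noted in the thread, it only works well when the two gene callers largely agree; genes that one tool splits, merges, or misses will come back unmapped.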