gatech-genemark / ProtHint

Protein hint generation pipeline for gene finding in eukaryotic genomes

Which database should I use #33

Closed ld9866 closed 3 years ago

ld9866 commented 3 years ago

Hello! We are annotating the genome of a domestic animal. Should I use the protein sequences of closely related species as input, or the Vertebrata database you provide?

tomasbruna commented 3 years ago

Hello @ld9866 ,

Sorry for the late response. If you are still deciding, I'd recommend using the OrthoDB protein database of all Vertebrata; ProtHint will automatically find the closest species in the database. More instructions on this are here: https://github.com/gatech-genemark/ProtHint#protein-database-preparation.

Moreover, if you have extra proteins from closely related species that are not in the OrthoDB database, feel free to add them to the input protein FASTA file.
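As a rough illustration of adding extra proteins to the input file, here is a minimal Python sketch that concatenates several protein FASTA files into one and skips records whose ID has already been seen. The function name `merge_fasta` and the file names in the comments are hypothetical, and the duplicate check is an assumption for convenience; a plain concatenation of the files would also work.

```python
def merge_fasta(inputs, output):
    """Concatenate FASTA files into `output`, skipping repeated record IDs.

    The record ID is taken to be the first whitespace-delimited token
    after '>' in the header line. Returns the number of records written.
    """
    seen = set()
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            keep = False  # lines before the first header are ignored
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        parts = line[1:].split()
                        record_id = parts[0] if parts else ""
                        keep = record_id not in seen
                        if keep:
                            seen.add(record_id)
                            written += 1
                    if keep and line.strip():
                        # Make sure every written line ends with a newline,
                        # even if an input file lacks a trailing one.
                        out.write(line if line.endswith("\n") else line + "\n")
    return written


# Hypothetical usage (file names are placeholders, not ProtHint defaults):
# merge_fasta(["orthodb_vertebrata.fasta", "extra_close_species.fasta"],
#             "proteins.fasta")
```

The merged file would then serve as the protein input for ProtHint, following the instructions linked above.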

Tomas

ld9866 commented 3 years ago

Thank you very much for your suggestion. I will try the method you mentioned. Thank you!