eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

core genes list annotations #97

Closed apoorva004 closed 2 years ago

apoorva004 commented 2 years ago

Hi, I have generated the core genes list (pangenome_matrix_t0_core_list) during the pan-genome analysis of my bacterial strains. I would like to know if there is any way to annotate these genes and classify them into functional categories. I am interested in the functional aspects of these clusters. Thanks.

eead-csic-compbio commented 2 years ago

Hi @apoorva004 , I can see at least two options, which require downloading the PfamA database with script install.pl:

Hope this helps, any other ideas @vinuesa ? Bruno

vinuesa commented 2 years ago

Hi @apoorva004, with the tools currently distributed in the get_homologues distro, following the suggestions by @eead-csic-compbio is the best you can do to get some functional annotation based on the Pfam domain composition. If you would like the "classic" one-letter functional categories, you may want to download the latest COG data from https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/ and run BLAST against it.
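
For the last step, turning BLAST hits into per-category counts, a minimal Python sketch could look like the one below. It is not part of get_homologues. It assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`, bit score in the last column), with one representative sequence per core cluster as query. The two lookup dictionaries (`protein_to_cog`, `cog_to_category`) and all IDs in the toy data are hypothetical; in practice they would have to be built from the annotation tables in the COG2020 directory, whose exact layout should be checked in the readme shipped there.

```python
import csv
from collections import Counter


def best_hits(blast_rows):
    """Return {query: subject} keeping the highest bit score hit per query.

    Each row is a list of the 12 standard BLAST tabular fields
    (query, subject, ..., evalue, bitscore).
    """
    best = {}
    for row in blast_rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def tally_categories(hits, protein_to_cog, cog_to_category):
    """Count one-letter categories over all best hits.

    A COG may carry several letters (e.g. "KT"); each letter is counted once.
    Queries whose hit cannot be mapped are counted under "-".
    """
    counts = Counter()
    for subject in hits.values():
        cog = protein_to_cog.get(subject)
        letters = cog_to_category.get(cog, "")
        for letter in letters or "-":
            counts[letter] += 1
    return counts


# Toy data: made-up cluster, protein and COG identifiers (assumptions).
sample_blast = [
    "cluster1\tprotA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t350",
    "cluster1\tprotB\t60.0\t180\t72\t0\t1\t180\t5\t185\t1e-40\t120",
    "cluster2\tprotC\t85.0\t150\t22\t0\t1\t150\t1\t150\t1e-70\t200",
]
rows = list(csv.reader(sample_blast, delimiter="\t"))
hits = best_hits(rows)
counts = tally_categories(
    hits,
    protein_to_cog={"protA": "COGa", "protC": "COGb"},
    cog_to_category={"COGa": "H", "COGb": "KT"},
)
print(hits)
print(counts)
```

Taking only the best hit per cluster is a simplification; you may prefer to add E-value or coverage cut-offs before trusting an assignment.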

eead-csic-compbio commented 2 years ago

Just to add to @vinuesa's suggestion: you could use the file https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/cog-20.fa.gz and blast it against your clusters with script make_nr_pangenome_matrix.pl and option -f. Hope this helps, Bruno
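
The downloaded COG file is gzip-compressed. In case a plain FASTA file is needed as the reference, the following Python sketch decompresses it and counts the sequences as a quick sanity check. The function names and the output file name are illustrative assumptions, not part of get_homologues.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress a .gz file to dst without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def count_fasta_records(path):
    """Count FASTA records by counting header lines starting with '>'."""
    with open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Example usage, assuming cog-20.fa.gz was already downloaded
# to the current directory:
# gunzip("cog-20.fa.gz", "cog-20.fa")
# print(count_fasta_records("cog-20.fa"))
```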