Hi @apoorva004, I can see at least two options, both of which require downloading the Pfam-A database with the script install.pl:
- You can annotate the Pfam protein domains of selected clusters with _annotatecluster.pl, as explained in section 4.7 "Annotating a sequence cluster" of the get_homologues-est manual.
- If you re-run get_homologues.pl with option -D, Pfam domains will be called in all input sequences, so you can compute the functional enrichment of gene sets as explained in section 4.9.5 "Calculating Pfam enrichment of cluster sets". In my experience the core set might not be particularly enriched, but accessory sets usually show significantly increased or reduced numbers of selected Pfam domains.
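To illustrate what "enrichment" of a Pfam domain in a cluster set means statistically, here is a minimal Python sketch of a one-sided hypergeometric (Fisher-type) test. It is not the procedure from section 4.9.5 of the manual, and the function name and the counts in the example are hypothetical, chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k domain-carrying sequences in a set.

    N: total number of sequences (e.g. the whole pangenome)
    K: sequences among the N that carry the Pfam domain of interest
    n: size of the set under test (e.g. the accessory set)
    k: sequences in that set that carry the domain
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of k, k+1, ..., upper successes.
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 10 sequences overall, 4 carry the domain;
# a set of 3 sequences contains 2 of them.
p_value = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value suggests the domain is over-represented in the tested set. Depletion can be tested the same way by summing the lower tail instead, and testing many domains calls for a multiple-testing correction.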
Hope this helps. Any other ideas, @vinuesa? Bruno
Hi @apoorva004, with the tools currently distributed in the get_homologues distro, following the suggestions by @eead-csic-compbio is the best you can do to get some functional annotation based on Pfam domain composition. If you would like the "classic" one-letter functional categories, you may want to download the latest COG data from https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/ and run BLAST against it.
Just to add to @vinuesa's suggestion: you could take the file https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/cog-20.fa.gz and BLAST it against your clusters with the script _make_nr_pangenomematrix.pl and option -f. Hope this helps, Bruno
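Once BLAST hits against the COG proteins are available, the one-letter categories still have to be tallied per cluster. The following Python sketch shows one way to do that, under several assumptions: the hits are in standard BLAST tabular format (`-outfmt 6`, bit score in the 12th column), and you have already built two lookup tables, protein ID to COG ID and COG ID to category letters, from the annotation tables in the same COG2020 directory. The function names, cluster names, protein IDs and COG IDs below are all made up for illustration, and the output format of _make_nr_pangenomematrix.pl may differ, so check it before reusing the parser.

```python
import csv
import io
from collections import Counter


def best_hits(blast_tsv):
    """Return {query: subject}, keeping the highest bit-score hit per query."""
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if len(row) < 12:  # skip blank or truncated lines
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def tally_categories(hits, protein_to_cog, cog_to_category):
    """Count one-letter categories; a COG with several letters counts once per letter."""
    counts = Counter()
    for subject in hits.values():
        cog = protein_to_cog.get(subject)
        letters = cog_to_category.get(cog, "")
        for letter in letters or "-":  # "-" marks clusters without a category
            counts[letter] += 1
    return counts


# Made-up example data (hypothetical IDs, not real COG entries).
example_blast = io.StringIO(
    "cluster1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "cluster1\tprotB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "cluster2\tprotB\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "cluster3\tprotC\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t90\n"
)
protein_to_cog = {"protA": "COG_X", "protB": "COG_Y"}
cog_to_category = {"COG_X": "E", "COG_Y": "EH"}

hits = best_hits(example_blast)
category_counts = tally_categories(hits, protein_to_cog, cog_to_category)
for letter, count in sorted(category_counts.items()):
    print(letter, count)
```

Taking only the best hit per cluster is a simplification. In practice you would probably also want an E-value or coverage cutoff before trusting an assignment.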
Hi, I have generated the core gene list (pangenome_matrix_t0_core_list) during the genome analysis of my bacterial strains. I would like to know if there is any way to annotate these genes and classify them into functional categories. I am interested in the functional aspects of these clusters. Thanks.