EBISPOT / scxa_2_cxg

Apache License 2.0

Specify Schema for mapping genes between species #4

Open dosumis opened 5 months ago

dosumis commented 5 months ago

@YY-SONG0718 to provide details

YY-SONG0718 commented 5 months ago

We use the gene orthology from the EggNOG v6.0 database.

The database contains information from 12,535 species and 17M orthologous groups (OGs). Genes that belong to the same OG are homologous. The EggNOG authors first used sequence clustering to calculate OGs at specific taxonomy levels, then applied phylogenetic reconstruction to distinguish orthologs from in-paralogs within each OG. They also provide functional annotations for each OG, such as peptide domains and pathways.

We created nodes for the KG to represent genes and OGs at different taxonomy levels. We created the following (minimal) edges to store gene homology mapping:

(:Gene)-[:GeneInOrthologousGroup]->(:OrthologousGroup)

(:Gene)-[:GeneFromSpecies]->(:Species)

The hierarchical relationship between OGs can also be included:

(:OrthologousGroup)-[:IsParentOrthologousGroup]->(:OrthologousGroup)
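To make the minimal schema concrete, here is a small Python sketch that builds the two gene edges as plain triples and looks up homologs by shared OG. It is only an illustration, not the code used to build the KG: the row layout, the `OG@taxonomy-level` node naming, and all IDs (`geneH1`, `OG_A`, ...) are made-up placeholders, and the `IsParentOrthologousGroup` edge is left out for brevity.

```python
from collections import defaultdict

# Hypothetical input rows: (og_id, taxonomy_level, species_taxid, gene_id).
# All values are placeholders, not real EggNOG records.
rows = [
    ("OG_A", "33208", "9606", "geneH1"),
    ("OG_A", "33208", "10090", "geneM1"),
    ("OG_B", "33208", "9606", "geneH2"),
]


def build_edges(rows):
    """Turn membership rows into (source, relationship, target) triples."""
    edges = set()
    for og_id, tax_level, species, gene in rows:
        # Assumption: an OG node is identified by its ID plus taxonomy level.
        og_node = f"{og_id}@{tax_level}"
        edges.add((gene, "GeneInOrthologousGroup", og_node))
        edges.add((gene, "GeneFromSpecies", species))
    return edges


def homologs(edges, gene):
    """Return all other genes that share at least one OG with `gene`."""
    og_members = defaultdict(set)
    gene_ogs = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "GeneInOrthologousGroup":
            og_members[dst].add(src)
            gene_ogs[src].add(dst)
    found = set()
    for og in gene_ogs[gene]:
        found |= og_members[og]
    found.discard(gene)
    return found


edges = build_edges(rows)
human_homologs = homologs(edges, "geneH1")
```

In this toy data `geneH1` and `geneM1` share `OG_A`, so the lookup returns the mouse placeholder gene; in the KG the same question would be a two-hop traversal over `GeneInOrthologousGroup`.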

In practice, the following files from the download page contain the respective information:

Depending on the gene ID of choice in the KG, one might need to convert the ENSEMBL_peptide_id to the respective gene ID. I used ENSEMBL gene IDs and did the conversion before building the graph, but it is also possible to include the gene-to-peptide relationships.
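The conversion step could look roughly like the sketch below. It assumes a peptide-to-gene lookup table is already available as a dictionary (for example, exported from Ensembl); the function name, the tuple layout, and every ID are hypothetical and only illustrate the idea.

```python
def peptides_to_genes(members, peptide_to_gene):
    """Map (og_id, peptide_id) pairs to (og_id, gene_id) pairs.

    Returns the converted pairs (deduplicated, order preserved) and the
    list of peptide IDs that had no entry in the lookup table.
    """
    converted, unmapped = [], []
    for og_id, peptide_id in members:
        gene_id = peptide_to_gene.get(peptide_id)
        if gene_id is None:
            unmapped.append(peptide_id)
        else:
            converted.append((og_id, gene_id))
    # Several peptides of one gene collapse into a single membership.
    return list(dict.fromkeys(converted)), unmapped


# Placeholder data: two peptides of the same gene plus one unknown peptide.
members = [("OG_A", "pepH1a"), ("OG_A", "pepH1b"), ("OG_A", "pepX")]
peptide_to_gene = {"pepH1a": "geneH1", "pepH1b": "geneH1"}

converted, unmapped = peptides_to_genes(members, peptide_to_gene)
```

Keeping the unmapped peptides in a separate list makes it easy to check how much of the orthology data is lost in the conversion, which matters if the gene-to-peptide relationships are not stored in the graph.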