JieZheng-ShanghaiTech / SL_benchmark

Benchmarking study of machine learning methods for prediction of synthetic lethality
MIT License

For anything other than genes in KG, how do I match ID and symbol? #1

Open sheunbaek opened 10 months ago

sheunbaek commented 10 months ago

First of all, thank you for organizing the synthetic lethality prediction task so well. Thanks to you, I got better at researching this field :)

I was using SynlethDB, the knowledge graph made by your lab, and noticed that it differs from the knowledge graph provided here. I have two questions about this.

  1. May I know what the difference is?
  2. The KG provided on GitHub contains only IDs. Can you tell me which names (symbols) they map to? There is a gene mapping file, but nothing for the other entity types.

Thank you.

FEEEENGYM commented 10 months ago

Thank you for raising your questions about our work. I'd like to explain the differences you've noticed and address your queries as follows:

  1. Data Processing in SynlethDB:

    • In the SynlethDB’s KG, some genes are associated exclusively with SL, SR, or non-SL relations.
    • To prevent data leakage when using the KG, we removed the SL, SR, and non-SL relations from it. This removal affected genes that only had SL, SR, or non-SL relations, making it impossible to generate embeddings for them using the TransE algorithm. Consequently, these genes were omitted from the training data.
    • For consistency across all models, we identified a set of 9,845 genes that have relations other than SL. All SL data used in the benchmarking are based on the interactions among these 9,845 genes.
  2. Symbol Mapping and Identification: We have uploaded a new file to GitHub, which includes comprehensive details of all entities in the knowledge graph (/data/fin_entities.csv). This file contains the IDs, names, and types of all entities, enabling you to map the gene IDs to their corresponding names (symbols) more effectively.
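The two points above can be illustrated with a minimal sketch. It is not the repository's actual preprocessing code: the relation labels (`SL`, `SR`, `non-SL`), the column names (`id`, `name`, `type`), the helper function names, and the toy rows are all assumptions made for illustration, so check the real header of /data/fin_entities.csv and the relation names in the KG before reusing it.

```python
import csv
import io

# Relation labels that would leak the prediction target.
# ASSUMPTION: the exact spellings in the real KG may differ.
LEAKY_RELATIONS = {"SL", "SR", "non-SL"}


def drop_leaky_relations(triples, leaky=LEAKY_RELATIONS):
    """Keep only (head, relation, tail) triples whose relation is not leaky."""
    return [t for t in triples if t[1] not in leaky]


def entities_with_relations(triples):
    """Return entities that still appear in at least one triple.

    Only these can receive an embedding from a triple-based method
    such as TransE, because the rest have no training triples left.
    """
    kept = set()
    for head, _, tail in triples:
        kept.add(head)
        kept.add(tail)
    return kept


def load_id_to_name(csv_file, id_col="id", name_col="name"):
    """Build an ID -> name lookup from an entity table.

    ASSUMPTION: the column names are hypothetical defaults.
    """
    reader = csv.DictReader(csv_file)
    return {row[id_col]: row[name_col] for row in reader}


# Toy data for illustration only; none of it comes from the repository.
toy_triples = [
    ("0", "SL", "1"),
    ("0", "interacts_with", "2"),
    ("3", "SR", "1"),
]
toy_entities = io.StringIO(
    "id,name,type\n"
    "0,GENE_A,gene\n"
    "1,GENE_B,gene\n"
    "2,PATHWAY_X,pathway\n"
    "3,GENE_C,gene\n"
)

filtered = drop_leaky_relations(toy_triples)
embeddable = entities_with_relations(filtered)
id_to_name = load_id_to_name(toy_entities)
embeddable_names = sorted(id_to_name[e] for e in embeddable)
print(embeddable_names)
```

In this toy example, entities `1` and `3` take part only in SL or SR triples, so they disappear once those relations are removed, which mirrors why some SynlethDB genes are missing from the benchmark KG. To use the lookup on the real file, pass an opened file handle (for example one opened with `newline=""`) to `load_id_to_name` and adjust `id_col` and `name_col` to match its header.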