RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Possibility to move ARAX ranker implementation to the end after we clear all virtual edges in results. #2375

Closed. chunyuma closed this 1 month ago

chunyuma commented 1 month ago

Currently, the response passed to the ARAX ranker in ARAX_query.py contains "virtual" edge bindings generated by xDTD or NGD, like:

{'analyses': [{'attributes': None,
               'edge_bindings': {'N1': [{'attributes': [], 'id': 'N1_487'}],
                                 'creative_DTD_qedge_0': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:28487--infores:drugbank'},
                                                          ...],
                                 'creative_DTD_qedge_1': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:directly_physically_interacts_with--None--None--None--NCBIGene:2566--infores:drugbank'},
                                                          ...],
                                 'creative_DTD_qedge_2': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:2741--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          ...],
                                 't_edge': [{'attributes': [],
                                             'id': 'infores:molepro:CHEBI:31515--biolink:treats--None--None--None--MONDO:0007186--infores:ctd'},
                                            {'attributes': [],
                                             'id': 'creative_DTD_prediction_8'}]},
               'resource_id': 'infores:arax',
               'score': None,
               'scoring_method': None,
               'support_graphs': None}],

However, I noticed that these "virtual" edge bindings are removed from the final response:

{'analyses': [{'attributes': None,
               'edge_bindings': {'t_edge': [{'attributes': [],
                                             'id': 'creative_DTD_prediction_8'},
                                            {'attributes': [],
                                             'id': 'infores:molepro:CHEBI:31515--biolink:treats--None--None--None--MONDO:0007186--infores:ctd'}]},
               'resource_id': 'infores:arax',
               'score': 0.994,
               'scoring_method': None,
               'support_graphs': ['aux_graph_N1_564']}]

I am wondering whether we can move the ARAX ranker to a different location in ARAX_query.py, after the "virtual" edge bindings have been removed. Any thoughts on this? @amykglen

dkoslicki commented 1 month ago

Historically, the NGD and other virtual edges have helped the ranking. Do you know if result "quality" improves or not if you were to do the ranking after the virtual edges have been removed?

chunyuma commented 1 month ago

@dkoslicki, please see my answers to your concern below.

Historically, the NGD and other virtual edges have helped the ranking.

I am not denying the role of NGD or other virtual edges in ranking, and my goal is not to ignore them. But some of these "virtual" edges might not be returned as results in the UI; they are probably used as a support graph or the like. So when I said "clear all virtual edges in results", I did not mean really clearing them; they are probably moved to other visualizations in the UI.

Here is an example:

Query

{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "ON",
      "predicates": [
        "biolink:treats"
      ],
      "qualifier_constraints": [],
      "subject": "SN"
    }
  },
  "nodes": {
    "ON": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0007186"
      ],
      "is_set": false,
      "set_interpretation": "BATCH"
    },
    "SN": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "is_set": false,
      "set_interpretation": "BATCH"
    }
  }
}

Please see the UI returned result: https://arax.ncats.io/?r=e4274554-09e0-4188-a325-376b2ae295ee.

For result 5 (domperidone), as you can see, there are only two supporting edges.

However, when you run this query in ARAX locally, the response passed to the ARAX ranker contains many more edge bindings than those two edges in the UI:

{'analyses': [{'attributes': None,
               'edge_bindings': {'N1': [{'attributes': [], 'id': 'N1_36'}],
                                 'creative_DTD_qedge_0': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:28487--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--PUBCHEM.COMPOUND:24872560--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:15765--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:1813--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--UNII:B72HH48FLU--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:3757--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:1813--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:10102--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--PUBCHEM.COMPOUND:9852188--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:31413--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:85966--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:1813--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:1565--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:affects--None--None--None--NCBIGene:3757--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:5801--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:9671--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:18243--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:15765--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31515--biolink:interacts_with--None--None--None--CHEBI:63613--infores:drugbank'}],
                                 'creative_DTD_qedge_1': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2561--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1813--biolink:physically_interacts_with--None--None--None--NCBIGene:6531--infores:intact'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:PUBCHEM.COMPOUND:9852188--biolink:physically_interacts_with--None--None--None--NCBIGene:2741--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2561--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:18243--biolink:affects--None--None--None--NCBIGene:6531--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2563--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1813--biolink:gene_associated_with_condition--None--None--None--MONDO:0011122--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1565--biolink:affects--None--None--None--NCBIGene:1557--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:5801--biolink:affects--None--None--None--NCBIGene:5743--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2566--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:PUBCHEM.COMPOUND:9852188--biolink:physically_interacts_with--None--None--None--NCBIGene:7442--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1565--biolink:gene_associated_with_condition--None--None--None--MONDO:0006896--infores:diseases'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:15765--biolink:affects--None--None--None--NCBIGene:1312--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:9671--biolink:affects--None--None--None--NCBIGene:6339--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:28487--biolink:affects--None--None--None--NCBIGene:5743--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:10102--biolink:affects--None--None--None--NCBIGene:2561--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:85966--biolink:affects--None--None--None--NCBIGene:348980--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:PUBCHEM.COMPOUND:24872560--biolink:affects--None--None--None--NCBIGene:2561--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:63613--biolink:affects--None--None--None--NCBIGene:338--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:85966--biolink:affects--None--None--None--NCBIGene:348980--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2566--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1565--biolink:interacts_with--None--None--None--NCBIGene:1557--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:3757--biolink:colocalizes_with--None--None--None--NCBIGene:857--infores:intact'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:3757--biolink:gene_associated_with_condition--None--None--None--MONDO:0011122--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2561--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:10102--biolink:affects--None--None--None--NCBIGene:2566--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:10102--biolink:affects--None--None--None--NCBIGene:2566--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:15765--biolink:affects--None--None--None--NCBIGene:7442--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:18243--biolink:interacts_with--None--None--None--NCBIGene:6531--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:15765--biolink:affects--None--None--None--NCBIGene:1312--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:PUBCHEM.COMPOUND:9852188--biolink:affects--None--None--None--NCBIGene:5743--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2566--infores:drugcentral'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:directly_physically_interacts_with--None--None--None--NCBIGene:2566--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:9671--biolink:affects--None--None--None--NCBIGene:6339--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:UNII:B72HH48FLU--biolink:preventative_for_condition--None--None--None--NCIT:C53458--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:31413--biolink:affects--None--None--None--NCBIGene:2563--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:UNII:B72HH48FLU--biolink:affects--None--None--None--NCBIGene:3123--infores:dgidb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:18243--biolink:affects--None--None--None--NCBIGene:6531--infores:drugbank'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:CHEBI:15765--biolink:affects--None--None--None--NCBIGene:6531--infores:dgidb'}],
                                 'creative_DTD_qedge_2': [{'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:6339--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:2563--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:MONDO:0006896--biolink:subclass_of--None--None--None--MONDO:0007186--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:6531--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:2741--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:348980--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:3123--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:5743--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1557--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:2561--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:2566--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCIT:C53458--biolink:associated_with--None--None--None--MONDO:0007186--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:1312--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:7442--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:MONDO:0011122--biolink:affects--None--None--None--MONDO:0007186--infores:semmeddb'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:857--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'},
                                                          {'attributes': [],
                                                           'id': 'infores:rtx-kg2:NCBIGene:338--biolink:gene_associated_with_condition--None--None--None--MONDO:0007186--infores:disgenet'}],
                                 't_edge': [{'attributes': [],
                                             'id': 'infores:molepro:CHEBI:31515--biolink:treats--None--None--None--MONDO:0007186--infores:ctd'},
                                            {'attributes': [],
                                             'id': 'creative_DTD_prediction_8'}]},
               'resource_id': 'infores:arax',
               'score': None,
               'scoring_method': None,
               'support_graphs': None}],
 'confidence': None,
 'description': 'No description available',
 'essence': 'domperidone',
 ...

But the final output response of ARAX doesn't contain nearly as many edge bindings:

{'analyses': [{'attributes': None,
               'edge_bindings': {'t_edge': [{'attributes': [],
                                             'id': 'infores:molepro:CHEBI:31515--biolink:treats--None--None--None--MONDO:0007186--infores:ctd'},
                                            {'attributes': [],
                                             'id': 'creative_DTD_prediction_8'}]},
               'resource_id': 'infores:arax',
               'score': 0.994,
               'scoring_method': None,
               'support_graphs': ['aux_graph_N1_36']}],
 'confidence': None,
 'description': 'No description available',
 'essence': 'domperidone',
...

These two edge bindings with the label t_edge are consistent with what is shown in the UI. As such, the other "virtual" edge bindings would affect the ranking (even though I am not sure whether they were historically used in the ranking).
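To make the difference concrete, here is a minimal sketch (not ARAX code) that compares the edge-binding keys of a result before and after the final transformation. The helper name `dropped_qedge_keys` is hypothetical, and the two dicts are abbreviated stand-ins for the results shown above, with most binding lists elided; only the keys are taken from the actual responses.

```python
def dropped_qedge_keys(ranker_input: dict, final_result: dict) -> set:
    """Return edge-binding keys present in the ranker input but absent from the final result."""
    def binding_keys(result: dict) -> set:
        found = set()
        for analysis in result.get("analyses", []):
            # edge_bindings may be missing or None, so fall back to an empty dict
            found.update((analysis.get("edge_bindings") or {}).keys())
        return found

    return binding_keys(ranker_input) - binding_keys(final_result)


# Abbreviated stand-ins for the two domperidone results above (binding lists elided).
ranker_input = {"analyses": [{"edge_bindings": {
    "N1": [{"attributes": [], "id": "N1_36"}],
    "creative_DTD_qedge_0": [],
    "creative_DTD_qedge_1": [],
    "creative_DTD_qedge_2": [],
    "t_edge": [{"attributes": [], "id": "creative_DTD_prediction_8"}],
}}]}
final_result = {"analyses": [{"edge_bindings": {
    "t_edge": [{"attributes": [], "id": "creative_DTD_prediction_8"}],
}}]}

print(sorted(dropped_qedge_keys(ranker_input, final_result)))
# ['N1', 'creative_DTD_qedge_0', 'creative_DTD_qedge_1', 'creative_DTD_qedge_2']
```

Only `t_edge` survives; every other key is seen by the ranker but never appears in the result the user gets.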

chunyuma commented 1 month ago

Do you know if result "quality" improves or not if you were to do the ranking after the virtual edges have been removed?

Please see the comparison below:

The query is the one I proposed above. Our targets are pantoprazole (asset 621) and esomeprazole (asset 620); I use just these two as examples, although there are more.

Ranking results of response 1 (without removing those "virtual" edge bindings, showing the top 60 only):

[0, 'cisapride', None] [1, 'metoclopramide', None] [2, 'famotidine', None] [3, 'domperidone', None] [4, 'baclofen zwitterion', None] [5, 'cimetidine', None] [6, 'bethanechol', None] [7, 'dexlansoprazole', None] [8, 'citric acid', None] [9, 'aluminum hydroxide', None] [10, 'misoprostol', None] [11, 'potassium bicarbonate', None] [12, 'isosorbide dinitrate', None] [13, 'budesonide', None] [14, 'magnesium carbonate', None] [15, 'bleomycin a2', None] [16, 'aluminum magnesium silicate', None] [17, 'isosorbide', None] [18, 'dopamine', None] [19, '9-cis-retinoic acid', None] [20, 'docetaxel anhydrous', None] [21, 'n,n-dimethylethanolamine', None] [22, 'interferon gamma-1b', None] [23, 'phendimetrazine', None] [24, 'sodium feredetate', None] [25, 'proquazone', None] [26, 'roflumilast', None] [27, 'magnesium lactate', None] [28, 'interferon alfa-2a', None] [29, 'argipressin', None] [30, 'fish oil', None] [31, 'pipemidic acid', None] [32, 'lithium orotate', None] [33, 'human cytomegalovirus immune globulin', None] [34, 'aminosalicylic sodium', None] [35, 'enoxacin', None] [36, 'bupranolol', None] [37, 'iloprost', None] [38, 'sisomycin', None] [39, 'capecitabine', None] [40, 'iotalamic acid', None] [41, 'oxprenolol hydrochloride', None] [42, 'factor viia', None] [43, 'ferumoxides', None] [44, 'demeclocycline', None] [45, '(s)-amphetamine', None] [46, 'lisdexamfetamine', None] [47, 'diethylpropion', None] [48, '(r,r)-labetalol', None] [49, 'testosterone cypionate', None] [50, 'chembl1201055', None] [51, 'interferon alfacon-1', None] [52, 'triamterene', None] [53, 'reserpine', None] [54, '5-fluorouracil', None] [55, '5-formyltetrahydrofolic acid', None] [56, 'esomeprazole', None] [57, 'pantoprazole', None] [58, 'lansoprazole', None] [59, 'rabeprazole', None] [60, 'ranitidine', None] ...

The targets rank low here: esomeprazole is at index 56 and pantoprazole at index 57. Also, the non-target domperidone ranks 4th (index 3).

Ranking results of response 2 (without those "virtual" edge bindings, showing the top 60 only):

[0, 'cisapride', None] [1, 'metoclopramide', None] [2, 'famotidine', None] [3, 'cimetidine', None] [4, 'dexlansoprazole', None] [5, 'esomeprazole', None] [6, 'pantoprazole', None] [7, 'lansoprazole', None] [8, 'rabeprazole', None] [9, 'ranitidine', None] [10, 'nizatidine', None] [11, 'dexrabeprazole', None] [12, 'aluminum hydroxide', None] [13, 'ilaprazole', None] [14, 'silicic acid', None] [15, 'citric acid', None] [16, 'magnesium carbonate', None] [17, 'potassium bicarbonate', None] [18, 'aluminum magnesium silicate', None] [19, 'roxane', None] [20, 'magnesium hydroxide', None] [21, 'magnesium orthosilicate', None] [22, 'baclofen zwitterion', None] [23, 'carbaldrate', None] [24, 'aluminum sulfate', None] [25, 'aluminum carbonate', None] [26, 'pe-nme2(14:1(9z)/22:2(13z,16z))', None] [27, 'bethanechol', None] [28, 'misoprostol', None] [29, 'acetylsalicylic acid', None] [30, 'carafate', None] [31, 'erythromycin ethylsuccinate', None] [32, 'nitroglycerin', None] [33, 'progesterone', None] [34, 'citalopram', None] [35, 'domperidone', None] [36, 'vonoprazan', None] [37, 'alginic acid', None] [38, '4-amino-5-chloro-2-ethoxy-n-({4-[(4-fluorophenyl)methyl]morpholin-2-yl}methyl)benzamide', None] [39, 'tegoprazan', None] [40, 'sodium bicarbonate', None] [41, 'n-[[4-[2-(dimethylamino)ethoxy]phenyl]methyl]-3,4-dimethoxybenzamide', None] [42, 'technetium atom', None] [43, 'prucalopride', None] [44, 'cinitapride', None] [45, 'rebamipide', None] [46, '4-amino-5-bromo-n-[2-(diethylamino)ethyl]-2-methoxybenzamide', None] [47, 'carbenoxolone', None] [48, 'pirenzepine', None] [49, 'rennie', None] [50, 'gastrocote', None] [51, '(s)-mosapride', None] [52, 'pantoprazole sodium', None] [53, '2-[(r)-(4-methoxy-3-methylpyridin-2-yl)methylsulfinyl]-6-pyrrol-1-yl-1h-benzimidazole', None] [54, 'tritec', None] [55, 'milk protein', None] [56, 'esomeprazole sodium', None] [57, 'esomeprazole magnesium', None] [58, '(6s,7s,8s)-8-(hydroxymethyl)-7-[4-[2-(3-methoxyphenyl)ethynyl]phenyl]-2-oxo-n-propyl-1,4-diazabicyclo[4.2.0]octane-4-carboxamide', None] [59, '3,4,5-trimethoxy-n-[(s)-3-piperidinyl]benzamide', None] [60, '(r)-(+)-pantoprazole', None] ...

Note the improvement in the ranking after "removing" the virtual edge bindings: esomeprazole and pantoprazole move up to indices 5 and 6, while domperidone drops to index 35.
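For anyone who wants to repeat this comparison on other targets, a small sketch along these lines may help. It assumes each ranking is a list of `[index, essence, score]` rows as printed above; the function name `index_of` is hypothetical, and the two lists are short excerpts of the full rankings, with every row keeping its original index.

```python
def index_of(ranking, name):
    """Return the stored index for `name` in a list of [index, essence, score] rows, or None."""
    for idx, essence, _score in ranking:
        if essence == name:
            return idx
    return None


# Excerpts of the two rankings above; each row keeps the index it had in the full list.
response_1 = [[3, "domperidone", None], [56, "esomeprazole", None], [57, "pantoprazole", None]]
response_2 = [[5, "esomeprazole", None], [6, "pantoprazole", None], [35, "domperidone", None]]

for name in ("esomeprazole", "pantoprazole", "domperidone"):
    print(f"{name}: {index_of(response_1, name)} -> {index_of(response_2, name)}")
# esomeprazole: 56 -> 5
# pantoprazole: 57 -> 6
# domperidone: 3 -> 35
```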

chunyuma commented 1 month ago

This is why I propose moving the ARAX ranker to a different location in ARAX_query.py, after the "virtual" edge bindings have been removed, provided that doesn't affect other modules. Hope this makes sense.

amykglen commented 1 month ago

huh, I was under the impression that the Ranker doesn't consider 'support graphs', so if the NGD edges are already in support graphs when the Ranker is run, then I don't think it's considering them?

but if ignoring them is resulting in a better ranking, do we really want them anyway? or is it just that ignoring the creative DTD support edges (but not NGD edges) results in a better ranking?

it sounds like we need to figure out which edges the ranker is actually considering, and potentially revise that...

chunyuma commented 1 month ago

@amykglen, I think the idea here is:

In the query:

{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "ON",
      "predicates": [
        "biolink:treats"
      ],
      "qualifier_constraints": [],
      "subject": "SN"
    }
  },
  "nodes": {
    "ON": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0007186"
      ],
      "is_set": false,
      "set_interpretation": "BATCH"
    },
    "SN": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "is_set": false,
      "set_interpretation": "BATCH"
    }
  }
}

We are only interested in the edges with the label t_edge, but the edge bindings with other labels, like N1 or creative_DTD_qedge_*, currently affect the ranking results. However, since these label names can be defined by users, using name matching to remove those edge bindings within the Ranker is not efficient.

So I proposed two solutions:

  1. move ARAX ranker implementation to the end after we clear all virtual edges in results.
  2. have a better function (I am not sure if you already have such a function somewhere in the ARAX module) to identify the edge bindings with other labels within the Ranker.

Which one do you think would have the least impact on other modules? @amykglen
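The second option could be sketched roughly as follows: compare each edge-binding key against the qedge keys of the query graph. This is only a minimal illustration, assuming the bindings are a plain dict shaped like the response above. `split_edge_bindings` is a hypothetical helper, not an existing ARAX function, and the `placeholder_edge_id` value stands in for a real edge id.

```python
def split_edge_bindings(edge_bindings, query_graph):
    """Separate bindings keyed by query-graph edges from all other bindings."""
    qedge_keys = set(query_graph.get("edges", {}))
    in_query = {k: v for k, v in edge_bindings.items() if k in qedge_keys}
    other = {k: v for k, v in edge_bindings.items() if k not in qedge_keys}
    return in_query, other


# Query graph reduced to the one edge from the query above.
query_graph = {"edges": {"t_edge": {"subject": "SN", "object": "ON"}}}

# Binding keys as seen in the response; ids are shortened or placeholders.
edge_bindings = {
    "N1": [{"id": "N1_487"}],
    "creative_DTD_qedge_0": [{"id": "placeholder_edge_id"}],
    "t_edge": [{"id": "creative_DTD_prediction_8"}],
}

in_query, other = split_edge_bindings(edge_bindings, query_graph)
```

This relies on comparing against the query graph as the user submitted it. If virtual qedges have been added to the query graph by the time the Ranker runs, the comparison would no longer separate them.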

chunyuma commented 1 month ago

Actually, we are not really ignoring creative DTD or NGD. In the response 2, as you can see, the final edge bindings also contain the xDTD edge. So, my thought is to just use the edges with the specified label (e.g., t_edge in this case) in the query to rank the results only?

amykglen commented 1 month ago

hmm, I don't see how the Ranker wouldn't be ignoring NGD edges, since they're not in the actual result graph at that point (they're in support graphs) - does the Ranker consider support graphs? I didn't think so, unless it's been updated to do that..

it makes sense that the main xDTD edge appears in the result because it isn't actually 'virtual' (meaning, it is bound to an edge in the original query graph), but this isn't true for NGD or other Overlay edges like FET

I'm thinking maybe we will want to exclude all of the XDTD support edges (whose query edges are named in a computational/predictable fashion, I think? i.e., creative_DTD_qedge_X) from the ranker, but leave the other Overlay virtual edges (which are intended to help with ranking)?

by the way, the tucking of the 'virtual' edges into support graphs (which is required by TRAPI) happens in this module: https://github.com/RTXteam/RTX/blob/master/code/ARAX/ARAXQuery/result_transformer.py

chunyuma commented 1 month ago

I think I understand now.

Currently, the Ranker hasn't considered the support graph yet because it might make the algorithm become more complicated. As you said, since NGD edges are not in the actual result graph, they are ignored. The current Ranker algorithm only considers the edges in the actual result graph. We currently don't have a way to assign a score based on the support graph.

@amykglen, please see how the Ranker calculates the final score for each result. It uses all edge bindings in the analyses of message.results. I need a way to distinguish which edge_bindings will appear in the actual result graph and which ones will not. So, I can let the Ranker consider those in the actual result graph only. If the NGD scores are calculated and stored in the attributes of edges in the actual result graph, then these scores would be considered. Otherwise, these "virtual" edges might affect the current ranking algorithm. Does this make sense?

I can see that in the result_transformer.py script, the way to identify the XDTD support edges is to match the key creative_DTD_qedge_X. Is there a way to identify other edges that will not exist in the actual result graph?

chunyuma commented 1 month ago

Let's put this issue on the agenda for our next AHM. I think we need input from Eric or David to determine the final solution.

amykglen commented 1 month ago

Great, I think we're almost on the same page now - everything makes sense up to the point where you said "So, I can let the Ranker consider those in the actual result graph only." The problem with that is that NGD and other Overlay statistics measures are not in attributes on 'real' edges, they exist only on 'virtual' edges, which do not end up in the 'actual' result graph.

So what I'm proposing is that you identify all of the XDTD support edges by their creative_DTD_qedge_X naming pattern, exclude those edges from the Ranker, but leave all the other edges (so you don't need to identify the other 'virtual' edges; you just leave them alone).

The only exception would be if XCRG has similar support edges to XDTD - if it does, then we may want to exclude those from the Ranker too
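A rough sketch of that naming-pattern check, assuming the support qedge keys always look like the `creative_DTD_qedge_0`, `_1`, `_2` keys shown earlier in this thread. The regex and both helper names are illustrative assumptions, not code from the Ranker, and XCRG is not covered because its key pattern isn't given here.

```python
import re

# Assumed pattern, based only on the keys seen in this thread.
XDTD_SUPPORT_QEDGE = re.compile(r"^creative_DTD_qedge_\d+$")


def is_xdtd_support_qedge(qedge_key):
    """True if the key follows the creative_DTD_qedge_X naming pattern."""
    return bool(XDTD_SUPPORT_QEDGE.match(qedge_key))


def bindings_for_ranker(edge_bindings):
    """Drop XDTD support bindings; leave every other binding alone."""
    return {
        key: bindings
        for key, bindings in edge_bindings.items()
        if not is_xdtd_support_qedge(key)
    }


kept = bindings_for_ranker({
    "N1": [{"id": "N1_487"}],
    "creative_DTD_qedge_0": [],
    "creative_DTD_qedge_1": [],
    "t_edge": [{"id": "creative_DTD_prediction_8"}],
})
```

Note that `t_edge` survives even though one of its bindings is `creative_DTD_prediction_8`: the check is on the qedge key, not on the edge id.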

chunyuma commented 1 month ago

ok, this seems like a plan. Let me think of how to integrate these "virtual" edges (except for xDTD and xCRG) into the ranking algorithm. I think only the "virtual" edges with specific scores would help the ranking. I don't think xDTD and xCRG have such scores. So they can be excluded.

amykglen commented 1 month ago

cool. so if you leave the calling of the Ranker in the same place it has been, then the Ranker will automatically consider all the 'virtual' edges (since they are in the result graph at that point, since the ResultTransformer hasn't run yet). in other words, the Ranker has been considering NGD and other virtual edges for a long time. it's only if you were to move the Ranker - so that it's called after the virtual edges are tucked into support graphs - that it wouldn't be able to consider the NGD edges.

so I think what will need to happen is to leave the calling of the Ranker where it is, and just remove those XDTD/XCRG support graph edges from the result graphs that the Ranker sees (but not from the actual results). (for instance, you could choose to skip those edges when the results are loaded into networkx in the Ranker.)
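Something like the following could do that skipping without touching the stored results. It is a sketch under assumptions: results are treated as plain dicts shaped like the response at the top of this issue, `iter_rankable_edges` is a hypothetical name, and the regex only covers the XDTD naming pattern. The same check could sit at the point where edges are added to the networkx graph.

```python
import re

# Assumed naming pattern for XDTD support qedges (from keys in this thread).
SUPPORT_QEDGE = re.compile(r"^creative_DTD_qedge_\d+$")


def iter_rankable_edges(result):
    """Yield (qedge_key, edge_id) pairs that the Ranker should see."""
    for analysis in result.get("analyses", []):
        for qedge_key, bindings in analysis.get("edge_bindings", {}).items():
            if SUPPORT_QEDGE.match(qedge_key):
                continue  # skipped for ranking only; the result is not modified
            for binding in bindings:
                yield qedge_key, binding["id"]


result = {"analyses": [{"edge_bindings": {
    "N1": [{"id": "N1_487"}],
    "creative_DTD_qedge_0": [{"id": "support_edge_placeholder"}],  # placeholder id
    "t_edge": [{"id": "creative_DTD_prediction_8"}],
}}]}

rankable = list(iter_rankable_edges(result))
```

Because the function only reads from `result`, the support bindings remain available for the ResultTransformer to tuck into support graphs afterwards.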

I agree it might be worth discussing with the team - also would be helpful for that discussion to have some more examples of ranking performance when the XDTD/XCRG support graph edges are vs. are not included

chunyuma commented 1 month ago

Hi @saramsey, @dkoslicki, @edeutsch, @amykglen,

Please see if you agree with the following changes in the ranking algorithm based on the ranking results:

Example Query:

{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "ON",
      "predicates": [
        "biolink:treats"
      ],
      "qualifier_constraints": [],
      "subject": "SN"
    }
  },
  "nodes": {
    "ON": {
      "categories": [
        "biolink:Disease"
      ],
      "constraints": [],
      "ids": [
        "MONDO:0007186"
      ],
      "is_set": false,
      "set_interpretation": "BATCH"
    },
    "SN": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "is_set": false,
      "set_interpretation": "BATCH"
    }
  }
}

Our targets are pantoprazole (asset 621), esomeprazole (asset 620), nizatidine (asset 615), lansoprazole (asset 619), rabeprazole (asset 623).

Top 60 results before changes:

[1, 'cisapride'] [2, 'metoclopramide'] [3, 'famotidine'] [4, 'domperidone'] [5, 'baclofen zwitterion'] [6, 'cimetidine'] [7, 'bethanechol'] [8, 'dexlansoprazole'] [9, 'citric acid'] [10, 'misoprostol'] [11, 'aluminum hydroxide'] [12, 'potassium bicarbonate'] [13, 'isosorbide dinitrate'] [14, 'budesonide'] [15, 'magnesium carbonate'] [16, 'aluminum magnesium silicate'] [17, 'bleomycin a2'] [18, 'isosorbide'] [19, 'dopamine'] [20, '9-cis-retinoic acid'] [21, 'docetaxel anhydrous'] [22, 'n,n-dimethylethanolamine'] [23, 'interferon gamma-1b'] [24, 'phendimetrazine'] [25, 'sodium feredetate'] [26, 'proquazone'] [27, 'roflumilast'] [28, 'magnesium lactate'] [29, 'interferon alfa-2a'] [30, 'argipressin'] [31, 'fish oil'] [32, 'pipemidic acid'] [33, 'lithium orotate'] [34, 'human cytomegalovirus immune globulin'] [35, 'enoxacin'] [36, 'aminosalicylic sodium'] [37, 'bupranolol'] [38, 'iloprost'] [39, 'capecitabine'] [40, 'sisomycin'] [41, 'iotalamic acid'] [42, 'oxprenolol hydrochloride'] [43, 'factor viia'] [44, 'ferumoxides'] [45, 'demeclocycline'] [46, '(s)-amphetamine'] [47, 'diethylpropion'] [48, 'lisdexamfetamine'] [49, '(r,r)-labetalol'] [50, 'testosterone cypionate'] [51, 'chembl1201055'] [52, 'interferon alfacon-1'] [53, '5-fluorouracil'] [54, 'reserpine'] [55, '5-formyltetrahydrofolic acid'] [56, 'triamterene'] [57, 'esomeprazole'] [58, 'pantoprazole'] [59, 'lansoprazole'] [60, 'rabeprazole']

Top 60 results after implementing the first change:

[1, 'esomeprazole'] [2, 'pantoprazole'] [3, 'lansoprazole'] [4, 'rabeprazole'] [5, 'cisapride'] [6, 'ranitidine'] [7, 'metoclopramide'] [8, 'famotidine'] [9, 'baclofen zwitterion'] [10, 'cimetidine'] [11, 'domperidone'] [12, 'vonoprazan'] [13, 'acetylsalicylic acid'] [14, 'alginic acid'] [15, 'carafate'] [16, 'nizatidine'] [17, 'bethanechol'] [18, 'dexlansoprazole'] [19, '4-amino-5-chloro-2-ethoxy-n-({4-[(4-fluorophenyl)methyl]morpholin-2-yl}methyl)benzamide'] [20, 'proton pump inhibitors'] [21, 'erythromycin ethylsuccinate'] [22, 'tegoprazan'] [23, '(2s)-3-hydroxy-2-phenylpropanoic acid [(5r)-8-methyl-8-azabicyclo[3.2.1]octan-3-yl] ester'] [24, 'sodium bicarbonate'] [25, 'antacids'] [26, 'n-[[4-[2-(dimethylamino)ethoxy]phenyl]methyl]-3,4-dimethoxybenzamide'] [27, 'citric acid'] [28, 'technetium atom'] [29, 'misoprostol'] [30, 'theophylline'] [31, 'prostaglandin e2'] [32, 'morphine'] [33, 'prucalopride'] [34, 'cinitapride'] [35, 'rebamipide'] [36, 'roxane'] [37, 'fluticasone'] [38, 'aluminum hydroxide'] [39, 'magnesium hydroxide'] [40, 'diclofenac'] [41, '4-amino-5-bromo-n-[2-(diethylamino)ethyl]-2-methoxybenzamide'] [42, 'nitroglycerin'] [43, 'caffeine'] [44, 'carbenoxolone'] [45, 'potassium bicarbonate'] [46, 'pirenzepine'] [47, 'progesterone'] [48, 'citalopram'] [49, 'lsm-1330'] [50, 'imipramine'] [51, 'prednisone'] [52, 'ketamine'] [53, 'risedronic acid'] [54, 'clonazepam'] [55, 'gabapentin'] [56, 'diltiazem'] [57, 'quinine'] [58, 'midodrine'] [59, 'delta(9)-tetrahydrocannabinol'] [60, 'isosorbide dinitrate']

Top 60 results after implementing both changes:

[1, 'esomeprazole'] [2, 'pantoprazole'] [3, 'lansoprazole'] [4, 'rabeprazole'] [5, 'cisapride'] [6, 'ranitidine'] [7, 'metoclopramide'] [8, 'famotidine'] [9, 'cimetidine'] [10, 'nizatidine'] [11, 'dexlansoprazole'] [12, 'baclofen zwitterion'] [13, 'dexrabeprazole'] [14, 'citric acid'] [15, 'aluminum hydroxide'] [16, 'domperidone'] [17, 'acetylsalicylic acid'] [18, 'vonoprazan'] [19, 'bethanechol'] [20, 'carafate'] [21, 'silicic acid'] [22, 'ilaprazole'] [23, 'roxane'] [24, 'potassium bicarbonate'] [25, 'alginic acid'] [26, 'magnesium hydroxide'] [27, 'proton pump inhibitors'] [28, 'magnesium carbonate'] [29, 'erythromycin ethylsuccinate'] [30, '4-amino-5-chloro-2-ethoxy-n-({4-[(4-fluorophenyl)methyl]morpholin-2-yl}methyl)benzamide'] [31, 'aluminum magnesium silicate'] [32, 'tegoprazan'] [33, 'magnesium orthosilicate'] [34, 'antacids'] [35, 'aluminum sulfate'] [36, 'carbaldrate'] [37, 'aluminum carbonate'] [38, 'sodium bicarbonate'] [39, 'misoprostol'] [40, 'n-[[4-[2-(dimethylamino)ethoxy]phenyl]methyl]-3,4-dimethoxybenzamide'] [41, 'technetium atom'] [42, 'pe-nme2(14:1(9z)/22:2(13z,16z))'] [43, 'nitroglycerin'] [44, 'prucalopride'] [45, '(2s)-3-hydroxy-2-phenylpropanoic acid [(5r)-8-methyl-8-azabicyclo[3.2.1]octan-3-yl] ester'] [46, 'cinitapride'] [47, 'rebamipide'] [48, 'progesterone'] [49, 'theophylline'] [50, '4-amino-5-bromo-n-[2-(diethylamino)ethyl]-2-methoxybenzamide'] [51, 'citalopram'] [52, 'carbenoxolone'] [53, 'pirenzepine'] [54, 'pantoprazole sodium'] [55, '3,4,5-trimethoxy-n-[(s)-3-piperidinyl]benzamide'] [56, 'bismuth subcitrate'] [57, '(s)-mosapride'] [58, 'esomeprazole sodium'] [59, '(r)-(+)-pantoprazole'] [60, 'tritec']
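For comparing runs like these, a small helper that reports where each target lands can be handy. The sketch below is illustrative only (`target_ranks` is a hypothetical name) and uses just the first ten names from the last list above.

```python
def target_ranks(ranked_names, targets):
    """Map each target to its 1-based rank, or None if it is not in the list."""
    position = {}
    for rank, name in enumerate(ranked_names, start=1):
        position.setdefault(name, rank)  # keep the first occurrence of a name
    return {target: position.get(target) for target in targets}


# First ten entries of the list after both changes.
top10_after_both = [
    "esomeprazole", "pantoprazole", "lansoprazole", "rabeprazole", "cisapride",
    "ranitidine", "metoclopramide", "famotidine", "cimetidine", "nizatidine",
]
targets = ["pantoprazole", "esomeprazole", "nizatidine", "lansoprazole", "rabeprazole"]

ranks = target_ranks(top10_after_both, targets)
# ranks == {'pantoprazole': 2, 'esomeprazole': 1, 'nizatidine': 10,
#           'lansoprazole': 3, 'rabeprazole': 4}
```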

amykglen commented 1 month ago

awesome!

dkoslicki commented 1 month ago

Wow, that made quite the change! Nicely done!

chunyuma commented 1 month ago

I have further tested the updated algorithm and it has passed all test cases in test_ARAX_ranker except for test13 asset355.

While investigating, I found that I had previously set a very high confidence score for edges from manual agent, which causes many manual agent results to rank higher than the target result, Monopril. See examples below:

[Screenshot 2024-09-17 at 2 59 56 PM]

[Screenshot 2024-09-17 at 2 59 28 PM]

@saramsey @edeutsch, do you think we can decrease the confidence score for manual agent edges?

edeutsch commented 1 month ago

Well, I still think that one solid manual_agent edge should still beat out several SemMedDb/Text mined/inferred edges. So in the above example, if one of the two edges in the first screenshot is "gold", and all 7 of the edges in the lower screenshot are "bronze", then I think the first one should win. i.e., I think each piece of high quality evidence should be weighted a lot higher than many lower quality pieces of evidence. "quality over quantity" should be our approach I think.

But having said that, 0.999 does seem a bit high. Maybe notching it down a bit would be helpful? 0.99? 0.98?
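To get a feel for how much that last digit can matter, here is a toy calculation. It assumes, purely for illustration, that edge confidences are combined with a noisy-OR and that each "bronze" edge has confidence 0.5. Neither assumption is taken from the Ranker code; the actual combination rule may well differ.

```python
import math


def noisy_or(confidences):
    """Combine independent confidences: 1 - prod(1 - c_i). Illustrative rule only."""
    return 1.0 - math.prod(1.0 - c for c in confidences)


# Hypothetical inputs: seven low-quality edges versus one manual_agent edge.
seven_bronze = noisy_or([0.5] * 7)   # 1 - 0.5**7 = 0.9921875
gold_at_0999 = noisy_or([0.999])
gold_at_099 = noisy_or([0.99])

gold_wins_at_0999 = gold_at_0999 > seven_bronze
gold_wins_at_099 = gold_at_099 > seven_bronze
```

Under these made-up numbers the single gold edge beats the seven bronze edges at 0.999 but narrowly loses at 0.99, so whether "quality over quantity" holds depends on both the weight and the way evidence is aggregated.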

If we had higher confidence that our tests were very accurate, I would suggest that we could use an ML approach to learn what the best weights are to perform best on the test! But I don't think our tests are so great yet..

chunyuma commented 1 month ago

Thanks @edeutsch. Just curious how we define an edge as manual_agent?

"quality over quantity" should be our approach I think.

This makes sense. If we believe the manual_agent edge is "gold", giving it a higher confidence definitely makes sense. But 0.999 might be too high. I have set it to 0.99.

Now all ARAX ranker tests pass.

Updated Ranking Results for Test Case 14

(Targets: pantoprazole, esomeprazole, nizatidine, lansoprazole, rabeprazole) [1, 'esomeprazole'] [2, 'pantoprazole'] [3, 'lansoprazole'] [4, 'rabeprazole'] [5, 'metoclopramide'] [6, 'famotidine'] [7, 'cimetidine'] [8, 'dexlansoprazole'] [9, 'ranitidine'] [10, 'cisapride'] [11, 'nizatidine'] [12, 'baclofen zwitterion'] [13, 'citric acid'] [14, 'domperidone'] [15, 'aluminum hydroxide'] [16, 'acetylsalicylic acid'] [17, 'bethanechol'] [18, 'dexrabeprazole'] [19, 'carafate'] [20, 'vonoprazan'] [21, 'potassium bicarbonate'] [22, 'roxane'] [23, 'silicic acid'] [24, 'ilaprazole'] [25, 'magnesium hydroxide'] [26, 'alginic acid'] [27, 'erythromycin ethylsuccinate'] [28, 'magnesium carbonate'] [29, 'aluminum magnesium silicate'] [30, '4-amino-5-chloro-2-ethoxy-n-({4-[(4-fluorophenyl)methyl]morpholin-2-yl}methyl)benzamide'] [31, 'antacids'] [32, 'tegoprazan'] [33, 'aluminum sulfate'] [34, 'carbaldrate'] [35, 'aluminum carbonate'] [36, 'magnesium orthosilicate'] [37, '(2s)-3-hydroxy-2-phenylpropanoic acid [(5r)-8-methyl-8-azabicyclo[3.2.1]octan-3-yl] ester'] [38, 'proton pump inhibitors'] [39, 'sodium bicarbonate'] [40, 'misoprostol'] [41, 'theophylline'] [42, 'n-[[4-[2-(dimethylamino)ethoxy]phenyl]methyl]-3,4-dimethoxybenzamide'] [43, 'technetium atom'] [44, 'pe-nme2(14:1(9z)/22:2(13z,16z))'] [45, 'cinitapride'] [46, 'nitroglycerin'] [47, 'prucalopride'] [48, 'progesterone'] [49, 'rebamipide'] [50, 'citalopram'] [51, '4-amino-5-bromo-n-[2-(diethylamino)ethyl]-2-methoxybenzamide'] [52, 'carbenoxolone'] [53, 'pirenzepine'] [54, '3,4,5-trimethoxy-n-[(s)-3-piperidinyl]benzamide'] [55, '(r)-(+)-pantoprazole'] [56, 'esomeprazole magnesium'] [57, 'bismuth subcitrate'] [58, 'proglumide'] [59, 'milk protein'] [60, 'esomeprazole sodium'] [61, 'esomeprazole strontium'] [62, '3-(3,4-difluorophenyl)-4-(4-(methylsulfonyl)phenyl)-2(5h)-furanone'] [63, '2-[(r)-(4-methoxy-3-methylpyridin-2-yl)methylsulfinyl]-6-pyrrol-1-yl-1h-benzimidazole'] [64, '(s)-mosapride'] [65, 'sucralfate'] [66, 
'(4s,5r,6r)-2-[[(2r,3r,4s,5r,6r)-6-[(2s,3r,4r,5r,6r)-3-acetamido-2,5-dihydroxy-6-(hydroxymethyl)oxan-4-yl]oxy-3,4,5-trihydroxyoxan-2-yl]methoxy]-4-hydroxy-5-[(2-hydroxyacetyl)amino]-6-[(1r,2r)-1,2,3-trihydroxypropyl]oxane-2-carboxylic acid'] [67, 'tritec'] [68, 'pantoprazole sodium'] [69, 'zolimidine'] [70, '2-acetyloxybenzoic acid [3-(nitrooxymethyl)phenyl] ester'] [71, 'gastrocote'] [72, 'rennie'] [73, 'magnesium hydroxide'] [74, 'sodium alginate'] [75, "(1e)-n-{2-[({5-[(dimethylamino)methyl]furan-2-yl}methyl)sulfanyl]ethyl}-n'-methylnitroethanimidamide"] [76, '(6s,7s,8s)-8-(hydroxymethyl)-7-[4-[2-(3-methoxyphenyl)ethynyl]phenyl]-2-oxo-n-propyl-1,4-diazabicyclo[4.2.0]octane-4-carboxamide'] [77, 'prostaglandin e2'] [78, 'morphine'] [79, 'fluticasone'] [80, 'quinine'] [81, 'isosorbide dinitrate'] [82, 'diclofenac'] [83, 'isosorbide'] [84, 'caffeine'] [85, 'lsm-1330'] [86, 'imipramine'] [87, 'risedronic acid'] [88, 'budesonide'] [89, 'prednisone'] [90, 'ketamine'] [91, 'clonazepam'] [92, 'bleomycin a2'] [93, 'midodrine'] [94, 'dicyclomine'] [95, 'gabapentin'] [96, 'cyclobenzaprine'] [97, '9-cis-retinoic acid'] [98, 'diltiazem'] [99, 'hydromorphone'] [100, 'docetaxel anhydrous']

Updated Ranking Results for Test Case 13 (previously failed on Monopril, as described above)

(Targets: benazepril, monopril, trandolapril, Moexipril) [1, 'enalapril'] [2, 'ramipril'] [3, 'lisinopril'] [4, 'captopril'] [5, 'cilazapril'] [6, 'perindopril'] [7, 'quinapril'] [8, 'omapatrilat'] [9, 'trandolapril'] [10, 'benazepril'] [11, 'imidapril'] [12, 'spirapril'] [13, 'losartan'] [14, 'quinaprilat'] [15, 'moexipril'] [16, 'zofenopril'] [17, 'ramiprilat'] [18, 'monopril'] [19, 'fozitec'] [20, 'temocapril'] [21, 'ethylenediaminetetraacetic acid'] [22, 'perindoprilat'] [23, 'delapril'] [24, 'thiorphan'] [25, 'mln-4760'] [26, '5-azabicyclo(11.3.1)heptadeca-1(17),13,15-triene-6-carboxylic acid, 3-(mercaptomethyl)-4-oxo-, (3s,6s)-'] [27, '(3s,6s)-3-mercaptomethyl-4-oxo-5-aza-bicyclo[10.3.1]hexadeca-1(15),12(16),13-triene-6-carboxylic acid'] [28, 'ethyl 2-[2-[(1-ethoxy-1-oxo-4-phenylbutan-2-yl)amino]propanoyl]-6,7-dimethoxy-3,4-dihydro-1h-isoquinoline-3-carboxylate'] [29, '2-[2-(1-ethoxycarbonyl-3-phenyl-propylamino)-propionyl]-1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid ethyl ester'] [30, '2-[2-[(1-carboxy-3-phenylpropyl)amino]propanoyl]-3,4-dihydro-1h-isoquinoline-1-carboxylic acid'] [31, 'moexiprilat hydrate'] [32, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2-methylpropyl)carbamoyl]amino]-3-(1h-indol-3-yl)propanoic acid'] [33, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2-methylpropyl)carbamoyl]amino]-3-naphthalen-1-ylpropanoic acid'] [34, "(alphas)-alpha-[[[(2s)-2-carboxy-2-hydroxyethylamino]carbonyl]amino][1,1'-biphenyl]-4-propanoic acid"] [35, '(2s)-2-[[2-methylpropyl(phosphonomethyl)carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [36, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2-methylpropyl)carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [37, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2-methylpropyl)carbamoyl]amino]-3-(4-hydroxyphenyl)propanoic acid'] [38, 'epi-ethoxycarbonyl quinapril'] [39, '2-[2-[(1-carboxy-3-phenylpropyl)amino]propanoyl]-1,3-dihydroisoindole-1-carboxylic acid'] [40, 
'(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-[(9,10-dioxoanthracen-2-yl)methyl]carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [41, '2-((s)-3-mercapto-2-methyl-propionyl)-2,3,4,9-tetrahydro-1h-beta-carboline-3-carboxylic acid'] [42, 'benzyl 2-[2-[(1-ethoxycarbonyl-3-phenyl-propyl)amino]propanoyl]-6,7-dimethoxy-3,4-dihydro-1h-isoquinoline-3-carboxylate'] [43, '(r,s)s 2-(3-acetylsulfanyl-2-methyl-propionyl)-1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid'] [44, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(9h-fluoren-2-ylmethyl)carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [45, '(3r,6s)-3-mercaptomethyl-4-oxo-5-aza-bicyclo[11.3.1]heptadeca-1(16),13(17),14-triene-6-carboxylic acid'] [46, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-[(9-oxofluoren-2-yl)methyl]carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [47, '(1s)-2-[(2s)-2-[[(2s)-1-ethoxy-1-oxo-4-phenylbutan-2-yl]amino]propanoyl]-1,3-dihydroisoindole-1-carboxylic acid'] [48, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2-methylpropyl)carbamoyl]amino]-3-(3,4-dihydroxyphenyl)propanoic acid'] [49, '(2s)-2-[[[(2s)-2-carboxy-2-hydroxyethyl]-(2,2-dimethylpropyl)carbamoyl]amino]-3-naphthalen-2-ylpropanoic acid'] [50, 'enalaprilat (anhydrous)'] [51, 'trandolaprilat'] [52, '(3r,6s)-5-oxo-6-(sulfanylmethyl)-4-azabicyclo[11.4.0]heptadeca-1(17),13,15-triene-3-carboxylic acid'] [53, '2-(3-mercapto-2-methyl-propionyl)-1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid'] [54, '(3r,6s)-3-mercaptomethyl-4-oxo-5-aza-bicyclo[10.3.1]hexadeca-1(15),12(16),13-triene-6-carboxylic acid'] [55, '(3s,6s)-5-oxo-6-(sulfanylmethyl)-4-azabicyclo[11.4.0]heptadeca-1(17),13,15-triene-3-carboxylic acid'] [56, 'n~2~-acetyl-n-{(1r)-1-[(s)-(2s)-3-{[(2s)-1-amino-1-oxopropan-2-yl]amino}-2-methyl-3-oxopropylphosphoryl]-2-phenylethyl}-l-alpha-asparagine'] [57, 'imidaprilat'] [58, 'spiraprilat'] [59, '(2-mercaptomethyl-3-phenyl-propionyl)-glycine'] [60, 'alacepril'] [61, 
'1-benzyl-3-[2-(1-carboxy-2-phenyl-ethylamino)-propionyl]-2-oxo-imidazolidine-4-carboxylic acid'] [62, '[(s)-3-((s)-2-mercapto-3-phenyl-propionylamino)-2-oxo-azepan-1-yl]-acetic acid'] [63, '2-(1-carboxymethyl-2-oxo-2,3,4,5-tetrahydro-1h-benzo[b]azepin-3-ylamino)-hexanoic acid'] [64, '1-(5-benzoylamino-2-methyl-4-oxo-6-phenyl-hexanoyl)-pyrrolidine-2-carboxylic acid'] [65, '(r)-3-(benzylthio)-2-((s)-3-mercapto-2-methylpropanamido)propanoic acid'] [66, 'cbz-dl-ala-d-gglu-pro-oh'] [67, '(2s)-1-[(2r)-2-[(3-phenyl-2-sulfanylpropanoyl)amino]propanoyl]pyrrolidine-2-carboxylic acid'] [68, '2-[2-oxo-3-(sulfanylmethyl)cycloheptyl]propanoic acid'] [69, '1-benzyl-3-[2-(1-carboxy-3-methyl-butylamino)-propionyl]-2-oxo-imidazolidine-4-carboxylic acid'] [70, '1-[2-[(2-sulfanylcyclohexanecarbonyl)amino]propanoyl]pyrrolidine-2-carboxylic acid'] [71, '2-[(6s)-2-methyl-7-oxo-6-[[(2s)-3-phenyl-2-sulfanylpropanoyl]amino]diazepan-1-yl]acetic acid'] [72, '(2s,4r)-1-[(2s)-6-amino-2-[hydroxy(4-phenylbutyl)phosphoryl]oxyhexanoyl]-4-hydroxypyrrolidine-2-carboxylic acid'] [73, '(2s,4r)-1-[(2s)-6-amino-2-[hydroxy(4-phenylbutyl)phosphoryl]oxyhexanoyl]-4-phenylpyrrolidine-2-carboxylic acid'] [74, '3-[2-(1-carboxy-nonylamino)-butyryl]-1-methyl-2-oxo-imidazolidine-4-carboxylic acid'] [75, '[(1r)-2-(4-hydroxyphenyl)-1-[[(2s)-2-[[(2s)-3-methyl-2-(methylamino)butanoyl]amino]-3-phenylpropanoyl]amino]ethyl]phosphonic acid'] [76, '2-[(3s)-3-[[(2s)-3-cyclohexyl-2-sulfanylpropanoyl]amino]-2-oxo-4,5-dihydro-3h-1-benzazepin-1-yl]acetic acid'] [77, 'cbz-dl-lys-d-gglu-pro-oh'] [78, '2-(1-carboxymethyl-2-oxo-2,3,4,5-tetrahydro-1h-benzo[b]azepin-3-ylamino)-pentanoic acid'] [79, '6-tert-butoxycarbonylamino-2-(1-carboxymethyl-2-oxo-2,3,4,5-tetrahydro-1h-benzo[b]azepin-3-ylamino)-hexanoic acid'] [80, '(2s)-1-[2-[[(2s)-2-benzamido-4-phenylbutyl]-hydroxyphosphoryl]acetyl]pyrrolidine-2-carboxylic acid'] [81, '6h-pyridazino[1,2-a][1,2]diazepine-1-carboxylic acid, 
octahydro-9-[(2-mercapto-1-oxo-3-phenylpropyl)amino]-10-oxo-, [1s-[1alpha,9alpha(r*)]]-'] [82, '2-(1-carboxymethyl-2-oxo-azepan-3-ylamino)-4-phenyl-butyric acid'] [83, '2-(1-carboxymethyl-2-oxo-2,3,4,5-tetrahydro-1h-benzo[b]azepin-3-ylamino)-3-(1h-indol-3-yl)-propionic acid ethyl ester'] [84, '(2s)-1-[(2s)-2-[[1-[hydroxy(methoxy)phosphoryl]-3-phenylpropyl]amino]propanoyl]pyrrolidine-2-carboxylic acid'] [85, '2-[(3s)-3-[(2-methyl-3-phenyl-2-sulfanylpropanoyl)amino]-2-oxo-4,5-dihydro-3h-1-benzazepin-1-yl]acetic acid'] [86, '(2s,4r)-1-[(2s)-6-amino-2-[hydroxy(4-phenylbutyl)phosphoryl]oxyhexanoyl]-4-cyclohexylpyrrolidine-2-carboxylic acid'] [87, '(1z)-1-(1-benzoyl-2-oxoindol-3-ylidene)-3-[(3-hydroxynaphthalen-2-yl)methyl]thiourea'] [88, '(2s)-1-[(2s)-2-[[(1-benzoylpyrrolidin-2-yl)-carboxymethyl]amino]propanoyl]pyrrolidine-2-carboxylic acid'] [89, '(s)-1-[(s)-2-((s)-2-mercapto-3-phenyl-propionylamino)-propionyl]-pyrrolidine-2-carboxylic acid'] [90, '1-(3,4-dichlorophenyl)-3-[(2-hydroxybenzoyl)amino]thiourea'] [91, '(2s)-3-(1h-indol-3-yl)-2-[(2-sulfanylacetyl)amino]propanoic acid'] [92, '3-[(3s)-3-[[(2s)-2-benzyl-3-sulfanylpropanoyl]amino]-2-oxo-4,5-dihydro-3h-1-benzazepin-1-yl]propanoic acid'] [93, '(2s,4s)-1-[(2s)-6-amino-2-[hydroxy(4-phenylbutyl)phosphoryl]oxyhexanoyl]-4-methylsulfanylpyrrolidine-2-carboxylic acid'] [94, '(3s,5r)-1-[2-((s)-(s)-1-carboxy-3-phenyl-propylamino)-propionyl]-4-(1-mercaptomethyl-2-phenyl-ethylcarbamoyloxy)-pyrrolidine-2-carboxylic acid'] [95, '2-[n-(1-carboxy-3-phenylpropyl)-(s)-alanyl]octahydroisoindole-1(s)-carboxylic acid'] [96, '(3s,6s)-3-[(1-carboxy-3-phenylpropyl)amino]-4-oxo-2,3,6,7,8,9-hexahydro-1h-pyridazino[1,2-a]pyridazine-6-carboxylic acid'] [97, '(2s)-1-((2s)-2-((1-carboxy-4-((4-iodophenyl)amino)-4-oxobutyl)amino)propanoyl)pyrrolidine-2-carboxylic acid'] [98, 'n-(3-sulfanylpropanoyl)-l-phenylalanine'] [99, '(s,s)-2-[2-(1-ethoxycarbonyl-3-phenyl-propoxy)-propionyl]-1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid'] [100, 
'1-(3-mercapto-1-oxopropyl)-l-proline'] [101, '(2s)-1-[2-(1-carboxy-3-phenylpropyl)sulfanylpropanoyl]pyrrolidine-2-carboxylic acid']

edeutsch commented 1 month ago

Thanks @edeutsch. Just curious how we define an edge as manual_agent?

It is defined by our KPs. For RTX-KG2, the creators decide whether it is a manual_agent or something else based on their best guess of the knowledge source. I imagine it is not 100% perfect. Some edges will be inadvertently mischaracterized.

Thus the weight factor (0.99) might be some combination of the false discovery rate combined with the chance that the edge agent type is mischaracterized.

chunyuma commented 1 month ago

Thanks @edeutsch and everyone for the suggestions on this issue.

I think this issue has been resolved, and the ranking algorithm seems to be better now.

So I am closing this issue.