biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

update BTE's version of biolink-model to 3.5.3 #661

Closed colleenXu closed 1 year ago

colleenXu commented 1 year ago

Biolink-model 3.5.0 should be coming out soon. Based on the deadline described below, we'll want BTE to use this.

Deadline (link to Translator google spreadsheet) in mid-July…

Fully semantically valid biolink 3.5.0 + TRAPI 1.4.0 from all actors in PROD

colleenXu commented 1 year ago

First step is for me to review the diff from 3.1.1 (what BTE currently uses) when it comes out (like I've done in the past). I'll see if any extra adjustments to BTE behavior or x-bte annotation are needed.

In particular, we'll want to know if qualifier-hierarchy-support is affected. There's a test recorded. @rjawesome can be responsible for this part.

colleenXu commented 1 year ago

Biolink 3.5.0 was released last Friday afternoon.

I'm using this link to compare biolink-model.yaml from 3.1.1 to 3.5.0.

Major, my to-dos

Info that's low-priority and for possible future tweaks of BTE behavior and x-bte annotation

* <=3.1.1: biolink-model says node category is multi-valued and "should" include ancestor biolink classes (lines 512-513). Dunno if we actually want to change BTE's behavior though…
* semmeddb mappings (may affect x-bte notebook if I update it to use 3.5.0)
  * ADMINISTERED_TO mapped to "related to". Lines 1650-1651 (was previously mapped to affects)
  * STY:T130 / diagnostic aid mapped to new category "DiagnosticAid" (lines 6680-6686)
  * STY:T129 / immunologic factor / imft mapped to BiologicalEntity (line 6781). Was previously mapped to ChemicalEntity
  * STY:T045 / genetic function mapped to "BiologicalProcessOrActivity" (lines 7170-7171) and to PhysiologicalProcess?? (lines 7247-7248)
  * STY:T008 / animal mapped to OrganismalEntity (line 7421)
  * STY:T007 / bacterium mapped to new category Bacterium (lines 7403-7410)
  * STY:T005 / virus mapped to Virus (lines 7400-7401)
  * STY:T015 / mammal mapped to new category Mammal (lines 7431-7443)
  * STY:T016 / human mapped to new category Human (lines 7445-7455)
  * STY:T002 / plant mapped to new category Plant (lines 7457-7464)
  * STY:T011 / invertebrate mapped to new category Invertebrate (lines 7466-7476)
  * STY:T010 / vertebrate mapped to new category Vertebrate (lines 7478-7487)
  * STY:T004 / fungus mapped to new category Fungus (lines 7489-7502)
  * STY:T029 (body location or region), STY:T030 (body space or junction), STY:T031 (body substance) mapped to AnatomicalEntity. Lines 7715-7716, 7726
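For reference, the unambiguous semantic-type changes listed above can be written down as a small lookup. This is only a sketch for comparing against the x-bte notebook: the names `STY_TO_BIOLINK_3_5_0` and `biolink_category` are hypothetical, the table contains only the mappings noted in this comment (STY:T045 is left out because it appears under two classes), and it is not how BTE or the notebook actually stores these mappings.

```python
# Hypothetical lookup, built only from the semmeddb mapping changes noted above.
# STY:T045 is omitted on purpose: the notes show it mapped to two classes.
STY_TO_BIOLINK_3_5_0 = {
    "STY:T130": "DiagnosticAid",
    "STY:T129": "BiologicalEntity",   # was ChemicalEntity in 3.1.1
    "STY:T008": "OrganismalEntity",
    "STY:T007": "Bacterium",
    "STY:T005": "Virus",
    "STY:T015": "Mammal",
    "STY:T016": "Human",
    "STY:T002": "Plant",
    "STY:T011": "Invertebrate",
    "STY:T010": "Vertebrate",
    "STY:T004": "Fungus",
    "STY:T029": "AnatomicalEntity",
    "STY:T030": "AnatomicalEntity",
    "STY:T031": "AnatomicalEntity",
}


def biolink_category(semantic_type):
    """Return the biolink category CURIE for a UMLS semantic type, or None if not listed."""
    name = STY_TO_BIOLINK_3_5_0.get(semantic_type)
    return f"biolink:{name}" if name is not None else None


if __name__ == "__main__":
    for sty in ("STY:T007", "STY:T129", "STY:T045"):
        print(sty, "->", biolink_category(sty))
```

Returning `None` for unlisted types (rather than guessing a default category) keeps the ambiguous cases visible when checking the notebook output.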

Noting but no action needed from us

* a simple heading for `slots` (line 279), `classes` (line 5775), `enums` (line 10742)
* comments on what to put in value / value_type / value_type_name for certain node/edge attributes? (lines 388-391, 401-404)
* when reading biolink-model, not clear whether using "interacts with" is okay (<=3.1.1: lines 2301-2303). Semmeddb predicate is mapped to this. [Wasn't fully discussed in a previous issue](https://github.com/biolink/biolink-model/issues/1171)…
* comments more strongly state that exogenous/environmental chemicals affecting genes should not use the "regulates" predicate (lines 2655-2657)
* added info on associations. Dunno if this will cause any issues with TRAPI/biolink-model validation
  * gene to disease / pheno association (lines 9833-9863)
  * causal gene to disease association (lines 9908-9922) and correlated gene to disease association (lines 9924-9938)
  * gene has variant that contributes to disease (lines 10264-10275)

New and deprecated terms

* new categories:
  * "diagnostic aid" (lines 6680-6686)
  * regulatory region, accessible dna region, transcription factor binding site (lines 7019-7061)
  * Bacterium (lines 7403-7410)
  * Mammal (lines 7431-7443)
  * Human (lines 7445-7455)
  * Plant (lines 7457-7464)
  * Invertebrate (lines 7466-7476)
  * Vertebrate (lines 7478-7487)
  * Fungus (lines 7489-7502)
* deprecated predicates:
  * binds (line 2356)
  * genetic association (line 3101)
  * has capability (line 4189)
* new predicates:
  * more specific than "genetically interacts with": "gene_fusion_with" and "genetic_neighborhood_of" (lines 2390-2415)
  * genetically associated with (replaces "genetic association"). Lines 3106-3118
  * can be carried out by (replaces "has capability"). Line 4189
* new edge-attributes:
  * in taxon label (node property?) lines 4927-4939. Human-readable scientific name
  * log odds ratio, log odds ratio 95 ci, total sample size (lines 5260-5278)
  * dataset count? Lines 5395-5401
  * log odds analysis result (thing?) lines 6145-6148
* deprecated edge-attributes:
  * supporting documents (line 5497). Previously used by text-mining?
* new qualifier values:
  * under ChemicalOrGeneOrGeneProductFormOrVariantEnum (lines 10776-10780)
    * modified_form
    * loss_of_function_variant_form
    * gain_of_function_variant_form
* different qualifier behavior:
  * degradation is no longer an "abundance"

Commentary / questions

* Confusing (lines 673-704, 713-725)
  * why are these node properties: resource id, resource role, upstream resource ids
  * domain/range being "retrieval source"
* do some qualifiers go into edge-attributes, or do all qualifiers go in the edge's qualifiers section? Starts line 1150, see older list [here](https://github.com/biothings/biothings_explorer/issues/514#issue-1403608877)
* is "has chemical role" a predicate or a node property? Line 1193
* qualifier terms in examples and notes aren't in the enums, <=3.1.1 (lines 1280-1367). Mostly noted the first two (form or variant qualifier, aspect qualifier)
* typo: "cerebral cortext" -> "cerebral cortex" lines 1513-1520
* typo: "myocardial infraction" -> "myocardial infarction" lines 1788, 2008
* typo? "Iff" and "fiat" in line 2344
* format change? Underscore vs space for predicates, lines 2390, 2405
* unclear what "mixins:" for a predicate or entity type / category means…
* unclear what this line is for: line 3506
* node property added in predicates section? lines 4927-4939
* another domain confusion: lines 5180, 5193
* missing predicate specification? causal gene to disease association (lines 9908-9922) and correlated gene to disease association (lines 9924-9938)
* typo? "reource" -> "resource" line 11118

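
If it helps when scanning x-bte annotations for the deprecated predicates listed under "New and deprecated terms", here is a minimal sketch of a checker. The names `DEPRECATED_PREDICATES` and `check_predicate` are hypothetical, and the table only encodes what these notes say: two deprecated predicates have a named replacement, and no replacement is recorded here for "binds".

```python
# Hypothetical table built from the "deprecated predicates" / "new predicates" notes.
# Value is the replacement named in the notes, or None when none is recorded there.
DEPRECATED_PREDICATES = {
    "genetic association": "genetically associated with",
    "has capability": "can be carried out by",
    "binds": None,
}


def normalize(predicate):
    """Strip an optional 'biolink:' prefix and treat underscores as spaces."""
    if predicate.startswith("biolink:"):
        predicate = predicate[len("biolink:"):]
    return predicate.replace("_", " ").strip().lower()


def check_predicate(predicate):
    """Return (is_deprecated, replacement); replacement is None if unknown or not deprecated."""
    name = normalize(predicate)
    if name in DEPRECATED_PREDICATES:
        return True, DEPRECATED_PREDICATES[name]
    return False, None


if __name__ == "__main__":
    for p in ("biolink:has_capability", "binds", "biolink:treats"):
        print(p, check_predicate(p))
```

The normalization step accepts both the underscore and the space spelling, since the notes above point out that the model itself mixes the two formats.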
colleenXu commented 1 year ago

Note that we'll want to update the SmartAPI yamls for Service Provider/BTE at some point to show we updated these tools to biolink-model 3.5.0...

EDIT: 7/12 done for BTE/Service Provider/Service-Provider-only KPs https://github.com/NCATS-Tangerine/translator-api-registry/commit/f740588000ed27eaa013722dfc81071bb4751420

colleenXu commented 1 year ago

@tokebe

I've finished reviewing biolink 3.5.0.

I didn't notice anything that required changing BTE behavior, so I think we can go ahead and update BTE's biolink-model module to use 3.5.0. It'll probably be simpler than it was here?

Regarding qualifier-hierarchy support, I think it'll continue working? I dunno if you want to test, since Rohan is out this week (lab's Slack link).

colleenXu commented 1 year ago

@rjawesome can you check with @tokebe on whether the qualifier-hierarchy support still needs testing?

@tokebe updated dev + ci to use 3.5.0 on Friday https://github.com/biothings/biolink-model.js/commit/abfb7b754251887677a90890d887a72a27c0cb85

rjawesome commented 1 year ago

It should be good as the qualifier hierarchy is already tested via the unit tests on biolink-model.js package (which Jackson updated for the new spec).

(Also, just checked and my test query for DGIdb from the issue is working as well)

colleenXu commented 1 year ago

Comparing 3.5.2 (released yesterday) to 3.5.0:

colleenXu commented 1 year ago

Comparing 3.5.3 (from last Wed) to 3.5.1 (since 3.5.2's release date is a bit confusing to me):

colleenXu commented 1 year ago

@tokebe

This is minor, but do you think we could update the version of biolink-model we use to 3.5.3 for the Sept release?

I've done the only change we really needed to go from 3.5.0 -> 3.5.3 (adjusting how x-bte annotation is created for BioThings SEMMEDDB, so it uses the Translator-curated exclusions for SEMMEDDB). The other changes are very minor and summarized in my two previous posts.

I've also updated our tools' SmartAPI yamls to advertise 3.5.3, because this may help remove some TRAPI validation issues (by asking the TRAPI validator to use biolink 3.5.3 when validating our responses).
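
A quick way to double-check what a SmartAPI yaml advertises, sketched under assumptions: the helper name `advertised_biolink_version` and the example document are hypothetical, and the key layout (`info` -> `x-translator` -> `biolink-version`) is my understanding of the Translator SmartAPI extension, so verify it against the actual registry files. The document is shown as an already-parsed dict so the snippet needs no YAML library.

```python
def advertised_biolink_version(smartapi_doc):
    """Return the biolink version a parsed SmartAPI document advertises, or None.

    Assumes the version lives at info -> x-translator -> biolink-version.
    """
    info = smartapi_doc.get("info") or {}
    translator = info.get("x-translator") or {}
    return translator.get("biolink-version")


# Hypothetical, trimmed-down document for illustration only.
example_doc = {
    "info": {
        "title": "Example KP (hypothetical)",
        "x-translator": {"biolink-version": "3.5.3"},
        "x-trapi": {"version": "1.4.0"},
    }
}

if __name__ == "__main__":
    print(advertised_biolink_version(example_doc))
```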

tokebe commented 1 year ago

Will do 👍