Closed sierra-moxon closed 1 year ago
@sstemann - @cbizon said you might have PKs that came up that might display this error state from the standup call this week?
Specifically @sstemann - I think that the PK that was leading to the "white page" UI should be one that has this problem in it (somewhere)
Hi folks, it's odd that this is not already being trapped as an error sincere there is already some code for this:
if not attribute['value']:
self.report(
code="error.knowledge_graph.edge.attribute.value.empty",
identifier=edge_id
)
however, if I read the issue description correctly here, even though a JSON null is probably already caught by this code (i.e. as a None
value), there is also seems to be the possibility of strings being returned that say "None" or "N/A" or even "null". I can add this in.
Actually, I'm not even sure that strings with those values - "None" or "N/A" or even "null" - would get past the TRAPI schema validation gauntlet (before hitting the Biolink Model validation step) but all the same, I'll test for them.
Reopening this issue to review "N/A" inclusion (cc: @CaseyTa, @sierra-moxon)
@sierra-moxon, @RichardBruskiewich recently implemented the requested changes in the reasoner validator, but this is now throwing some possibly false-positive errors for COHD's edges. COHD uses a 2-level AnalysisResult structure for representing some of its attributes. Example:
{
"attribute_type_id": "biolink:has_supporting_study_result",
"value": "N/A",
"value_type_id": "biolink:ChiSquaredAnalysisResult",
"attribute_source": "infores:cohd",
"value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
"description": "A study result describing a chi-squared analysis on a single pair of concepts",
"attributes": [
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:unadjusted_p_value",
"attributes": null,
"description": "Chi-square p-value, unadjusted.",
"original_attribute_name": "p-value",
"value": 1e-12,
"value_type_id": "EDAM:data_1669",
"value_url": "http://edamontology.org/data_1669"
},
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:bonferonni_adjusted_p_value",
"attributes": null,
"description": "Chi-square p-value, Bonferonni adjusted by number of pairs of concepts.",
"original_attribute_name": "p-value adjusted",
"value": 1e-12,
"value_type_id": "EDAM:data_1669",
"value_url": "http://edamontology.org/data_1669"
},
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:supporting_data_set",
"attributes": null,
"description": "Dataset ID within COHD",
"original_attribute_name": "dataset_id",
"value": "COHD:dataset_3",
"value_type_id": "EDAM:data_1048",
"value_url": null
}
]
}
In the above example, we see that the top level biolink:ChiSquaredAnalysisResult
attribute doesn't natively have a value, but captures its information with sub-attributes. Since value
is required, I've been getting around this by using the string value of "N/A"
in the top level value
. But now, COHD's edges are getting errors for error.knowledge_graph.edge.attribute.value.empty
.
@sierra-moxon, is there some value that I should be placing into the top level AnalysisResult attributes? Perhaps should we be allowing empty attribute values when the attribute has sub-attributes? Please let me know if you have any other recommendations. Thanks!
I'm not clear if this is open or closed issue; we should look towards the TMKP's implementation of sub-attributes for guidance if you haven't already solved this? e.g.: https://github.com/NCATSTranslator/Evidence-Provenance-Confidence-Working-Group/blob/930afde43aa4a7424d3586496511005da5ad193e/attribute_epc_examples/TMKP_TRAPI1.2_nested_attributes.yml#L35
(I also saw the comment there in the linked line above from @mbrush about adding a class to Biolink - did that happen @mbrush?)
FWIW, we do have biolink:StudyResult
as a class/category in biolink
@sierra-moxon, I didn't want to necessarily re-open the issue since I'm not sure if this needs to be addressed by the reasoner validator or if it needs to be addressed by COHD, but I just wanted to discuss it here since this was the relevant issue that triggered the change in behavior.
COHD doesn't have identifiers like in TMKP's example that can be placed as the value. I don't think we have much that can be meaningfully placed as the value. We could just fill it with another string, like "Chi-Squared Analysis Result", if we want to maintain the behavior that "N/A" should trigger an error.
I'm thinking instead of this:
{
"attribute_type_id": "biolink:has_supporting_study_result",
"value": "N/A",
"value_type_id": "biolink:ChiSquaredAnalysisResult",
"attribute_source": "infores:cohd",
"value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
"description": "A study result describing a chi-squared analysis on a single pair of concepts",
"attributes": [
do this:
{
"attribute_type_id": "biolink:has_supporting_study_result",
"value": "biolink:ChiSquaredAnalysisResult",
"value_type_id": "linkml:curie",
"attribute_source": "infores:cohd",
"value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
"description": "A study result describing a chi-squared analysis on a single pair of concepts",
"attributes": [
What is the supporting study result? It is a Chi-squared analysis result. And that concept is expressed as a CURIE. The sub-attributes provide the relevant content of the result.
Thanks all. I saw in the TKMP examples that another possibility is to put a description of the result, so now I'll just fill in the value as a summary of the result.
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:has_supporting_study_result",
"attributes": [
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:log_odds_ratio",
"description": "Observed-expected frequency ratio.",
"original_attribute_name": "log_odds",
"value": 4.224409683758724,
"value_type_id": "EDAM:data_1772"
},
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:log_odds_ratio_95_ci",
"description": "Log-odds 95% confidence interval",
"original_attribute_name": "log_odds_ci",
"value": [
3.9974930192855243,
4.451326348231923
],
"value_type_id": "EDAM:data_0951"
},
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:total_sample_size",
"description": "Observed concept count between the pair of subject and object nodes",
"original_attribute_name": "concept_pair_count",
"value": 349,
"value_type_id": "EDAM:data_0006"
},
{
"attribute_source": "infores:cohd",
"attribute_type_id": "biolink:supporting_data_set",
"description": "Dataset ID within COHD",
"original_attribute_name": "dataset_id",
"value": "COHD:dataset_3",
"value_type_id": "EDAM:data_1048"
}
],
"description": "A study result describing a log-odds anaylsis on a single pair of concepts",
"value": "4.224 [3.997, 4.451]",
"value_type_id": "biolink:LogOddsAnalysisResult",
"value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP"
}
So we can keep this issue closed and keep the validator's behavior as-is. Thanks again, everyone!
Thanks everyone for your input here. I'm happy that the issue was resolved.
As the TRAPI model indicates, the attribute_type_id and value are both mandatory. I briefly pondered yesterday whether or not "N/A" is a "valid" value for an attribute - I mean, semantically useful for the end user of the knowledge. Perhaps still something to ponder (referring this thought to @cbizon @sierra-moxon and @edeutsch )
I think an attribute should always have a value. Its whole purpose is to convey a value and describe what that value means.
Is your feature request related to a problem? Please describe. From the Translator stand-up calls, we see TRAPI Attribute objects coming back with null (or None)
value
components.Describe the solution you'd like Since
value
is a required field of theAttribute
object, the validator should return an error if it sees "None" or "N/A" ornull
.