NCATSTranslator / reasoner-validator

Validation of Translator OpenAPI (TRAPI) messages both to TRAPI and Biolink Model standards. See https://ncatstranslator.github.io/reasoner-validator/
Other
2 stars 4 forks source link

If a TRAPI Attribute is received with "value: None" (or null) this should trigger a validation error #72

Closed sierra-moxon closed 1 year ago

sierra-moxon commented 1 year ago

Is your feature request related to a problem? Please describe. From the Translator stand-up calls, we see TRAPI Attribute objects coming back with null (or None) value components.

Describe the solution you'd like Since value is a required field of the Attribute object, the validator should return an error if it sees "None" or "N/A" or null.

sierra-moxon commented 1 year ago

@sstemann - @cbizon said you might have PKs that came up that might display this error state from the standup call this week?

cbizon commented 1 year ago

Specifically @sstemann - I think that the PK that was leading to the "white page" UI should be one that has this problem in it (somewhere)

RichardBruskiewich commented 1 year ago

Hi folks, it's odd that this is not already being trapped as an error sincere there is already some code for this:

if not attribute['value']:
                    self.report(
                        code="error.knowledge_graph.edge.attribute.value.empty",
                        identifier=edge_id
                    )

however, if I read the issue description correctly here, even though a JSON null is probably already caught by this code (i.e. as a None value), there is also seems to be the possibility of strings being returned that say "None" or "N/A" or even "null". I can add this in.

RichardBruskiewich commented 1 year ago

Actually, I'm not even sure that strings with those values - "None" or "N/A" or even "null" - would get past the TRAPI schema validation gauntlet (before hitting the Biolink Model validation step) but all the same, I'll test for them.

RichardBruskiewich commented 1 year ago

Resolved by PR https://github.com/NCATSTranslator/reasoner-validator/pull/73

RichardBruskiewich commented 1 year ago

Reopening this issue to review "N/A" inclusion (cc: @CaseyTa, @sierra-moxon)

CaseyTa commented 1 year ago

@sierra-moxon, @RichardBruskiewich recently implemented the requested changes in the reasoner validator, but this is now throwing some possibly false-positive errors for COHD's edges. COHD uses a 2-level AnalysisResult structure for representing some of its attributes. Example:

{
              "attribute_type_id": "biolink:has_supporting_study_result",
              "value": "N/A",
              "value_type_id": "biolink:ChiSquaredAnalysisResult",
              "attribute_source": "infores:cohd",
              "value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
              "description": "A study result describing a chi-squared analysis on a single pair of concepts",
              "attributes": [
                {
                  "attribute_source": "infores:cohd",
                  "attribute_type_id": "biolink:unadjusted_p_value",
                  "attributes": null,
                  "description": "Chi-square p-value, unadjusted.",
                  "original_attribute_name": "p-value",
                  "value": 1e-12,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": "http://edamontology.org/data_1669"
                },
                {
                  "attribute_source": "infores:cohd",
                  "attribute_type_id": "biolink:bonferonni_adjusted_p_value",
                  "attributes": null,
                  "description": "Chi-square p-value, Bonferonni adjusted by number of pairs of concepts.",
                  "original_attribute_name": "p-value adjusted",
                  "value": 1e-12,
                  "value_type_id": "EDAM:data_1669",
                  "value_url": "http://edamontology.org/data_1669"
                },
                {
                  "attribute_source": "infores:cohd",
                  "attribute_type_id": "biolink:supporting_data_set",
                  "attributes": null,
                  "description": "Dataset ID within COHD",
                  "original_attribute_name": "dataset_id",
                  "value": "COHD:dataset_3",
                  "value_type_id": "EDAM:data_1048",
                  "value_url": null
                }
              ]
            }

In the above example, we see that the top level biolink:ChiSquaredAnalysisResult attribute doesn't natively have a value, but captures its information with sub-attributes. Since value is required, I've been getting around this by using the string value of "N/A" in the top level value. But now, COHD's edges are getting errors for error.knowledge_graph.edge.attribute.value.empty.

@sierra-moxon, is there some value that I should be placing into the top level AnalysisResult attributes? Perhaps should we be allowing empty attribute values when the attribute has sub-attributes? Please let me know if you have any other recommendations. Thanks!

sierra-moxon commented 1 year ago

I'm not clear if this is open or closed issue; we should look towards the TMKP's implementation of sub-attributes for guidance if you haven't already solved this? e.g.: https://github.com/NCATSTranslator/Evidence-Provenance-Confidence-Working-Group/blob/930afde43aa4a7424d3586496511005da5ad193e/attribute_epc_examples/TMKP_TRAPI1.2_nested_attributes.yml#L35

(I also saw the comment there in the linked line above from @mbrush about adding a class to Biolink - did that happen @mbrush?)

sierra-moxon commented 1 year ago

FWIW, we do have biolink:StudyResult as a class/category in biolink

CaseyTa commented 1 year ago

@sierra-moxon, I didn't want to necessarily re-open the issue since I'm not sure if this needs to be addressed by the reasoner validator or if it needs to be addressed by COHD, but I just wanted to discuss it here since this was the relevant issue that triggered the change in behavior.

COHD doesn't have identifiers like in TMKP's example that can be placed as the value. I don't think we have much that can be meaningfully placed as the value. We could just fill it with another string, like "Chi-Squared Analysis Result", if we want to maintain the behavior that "N/A" should trigger an error.

edeutsch commented 1 year ago

I'm thinking instead of this:

{
              "attribute_type_id": "biolink:has_supporting_study_result",
              "value": "N/A",
              "value_type_id": "biolink:ChiSquaredAnalysisResult",
              "attribute_source": "infores:cohd",
              "value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
              "description": "A study result describing a chi-squared analysis on a single pair of concepts",
              "attributes": [

do this:

{
              "attribute_type_id": "biolink:has_supporting_study_result",
              "value": "biolink:ChiSquaredAnalysisResult",
              "value_type_id": "linkml:curie",
              "attribute_source": "infores:cohd",
              "value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP",
              "description": "A study result describing a chi-squared analysis on a single pair of concepts",
              "attributes": [

What is the supporting study result? It is a Chi-squared analysis result. And that concept is expressed as a CURIE. The sub-attributes provide the relevant content of the result.

CaseyTa commented 1 year ago

Thanks all. I saw in the TKMP examples that another possibility is to put a description of the result, so now I'll just fill in the value as a summary of the result.

                        {
                            "attribute_source": "infores:cohd",
                            "attribute_type_id": "biolink:has_supporting_study_result",
                            "attributes": [
                                {
                                    "attribute_source": "infores:cohd",
                                    "attribute_type_id": "biolink:log_odds_ratio",
                                    "description": "Observed-expected frequency ratio.",
                                    "original_attribute_name": "log_odds",
                                    "value": 4.224409683758724,
                                    "value_type_id": "EDAM:data_1772"
                                },
                                {
                                    "attribute_source": "infores:cohd",
                                    "attribute_type_id": "biolink:log_odds_ratio_95_ci",
                                    "description": "Log-odds 95% confidence interval",
                                    "original_attribute_name": "log_odds_ci",
                                    "value": [
                                        3.9974930192855243,
                                        4.451326348231923
                                    ],
                                    "value_type_id": "EDAM:data_0951"
                                },
                                {
                                    "attribute_source": "infores:cohd",
                                    "attribute_type_id": "biolink:total_sample_size",
                                    "description": "Observed concept count between the pair of subject and object nodes",
                                    "original_attribute_name": "concept_pair_count",
                                    "value": 349,
                                    "value_type_id": "EDAM:data_0006"
                                },
                                {
                                    "attribute_source": "infores:cohd",
                                    "attribute_type_id": "biolink:supporting_data_set",
                                    "description": "Dataset ID within COHD",
                                    "original_attribute_name": "dataset_id",
                                    "value": "COHD:dataset_3",
                                    "value_type_id": "EDAM:data_1048"
                                }
                            ],
                            "description": "A study result describing a log-odds anaylsis on a single pair of concepts",
                            "value": "4.224 [3.997, 4.451]",
                            "value_type_id": "biolink:LogOddsAnalysisResult",
                            "value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/COHD-KP"
                        }

So we can keep this issue closed and keep the validator's behavior as-is. Thanks again, everyone!

RichardBruskiewich commented 1 year ago

Thanks everyone for your input here. I'm happy that the issue was resolved.

As the TRAPI model indicates, the attribute_type_id and value are both mandatory. I briefly pondered yesterday whether or not "N/A" is a "valid" value for an attribute - I mean, semantically useful for the end user of the knowledge. Perhaps still something to ponder (referring this thought to @cbizon @sierra-moxon and @edeutsch )

edeutsch commented 1 year ago

I think an attribute should always have a value. Its whole purpose is to convey a value and describe what that value means.