SuLab / scheduled-bots

GeneWiki Scheduled Bots
MIT License

CIViC prognostic information not present in Wikidata #20

Closed: floatingpurr closed this issue 6 years ago

floatingpurr commented 6 years ago

Hello. I noticed that some CIViC variants have prognostic information (e.g., https://civicdb.org/events/genes/30/summary/variants/203/summary#variant), but this information is not copied to Wikidata by ProteinBoxBot (e.g., Q28531489).

Wikidata has properties representing positive/negative prognostic predictors. Is there a problem with this kind of information, or is it simply not yet handled by the bot?

Thanks.

stuppie commented 6 years ago

Need @andrawaag for confirmation, but I think we had decided to import only statements with an Evidence Level of "A" and a high trust rating? Although I'm not seeing if/where that is specified in the bot, so maybe not... I'll look into why it's being skipped.

See also: https://github.com/SuLab/GeneWikiCentral/issues/88

stuppie commented 6 years ago

It looks like this one has a "clinical_significance" of "Poor Outcome", and we are checking for either "Sensitivity" (== "positive prognostic predictor") or "Resistance or Non-Response" (== "negative prognostic predictor"). I'm guessing "Poor Outcome" should also mean "negative prognostic predictor"?
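To make the mismatch concrete, here is a minimal sketch (not the bot's actual code) contrasting the check described above with a lookup keyed on evidence type and clinical significance. The names `old_check`, `PROGNOSTIC_LABELS`, and `prognostic_predictor` are hypothetical, and mapping "Better Outcome" to "positive prognostic predictor" is an assumption made by analogy with "Poor Outcome".

```python
from typing import Optional

# Hypothetical reconstruction of the pre-fix check: only these two values
# were accepted, so a "Poor Outcome" item never matched and was skipped.
OLD_ACCEPTED = {"Sensitivity", "Resistance or Non-Response"}


def old_check(clinical_significance: str) -> bool:
    return clinical_significance in OLD_ACCEPTED


# Hypothetical lookup keyed on (evidence_type, clinical_significance).
# "Poor Outcome" -> negative follows this thread; "Better Outcome" -> positive
# is an assumption by analogy.
PROGNOSTIC_LABELS = {
    ("Prognostic", "Poor Outcome"): "negative prognostic predictor",
    ("Prognostic", "Better Outcome"): "positive prognostic predictor",
}


def prognostic_predictor(evidence_type: str, clinical_significance: str) -> Optional[str]:
    """Return the predictor label for the pair, or None if it is not mapped."""
    return PROGNOSTIC_LABELS.get((evidence_type, clinical_significance))


if __name__ == "__main__":
    print(old_check("Poor Outcome"))                           # False: item skipped
    print(prognostic_predictor("Prognostic", "Poor Outcome"))  # negative prognostic predictor
```

Keying on the pair rather than the bare value keeps a string such as "Positive", which CIViC uses under more than one evidence type, from being mapped to the wrong kind of predictor.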

floatingpurr commented 6 years ago

Yes, that should be right: "Poor Outcome" should mean "negative prognostic predictor". Also, the Wikidata descriptions of the positive (negative) prognostic predictor properties say

the presence of the genetic variant helps to prognose good (poor) outcome for the disease

But if the bot handles only predictive information (i.e., the first big row in this table), it's normal not to see prognostic annotations in Wikidata, and that could also fully explain SuLab/GeneWikiCentral#88.

PS: we could double-check the CIViC data, but I don't think the reason can be the rating or the evidence level. Indeed, there are some 5-star prognostic evidence items with level A.

stuppie commented 6 years ago

For the sake of completeness, here are the counts of evidence items coming from the API:

{
'Diagnostic': {'Negative': 9, 'Positive': 111},
'Predictive': {
          'Adverse Response': 9,   # not in table
          'Reduced Sensitivity': 2,  # not in table
          'Resistance': 544,
          'N/A': 12,
          None: 3,
          'Sensitivity/Response': 1077},
'Predisposing': {
          'Likely Pathogenic': 11,
          None: 1,
          'N/A': 2,
          'Pathogenic': 9,
          'Positive': 14, # not under "Predisposing"
          'Uncertain Significance': 467},
'Prognostic': {
          'Better Outcome': 99,  # not in table (maybe this should be "Good Outcome")
          'N/A': 45,
          'Negative': 1,  # not under "Prognostic"
          'Poor Outcome': 302,
          'Positive': 3 # not under "Prognostic"
          }
}

evidence_direction: {None: 1, 'Does Not Support': 258, 'Supports': 2462}

Everything matches the table except 'Better Outcome', 'Reduced Sensitivity', and 'Adverse Response', and some terms are used under the wrong Evidence Type.
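Tallies like the ones above can be produced with `collections.Counter`. This is a minimal sketch that assumes the evidence items have already been fetched from the CIViC API into a list of dicts; the field name `evidence_type` is an assumption (only `clinical_significance` and `evidence_direction` appear in this thread), and `tally_evidence` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def tally_evidence(items):
    """Count clinical_significance values per evidence_type, plus evidence_direction.

    Field names are assumed; a missing or null field is counted under None.
    """
    by_type = defaultdict(Counter)
    directions = Counter()
    for item in items:
        by_type[item.get("evidence_type")][item.get("clinical_significance")] += 1
        directions[item.get("evidence_direction")] += 1
    # Plain dicts print in the same shape as the tallies above.
    return {etype: dict(c) for etype, c in by_type.items()}, dict(directions)


if __name__ == "__main__":
    # Made-up sample items, not real CIViC records.
    sample = [
        {"evidence_type": "Prognostic", "clinical_significance": "Poor Outcome",
         "evidence_direction": "Supports"},
        {"evidence_type": "Prognostic", "clinical_significance": "Better Outcome",
         "evidence_direction": "Supports"},
        {"evidence_type": "Predictive", "clinical_significance": "Resistance",
         "evidence_direction": "Does Not Support"},
        {"evidence_type": "Prognostic", "clinical_significance": None},
    ]
    counts, directions = tally_evidence(sample)
    print(counts)
    print(directions)
```

Comparing the keys of each inner dict against the table is then enough to spot values such as 'Better Outcome' that the bot does not expect.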

I'll check on what the bot is checking for in a minute...

stuppie commented 6 years ago

The bot looks for a clinical significance of "Sensitivity" or "Resistance or Non-Response" for each of Diagnostic, Prognostic, and Predictive evidence, but those values no longer exist in the data; they must have changed at some point. I'll fix the bot and add checks for unknown values so it throws an error if they change again, and I'll open an issue for the values that differ from the table. Thanks!
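The "checks for unknown values" idea can be sketched as a whitelist that separates values the bot recognises but deliberately skips from values it has never seen, and fails loudly on the latter. This is a hypothetical illustration, not the bot's code: `KNOWN_PROGNOSTIC`, `UnknownSignificanceError`, and `predictor_for` are made-up names, the table covers only the Prognostic values from the counts above, and both the "Better Outcome" mapping and skipping "N/A", "Positive", and "Negative" are assumptions.

```python
from typing import Optional

# Hypothetical whitelist for Prognostic evidence, built from the values seen
# in the API counts above. None means "known, but not imported".
KNOWN_PROGNOSTIC = {
    "Poor Outcome": "negative prognostic predictor",
    "Better Outcome": "positive prognostic predictor",  # assumed mapping
    "N/A": None,
    "Positive": None,  # appears under the wrong evidence type; skipped
    "Negative": None,  # appears under the wrong evidence type; skipped
}


class UnknownSignificanceError(ValueError):
    """Raised when CIViC returns a clinical significance the bot has never seen."""


def predictor_for(clinical_significance: str) -> Optional[str]:
    """Return the predictor label, None for known-but-skipped values, or raise."""
    if clinical_significance not in KNOWN_PROGNOSTIC:
        # Fail loudly so a renamed CIViC value is noticed instead of silently dropped.
        raise UnknownSignificanceError(
            f"unexpected Prognostic clinical_significance: {clinical_significance!r}"
        )
    return KNOWN_PROGNOSTIC[clinical_significance]


if __name__ == "__main__":
    print(predictor_for("Poor Outcome"))  # negative prognostic predictor
    try:
        predictor_for("Good Outcome")
    except UnknownSignificanceError as err:
        print(err)
```

Raising on unseen values, rather than returning None, is the point of the design: the original problem was a silent skip after CIViC's values changed, which an exception would have surfaced on the first run.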

floatingpurr commented 6 years ago

As always, thank you guys too for your effort. :wink:

stuppie commented 6 years ago

Alright, the updated bot is running, and prognostic information is in Wikidata now. Thanks for pointing this out and helping troubleshoot!