HadleyKing opened 2 weeks ago
Spoke with Hadley, and we had some ideas for representing the scores in the data model. I've been implementing scores in the biomarker project for rating "trustworthy" biomarkers, and a few things we've done there have made the scores easier to track:
```json
{
  "score": 3.4,
  "score_info": {
    "contributions": [
      {"c": "first_pmid", "w": 1, "f": 1},
      {"c": "other_pmid", "w": 0.2, "f": 7},
      {"c": "first_source", "w": 1, "f": 1},
      {"c": "other_source", "w": 0.1, "f": 0},
      {"c": "generic_condition_pen", "w": -4, "f": 0},
      {"c": "loinc", "w": 1, "f": 0}
    ],
    "formula": "sum(w*f)",
    "variables": {
      "w": "weight",
      "c": "condition",
      "f": "frequency"
    }
  }
}
```
This shows that the score was calculated as the sum of the weights times the frequencies. For example, the first PMID associated with the biomarker has a weight of 1, and each additional PMID has a weight of 0.2, and so on. So the calculation for this score was 1(1) + 0.2(7) + 1(1) = 3.4; the remaining contributions all have a frequency of 0 and drop out.
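The `sum(w*f)` formula above can be sketched directly from the `score_info` payload (a minimal example, not code from either project):

```python
# Example score_info payload, matching the JSON above.
score_info = {
    "contributions": [
        {"c": "first_pmid", "w": 1, "f": 1},
        {"c": "other_pmid", "w": 0.2, "f": 7},
        {"c": "first_source", "w": 1, "f": 1},
        {"c": "other_source", "w": 0.1, "f": 0},
        {"c": "generic_condition_pen", "w": -4, "f": 0},
        {"c": "loinc", "w": 1, "f": 0},
    ],
    "formula": "sum(w*f)",
}

def compute_score(score_info: dict) -> float:
    """Apply the recorded formula sum(w*f) over the contributions list."""
    return round(sum(c["w"] * c["f"] for c in score_info["contributions"]), 2)

print(compute_score(score_info))  # 3.4
```

Keeping the per-contribution weights and frequencies in the payload means the score can always be recomputed and audited from the stored `score_info` alone.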
Implement the ideas from #328 into the BCO Scoring function: https://github.com/biocompute-objects/bco_api/blob/456d00293e51f057b9d1755835e36e31881b9fe1/biocompute/services.py#L599-L621
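One way the scoring function could emit this structure is sketched below. This is a hypothetical illustration, not the actual `biocompute/services.py` code: the `WEIGHTS` table, the `counts` input shape, and the function name `build_score` are all assumptions for the sake of the example.

```python
# Hypothetical weight table mirroring the contributions in the example above
# (names and values are illustrative assumptions, not the project's real config).
WEIGHTS = {
    "first_pmid": 1,
    "other_pmid": 0.2,
    "first_source": 1,
    "other_source": 0.1,
    "generic_condition_pen": -4,
    "loinc": 1,
}

def build_score(counts: dict) -> dict:
    """Build the {'score', 'score_info'} document from observed frequencies.

    `counts` maps a condition name to its frequency; missing names count as 0.
    """
    contributions = [
        {"c": name, "w": w, "f": counts.get(name, 0)}
        for name, w in WEIGHTS.items()
    ]
    score = round(sum(c["w"] * c["f"] for c in contributions), 2)
    return {
        "score": score,
        "score_info": {
            "contributions": contributions,
            "formula": "sum(w*f)",
            "variables": {"w": "weight", "c": "condition", "f": "frequency"},
        },
    }

# Reproduces the worked example: one first PMID, seven other PMIDs, one source.
print(build_score({"first_pmid": 1, "other_pmid": 7, "first_source": 1})["score"])  # 3.4
```

The actual BCO scoring function would of course derive `counts` from the object itself; the point is that storing the full `score_info` alongside the score makes the result self-explaining.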