biocompute-objects / bco_api

The master repository for the JSON API.
MIT License

Implementation of BETA BCO Ranking systems #329

Open HadleyKing opened 2 weeks ago

HadleyKing commented 2 weeks ago

Implement the ideas from #328 into the BCO Scoring function: https://github.com/biocompute-objects/bco_api/blob/456d00293e51f057b9d1755835e36e31881b9fe1/biocompute/services.py#L599-L621

Kirans0615 commented 2 weeks ago

BCO Scoring System New.zip

seankim658 commented 2 weeks ago

Spoke with Hadley, and we had some ideas for how to represent the scores in the data model. I've been implementing scoring in the biomarker project to identify "trustworthy" biomarkers, and a few things we've done that made the scores easier to track are:

  1. Have some sort of internal versioning for the scores. Scoring is an iterative process that changes over time, and as it changes, having some way to delineate which scores come from which version of the formula is very helpful.
  2. When calculating the scores, create an object with the formula breakdown. This can be used both internally when investigating scores and on the frontend to show users how the score was calculated/where the weights are coming from. The biomarker project has that information in our data schema and returns it on API requests. It looks like this:
    {
      "score": 3.4,
      "score_info": {
        "contributions": [
          {
            "c": "first_pmid",
            "w": 1,
            "f": 1
          },
          {
            "c": "other_pmid",
            "w": 0.2,
            "f": 7
          },
          {
            "c": "first_source",
            "w": 1,
            "f": 1
          },
          {
            "c": "other_source",
            "w": 0.1,
            "f": 0
          },
          {
            "c": "generic_condition_pen",
            "w": -4,
            "f": 0
          },
          {
            "c": "loinc",
            "w": 1,
            "f": 0
          }
        ],
        "formula": "sum(w*f)",
        "variables": {
          "w": "weight",
          "c": "condition",
          "f": "frequency"
        }
      }
    }

    This shows that the score is the sum of each weight times its frequency. For example, the first PMID associated with the biomarker carries a weight of 1, each additional PMID gets a weight of 0.2, and so on. Conditions with a frequency of 0 contribute nothing, so the calculation for this score was 1(1) + 0.2(7) + 1(1) = 3.4.
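Both ideas could be combined roughly as follows. This is only a minimal sketch, not the biomarker project's code or the BCO scoring function: the `build_score` helper, the `SCORE_VERSION` constant, and the `version` field inside `score_info` are hypothetical names introduced for illustration. The contributions are the ones from the example above.

```python
from typing import Any

# Hypothetical version tag for the scoring formula (an assumption; the
# example above does not include a version field).
SCORE_VERSION = "0.1"


def build_score(
    contributions: list[dict[str, Any]], version: str = SCORE_VERSION
) -> dict[str, Any]:
    """Return the score together with the breakdown used to compute it."""
    # formula: sum(w*f) over every contribution
    total = sum(item["w"] * item["f"] for item in contributions)
    return {
        # Rounded to avoid floating-point noise in the stored score.
        "score": round(total, 2),
        "score_info": {
            "version": version,  # assumed field, for tracking formula changes
            "contributions": contributions,
            "formula": "sum(w*f)",
            "variables": {"w": "weight", "c": "condition", "f": "frequency"},
        },
    }


contributions = [
    {"c": "first_pmid", "w": 1, "f": 1},
    {"c": "other_pmid", "w": 0.2, "f": 7},
    {"c": "first_source", "w": 1, "f": 1},
    {"c": "other_source", "w": 0.1, "f": 0},
    {"c": "generic_condition_pen", "w": -4, "f": 0},
    {"c": "loinc", "w": 1, "f": 0},
]

result = build_score(contributions)
print(result["score"])  # 3.4
```

Keeping the version next to the breakdown means a stored score can always be traced back to the formula that produced it, even after the weights change.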