clinical-biomarkers / biomarker-partnership

CFDE Biomarker Partnership
https://hivelab.biochemistry.gwu.edu/biomarker-partnership

Biomarker Scoring #131

Open DaniallMasood opened 2 months ago

DaniallMasood commented 2 months ago

How should biomarkers be scored? Begin looking into an algorithm based on publications, FDA approval, EHR, etc.

Look at the example in IDG. Discuss with Jeremy.

DaniallMasood commented 2 months ago

Add text explaining the scoring somewhere on the biomarker page.

- One PMID = 1
- More than 1, then 0.5
- Generic cancer is -1
- +1 if it comes from more than one source
- Each additional resource is 0.1
- +1 if there is a LOINC code
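
A minimal sketch of how these rules could be combined, under one possible reading of the list: the first PMID adds 1 and each further PMID adds 0.5, and two or more sources add 1, with 0.1 for each source beyond the second. Other readings are possible. The `Weights` fields and the `score_biomarker` signature are hypothetical names used for illustration, not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weight names; the values mirror the rules listed above."""
    first_pmid: float = 1.0
    additional_pmid: float = 0.5    # assumed: applies to each PMID after the first
    generic_cancer: float = -1.0
    multiple_sources: float = 1.0   # assumed: applies once there are two or more sources
    additional_source: float = 0.1  # assumed: applies to each source beyond the second
    loinc: float = 1.0


def score_biomarker(pmid_count, source_count, is_generic_cancer, has_loinc,
                    w=Weights()):
    """Add up the rule contributions for a single biomarker."""
    score = 0.0
    if pmid_count >= 1:
        score += w.first_pmid + (pmid_count - 1) * w.additional_pmid
    if source_count > 1:
        score += w.multiple_sources + (source_count - 2) * w.additional_source
    if is_generic_cancer:
        score += w.generic_cancer
    if has_loinc:
        score += w.loinc
    return score
```

Keeping the weights in one object makes it easy to try alternative values without touching the scoring logic.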

seankim658 commented 2 months ago

I'm somewhat concerned about the scaling of the PMID scoring criteria. At first glance, in the ClinVar and GWAS data the biomarkers related to generic cancer generally have far more PubMed evidence associated with them. So even if we dock generic cancer biomarkers by 1 point, they will still score much higher than specific-condition biomarkers in this scoring system.

One example is this biomarker: https://hivelab.biochemistry.gwu.edu/biomarker/api/id/AA5047-1. It is from ClinVar and has 80 PubMed papers associated with it. This would be among the highest-scoring biomarkers even with a -1 deduction for the generic cancer qualifier.

We will have to consider either a much harsher penalty for generic cancer records or some kind of penalty that scales with PMID count.

seankim658 commented 2 months ago

Another idea is to add a hard cap to generic cancer biomarkers. For example, generic cancer biomarkers can be capped at a score of 2 (which would require 3 PMIDs to reach since generic cancer biomarkers get a -1 automatically).
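
A rough sketch of the two options raised in these comments, a hard cap and a penalty that grows with PMID count. Both function names, the `per_pmid` rate, and the shape of the scaled penalty are assumptions made for illustration; only the cap of 2 and the base penalty of 1 come from the discussion.

```python
def cap_generic_cancer(score, is_generic_cancer, cap=2.0):
    """Limit a generic cancer biomarker's score to `cap`; other scores pass through."""
    return min(score, cap) if is_generic_cancer else score


def scaled_generic_penalty(pmid_count, base_penalty=1.0, per_pmid=0.5):
    """Hypothetical alternative: the deduction grows with each PMID after the first.

    Returns a negative number to be added to the biomarker's score.
    """
    extra_pmids = max(pmid_count - 1, 0)
    return -(base_penalty + per_pmid * extra_pmids)
```

With the cap, a generic cancer biomarker whose raw score is 10 would be reported as 2, while a specific-condition biomarker with the same raw score would keep it. The scaled penalty instead removes part or all of the credit gained from additional PMIDs, depending on the rate chosen.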

seankim658 commented 1 month ago

The first version of this tool has been built; it's in this repo. Right now the tool takes either user-overridden weights or the default weights and calculates a score for each biomarker in a set of data files. It then writes a JSON file containing each biomarker_id and its score. The clinical usage weight is not used yet because we don't have this data.
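
For readers who have not looked at the repo, the flow described above could look roughly like the sketch below. This is not the tool's actual code: the weight names, the record fields (`pmids`, `sources`, `generic_cancer`, `loinc_code`), the input layout (a JSON array of records per file), and the output shape (a mapping from biomarker_id to score) are all assumptions.

```python
import json
from pathlib import Path

# Hypothetical defaults based on the rules proposed earlier in this thread.
DEFAULT_WEIGHTS = {
    "first_pmid": 1.0,
    "additional_pmid": 0.5,
    "multiple_sources": 1.0,
    "additional_source": 0.1,
    "generic_cancer": -1.0,
    "loinc": 1.0,
    "clinical_use": 0.0,  # placeholder; not applied because the data is not available yet
}


def resolve_weights(overrides=None):
    """Return the default weights with any user-supplied values layered on top."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown weight(s): {sorted(unknown)}")
    return {**DEFAULT_WEIGHTS, **overrides}


def score_record(record, weights):
    """Score one record; the field names here are assumed, not the real schema."""
    pmids = len(set(record.get("pmids", [])))
    sources = len(set(record.get("sources", [])))
    score = 0.0
    if pmids:
        score += weights["first_pmid"] + (pmids - 1) * weights["additional_pmid"]
    if sources > 1:
        score += weights["multiple_sources"] + (sources - 2) * weights["additional_source"]
    if record.get("generic_cancer"):
        score += weights["generic_cancer"]
    if record.get("loinc_code"):
        score += weights["loinc"]
    return round(score, 2)


def score_files(data_files, out_path, overrides=None):
    """Score every record in each input file and write the results as JSON."""
    weights = resolve_weights(overrides)
    scores = {}
    for path in data_files:
        for record in json.loads(Path(path).read_text()):
            scores[record["biomarker_id"]] = score_record(record, weights)
    Path(out_path).write_text(json.dumps(scores, indent=2))
    return scores
```

Rejecting unknown override keys is a design choice in this sketch, so that a misspelled weight name fails loudly instead of silently falling back to the default.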