Updated Biomarker ER diagram

Reeya123 commented 1 year ago

March 2, 2023

Dan:

I have updated our existing biomarker ER diagram, highlighting "mandatory" items in the model that form the core definition of a biomarker, based on the FDA/NIH Biomarker Working Group BEST definition (below).

biomarker_er_model_02272023 (1).pptx
BiomarkerDB_Summary.docx

Reeya123 commented 1 year ago

Feb 6, 2023

Darren:

In order for me to assess 'core' information, I had to align this with the ontology as designed so far. This necessitated many changes to names, relationships, and connections (see attached):

Name changes:
1) biomarker measurement -> biomarker
2) measure_of_entity -> indicatedby<inc,dec,pres,abs> (shorthand for increased_level_of, decreased_level_of, presence_of, absence_of respectively)
3) disease_name -> medical condition (this allows for non-disease biomarkers)
4) detected_in -> sampled_from
5) specimen type -> biospecimen
6) biomarker_of -> provides_clinical_information_for (when biomarker is connecting to medical condition) or indicates_clinical_effect_of (see additions below); note that these are upper-level relations, and that curation needs to say which of the more-specific relations should be used: prognostic_for, diagnostic_for, monitors_status_of, indicates_risk_of_developing, predicts_effect_of, monitors_effect_of, indicates_response_to, and assesses_toxicity_of.
7) is_BEST_type -> is_a (there's no specific relation needed, as these are inferred from the relation between biomarker and either medical condition or chemical entity)
8) has_entity_type -> is_a (there's no specific relation needed, as these are inferred from the ontological hierarchy for the assessed biomarker entity)

Connection removals:
1) associated_with (between assessed biomarker entity and what is now medical condition). The entity itself says nothing, only the biomarker does.
2) assayed_in (redundant with sampled_from, which is between the biomarker and the specimen, but see 'dotted line' note below)
3) occurs_in (between medical condition and biosample). The medical condition need not be located where a sample was taken from.
4) is_BEST_type_for (between BEST biomarker type and either medical condition or chemical entity). Somewhat superfluous, as this information is captured a different way.

Connection additions:
1) indicates_clinical_effect_of (between biomarker and chemical entity; some BEST types have to do with exposures)
2) brackets for literature evidence (not sure if a relation is used for this; will look into); we might be able to target specific statements about biomarkers, like "has_LOINC some LOINC code [PMID:123456789]"

Moved: 1) assessed entity type (original line looked like it might be coming from biomarker instead of assessed biomarker entity)

To possibly revamp:
1) Things like blood pressure or heart rate are measured but not biospecimens per se. Will need to add a class for these (characteristic?), and probably also a new relation (I think I have these somewhere, but they are not yet in the ontology file).
2) Currently the 'sampled_from' relation is between biomarker and biospecimen, but need to consider making the relation between the assessed entity and the biospecimen.

THE CORE:
1) assessed biomarker entity
2) the medical condition or chemical entity that the biomarker intends to inform on
3) the indication; this is what gets built in to 'biomarker' (e.g., 'increased level of')
4) the type of relation between the biomarker and medical condition/chemical entity (this will be a BEST-ish relation, like prognostic_for; see note for #6 under 'changes')
5) biospecimen? I imagine this would be important for some types of diseases, or even stages of disease

Believe it or not, we don't have to capture the biomarker itself, as this gets built/inferred from #1 and #3 of the core list above.
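As a rough illustration of that last point, here is a minimal sketch of a record holding the core items, with the biomarker label derived from the assessed entity and the indication rather than stored. The class name, field names, and example values are assumptions made up for this sketch; they are not taken from the ontology file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Indication(Enum):
    # The four indications behind the <inc,dec,pres,abs> shorthand.
    INCREASED_LEVEL_OF = "increased level of"
    DECREASED_LEVEL_OF = "decreased level of"
    PRESENCE_OF = "presence of"
    ABSENCE_OF = "absence of"


@dataclass(frozen=True)
class CoreBiomarkerRecord:
    # Hypothetical container for the core list; names are illustrative only.
    assessed_entity: str                # core item 1
    target: str                         # core item 2: medical condition or chemical entity
    indication: Indication              # core item 3
    relation: str                       # core item 4, e.g. "prognostic_for"
    biospecimen: Optional[str] = None   # core item 5, still an open question

    @property
    def biomarker(self) -> str:
        # Not stored: built from core items 1 and 3 only.
        return f"{self.indication.value} {self.assessed_entity}"


record = CoreBiomarkerRecord(
    assessed_entity="example protein",   # hypothetical value
    target="example condition",          # hypothetical value
    indication=Indication.INCREASED_LEVEL_OF,
    relation="prognostic_for",
    biospecimen="blood",
)
print(record.biomarker)
```

Keeping `biomarker` as a computed property means two records can never disagree about how the label was formed from the entity and the indication.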

Reeya123 commented 1 year ago

Feb 6, 2023

Dan:

Darren, thank you for these notes and modifications.

We’re also working on a summary document of some pilot work to automate acquisition of biomarker data from public resources and populate the data model that has been manually curated (and delivered for OBCI). Could you provide comments on the document at BiomarkerDB_Summary.docx?

We’ve done a pretty good job so far mapping the external data to our model as a basis for automation, but best_biomarker_type has been a challenge (absent in most sources examined). Any ideas on how to infer best_biomarker_type data from these sources, based on rules (maybe), would be a great help.
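One possible rule-based starting point, sketched below, is a lookup from the more-specific relations to a BEST type, for cases where a source record can be mapped to one of those relations. The mapping itself is an assumption for illustration (in particular the entries for monitors_effect_of and indicates_response_to) and would need review against the BEST glossary; the function name is also hypothetical.

```python
from typing import Optional

# ASSUMPTION: this relation-to-type mapping is a guess for illustration,
# not a reviewed rule set. Relation names come from the notes in this thread.
RELATION_TO_BEST_TYPE = {
    "diagnostic_for": "diagnostic",
    "prognostic_for": "prognostic",
    "monitors_status_of": "monitoring",
    "indicates_risk_of_developing": "susceptibility/risk",
    "predicts_effect_of": "predictive",
    "monitors_effect_of": "monitoring",
    "indicates_response_to": "response",
    "assesses_toxicity_of": "safety",
}


def infer_best_biomarker_type(relation: Optional[str]) -> Optional[str]:
    """Return a BEST type for a relation, or None when no rule applies."""
    if relation is None:
        return None
    # Normalize case and surrounding whitespace before the lookup.
    return RELATION_TO_BEST_TYPE.get(relation.strip().lower())


print(infer_best_biomarker_type("prognostic_for"))
print(infer_best_biomarker_type("associated_with"))
```

Returning None for unmapped relations leaves those records flagged for manual curation instead of forcing a type onto them.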

Reeya123 commented 1 year ago

Feb 10, 2023

Raja:

I still think we need a reference box. What is the normal range? Filling it in will be optional but most biomarkers can get a normal range.

Dan - I think for CFDE work we need another diagram which strictly deals with molecular biomarkers. We need to scope our effort. Dan, for all of the boxes we need example ID types or the ontology we are going to use. For medical condition we can say, e.g., DO, HPO.

Darren:

This seems like a can of worms to me. Strictly speaking, 'normal' is relative to an individual. I also don't think there's a way to actually use the information.

Raja:

Every time I talk about this work, people ask me why there are no reference ranges for things that obviously have them. If you want to make this work clinically relevant, it might also be useful to have a box (dotted is fine). If you look at markerDB, they have reference ranges. In the real world, clinicians rely on normal ranges. If we are writing a proposal, this will be a point of discussion that might weaken our proposal. Also, for some of the biomarkers we may be able to get this data by mining EHR data. We can keep this out for now if both of you disagree. But if more people ask for it, we need to address this.

Raja:

Darren - I am not sure I understand what chemical entity is supposed to mean. Isn't the chemical entity the same as the assessed biomarker entity?

Darren:

No. Some of the BEST biomarker types deal with exposures to (broadly stated) chemicals. In such cases the assessed biomarker entity is what indicates that a person has been exposed to the chemical entity. See Response and Safety biomarkers.

Raja:

Let's rename it, then, to environmental_exposure_entity.

Reeya123 commented 1 year ago

Feb 10, 2023

Darren:

The reason I said it will be a can of worms is because once you introduce these ranges, it'll become an expectation to have them, and not one you'll be able to fill easily (I imagine it will involve a LOT of manual curation). If you're okay with that, then go for it. Indeed, I have zero hesitation about including this information in a database. But bear in mind that if we're talking about the ontology, these will be fully useless, which is to say there's no way to use them for reasoning or classification. But I definitely see that including them would be a selling point.

As for 'chemical entity' I only used that term because it is what it will be in the ontology. It's the top-level term in CHEBI. All of the things I put in that revised figure are the actual names used in the ontology. 'environmental exposure entity' will, at best, be defined in terms of chemical entities anyway.

Reeya123 commented 1 year ago

Feb 10, 2023

Dan:

Agreed for curation of references; a can of worms on a practical level, but the question comes up a lot in discussions and proposal reviews. So some sort of response is needed (perhaps depending on context). For example, perhaps in a proposal use a well-defined description and terminology, but for a 1-page summary use a somewhat less strict representation.

Darren:

I would opt for the less strict representation all around. A well-defined description would, I imagine, require relations like 'has_normal_upper_bound' and 'has_normal_lower_bound' and another for the measured units, but since these can't be used for anything other than information, that seems like overkill. As mere information to be read by a human, a property value like 'has_normal_range' " to " will be easier to process.
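To make the trade-off concrete, here is a small sketch that keeps the bounds and units together and renders them as one human-readable value, in the spirit of a single 'has_normal_range' annotation. The class name, the "lower to upper units" layout, and the example numbers are assumptions for illustration, not an agreed format or a real reference range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalRange:
    # Hypothetical holder for an optional reference range.
    lower: float
    upper: float
    units: str

    def as_annotation(self) -> str:
        # One free-text value meant for human readers, not for reasoning.
        return f"{self.lower:g} to {self.upper:g} {self.units}"


# Made-up numbers and units, purely to show the rendering.
example = NormalRange(lower=1.5, upper=4.0, units="mg/dL")
print(example.as_annotation())
```

A database could still store the three fields separately and emit only the rendered string into the ontology annotation.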

Dan:

Regarding chemical entities (strictly speaking), would this cover viruses, bacteria, and so forth?

Darren:

That's a good question. For sure 'chemical entity' does not cover organisms, but then again I'm not sure that's what's meant for the relevant BEST categories. These all refer to exposure "to a medical product or an environmental agent". We'll have to see what is meant by 'environmental agent'. I suspect these don't include organisms, though upon reflection these probably would include non-chemicals like radiation. We'd have to add something like that, so perhaps we could indeed use 'environmental agent' as the upper level, and this would include chemical entities from CHEBI plus those non-chemical agents. Then again, the real upper level would also have to include 'medical products', so 'environmental agent' would still be too restrictive.

Okay, I found this: https://www.niehs.nih.gov/health/topics/agents/index.cfm

I'd say that it DOES include organisms, at least on the surface. Dust mites and mold are listed, for example. That means we'll need to craft a definition for an upper level term that allows for medical products, chemicals, and organisms (though we can get away with a definition that just says the upper level term includes medical products and environmental agents, and then define environmental agents separately from a definition of medical products).
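A toy sketch of that layering, assuming a placeholder upper-level term called "exposure agent" (a hypothetical name, not one from the ontology) with medical products and environmental agents beneath it. Each term gets a single parent here, which is a simplification: in practice a medical product may also be a chemical entity.

```python
from typing import Optional

# ASSUMPTION: "exposure agent" is a made-up placeholder for the upper-level term.
# Each term maps to exactly one parent, which is simpler than a real ontology.
PARENT = {
    "medical product": "exposure agent",
    "environmental agent": "exposure agent",
    "chemical entity": "environmental agent",   # e.g. terms drawn from CHEBI
    "organism": "environmental agent",          # e.g. dust mites, mold
    "radiation": "environmental agent",         # a non-chemical agent
}


def is_a(term: str, ancestor: str) -> bool:
    """Walk up the parent map and report whether ancestor is reached."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


print(is_a("organism", "exposure agent"))
print(is_a("medical product", "environmental agent"))
```

With this shape, the definition of the upper-level term only has to mention its two direct children, and environmental agents can be defined separately.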

Reeya123 commented 1 year ago

Feb 10, 2023

Dan:

I’m working on getting our discussions into a GitHub repo, and I’ve added a comment to the figure legend about reference ranges.

For a formal representation of biomarkers, based on the FDA/NIH definition, I think we will need some tweaks to Darren’s model (below, and attached).

For example, some ‘biomarker’ (a measure) sampled_from some ‘biospecimen’ would have a different semantics than some ‘assessed entity’ (an object) sampled_from some ‘biospecimen’. I think we (and FDA/NIH) intend the meaning to be some
