clinical-biomarkers / OBCI

The Ontology for Biomarkers of Clinical Interest (OBCI) formally defines biomarkers for diseases, phenotypes, and effects.
Creative Commons Attribution 4.0 International

Updated Biomarker ER diagram #11

Closed Reeya123 closed 4 months ago

Reeya123 commented 1 year ago

March 2, 2023

Dan:

I have updated our existing biomarker ER diagram, highlighting "mandatory" items in the model that form the core definition of a biomarker, based on the FDA/NIH Biomarker Working Group BEST definition (below).

biomarker_er_model_02272023 (1).pptx
BiomarkerDB_Summary.docx

Reeya123 commented 1 year ago

Feb 6, 2023

Darren:

To assess 'core' information, I had to align this with the ontology as designed so far. This necessitated many changes to names, relationships, and connections (see attached):

Name changes:
1) biomarker measurement -> biomarker
2) measure_of_entity -> indicated_by_<inc,dec,pres,abs> (shorthand for increased_level_of, decreased_level_of, presence_of, absence_of respectively)
3) disease_name -> medical condition (this allows for non-disease biomarkers)
4) detected_in -> sampled_from
5) specimen type -> biospecimen
6) biomarker_of -> provides_clinical_information_for (when biomarker is connecting to medical condition) or indicates_clinical_effect_of (see additions below); note that these are upper-level relations, and that curation needs to say which of the more-specific relations should be used: prognostic_for, diagnostic_for, monitors_status_of, indicates_risk_of_developing, predicts_effect_of, monitors_effect_of, indicates_response_to, and assesses_toxicity_of.
7) is_BEST_type -> is_a (there's no specific relation needed, as these are inferred from the relation between biomarker and either medical condition or chemical entity)
8) has_entity_type -> is_a (there's no specific relation needed, as these are inferred from the ontological hierarchy for the assessed biomarker entity)

Connection removals:
1) associated_with (between assessed biomarker entity and what is now medical condition). The entity itself says nothing, only the biomarker does.
2) assayed_in (redundant with sampled_from, which is between the biomarker and the specimen, but see 'dotted line' note below)
3) occurs_in (between medical condition and biosample). The medical condition need not be located where a sample was taken from.
4) is_BEST_type_for (between BEST biomarker type and either medical condition or chemical entity). Somewhat superfluous, as this information is captured a different way.

Connection additions:
1) indicates_clinical_effect_of (between biomarker and chemical entity; some BEST types have to do with exposures)
2) brackets for literature evidence (not sure if a relation is used for this; will look into); we might be able to target specific statements about biomarkers, like "has_LOINC some LOINC code [PMID:123456789]"

Moved:
1) assessed entity type (original line looked like it might be coming from biomarker instead of assessed biomarker entity)

To possibly revamp:
1) Things like blood pressure or heart rate are measured but not biospecimens per se. Will need to add a class for these (characteristic?), and probably also a new relation (I think I have these somewhere, but they are not yet in the ontology file).
2) Currently the 'sampled_from' relation is between biomarker and biospecimen, but need to consider making the relation between the assessed entity and the biospecimen.

THE CORE:
1) assessed biomarker entity
2) the medical condition or chemical entity that the biomarker intends to inform on
3) the indication; this is what gets built in to 'biomarker' (e.g., 'increased level of')
4) the type of relation between the biomarker and medical condition/chemical entity (this will be a BEST-ish relation, like prognostic_for; see note for #6 under 'Name changes')
5) biospecimen? I imagine this would be important for some types of diseases, or even stages of disease

Believe it or not, we don't have to capture the biomarker itself, as this gets built/inferred from #1 and #3 of the core list above.
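The point that the biomarker is built from items #1 and #3 of the core list can be sketched in Python. This is only an illustration under stated assumptions: the `CoreRecord` class, its field names, and the label format are hypothetical and are not taken from the OBCI ontology file. Only the indication and relation names come from the notes above.

```python
from dataclasses import dataclass
from typing import Optional

# Indications named in the notes above (shorthand inc, dec, pres, abs).
INDICATIONS = {
    "increased_level_of",
    "decreased_level_of",
    "presence_of",
    "absence_of",
}


@dataclass(frozen=True)
class CoreRecord:
    """Hypothetical container for the 'core' items; field names are illustrative."""

    assessed_entity: str               # core item 1
    target: str                        # core item 2: medical condition or chemical entity
    indication: str                    # core item 3
    relation: str                      # core item 4, e.g. 'prognostic_for'
    biospecimen: Optional[str] = None  # core item 5 (still an open question)

    def __post_init__(self) -> None:
        # Reject indications that are not in the agreed list.
        if self.indication not in INDICATIONS:
            raise ValueError(f"unknown indication: {self.indication}")

    def biomarker_label(self) -> str:
        """Derive the biomarker name from items 1 and 3 only."""
        return f"{self.indication.replace('_', ' ')} {self.assessed_entity}"


record = CoreRecord(
    assessed_entity="glucose",
    target="some medical condition",  # placeholder, not a curated assertion
    indication="increased_level_of",
    relation="diagnostic_for",
)
print(record.biomarker_label())
```

Because the label is computed rather than stored, a data table would only need the entity and the indication; the biomarker name column becomes optional.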

Reeya123 commented 1 year ago

Feb 6, 2023

Dan:

Darren, thank you for these notes and modifications.

We’re also working on a summary document of some pilot work to automate acquisition of biomarker data from public resources and populate the data model that has been manually curated (and delivered for OBCI). Could you provide comments on the document at BiomarkerDB_Summary.docx?

We’ve done a pretty good job so far mapping the external data to our model as a basis for automation, but best_biomarker_type has been a challenge (absent in most sources examined). Any ideas on how to infer best_biomarker_type data from these sources, based on rules (maybe), would be a great help.

Reeya123 commented 1 year ago

Feb 10, 2023

Raja:

I still think we need a reference box. What is the normal range? Filling it in will be optional but most biomarkers can get a normal range.

Dan, I think for the CFDE work we need another diagram that deals strictly with molecular biomarkers. We need to scope our effort. Dan, for all of the boxes we need example ID types or the ontology we are going to use. For medical condition we can say, e.g., DO, HPO.

Darren:

This seems like a can of worms to me. Strictly speaking, 'normal' is relative to an individual. I also don't think there's a way to actually use the information.

Raja:

Every time I talk about this work, people ask me why there are no reference ranges for things that obviously have them. If you want to make this work clinically relevant, it might also be useful to have a box (dotted is fine). If you look at MarkerDB, they have reference ranges. In the real world, clinicians rely on normal ranges. If we are writing a proposal, this is a point that will be discussed and that might weaken our proposal. Also, for some of the biomarkers we may be able to get this data by mining EHR data. We can keep this out for now if both of you disagree. But if more people ask for it, we need to address this.

Raja:

Darren, I am not sure I understand what 'chemical entity' is supposed to mean. Isn't the chemical entity the same as the assessed biomarker entity?

Darren:

No. Some of the BEST biomarker types deal with exposures to (broadly stated) chemicals. In such cases the assessed biomarker entity is what indicates that a person has been exposed to the chemical entity. See Response and Safety biomarkers.

Raja:

Let's rename it then to environmental_exposure_entity.

Reeya123 commented 1 year ago

Feb 10, 2023

Darren:

The reason I said it will be a can of worms is because once you introduce these ranges, it'll become an expectation to have them, and not one you'll be able to fill easily (I imagine it will involve a LOT of manual curation). If you're okay with that, then go for it. Indeed, I have zero hesitation about including this information in a database. But bear in mind that if we're talking about the ontology, these will be fully useless, which is to say there's no way to use them for reasoning or classification. But I definitely see that including them would be a selling point.

As for 'chemical entity' I only used that term because it is what it will be in the ontology. It's the top-level term in CHEBI. All of the things I put in that revised figure are the actual names used in the ontology. 'environmental exposure entity' will, at best, be defined in terms of chemical entities anyway.

Reeya123 commented 1 year ago

Feb 10, 2023

Dan:

Agreed for curation of references; a can of worms on a practical level, but the question comes up a lot in discussions and proposal reviews. So some sort of response is needed (perhaps depending on context). For example, perhaps in a proposal use a well-defined description and terminology, but for a 1-page summary use a somewhat less strict representation.

Darren:

I would opt for the less strict representation all around. A well-defined description I imagine would require relations like 'has_normal_upper_bound' and 'has_normal_lower_bound' and another for the measured units, but since these can't be used for anything other than information that seems like overkill. As mere information to be read by a human, a property value like 'has_normal_range' " to " will be easier for humans to process.

Dan:

Regarding chemical entities (strictly speaking), would this cover viruses, bacteria, and so forth?

Darren:

That's a good question. For sure 'chemical entity' does not cover organisms, but then again I'm not sure that's what's meant for the relevant BEST categories. These all refer to exposure "to a medical product or an environmental agent". We'll have to see what is meant by 'environmental agent'. I suspect these don't include organisms, though upon reflection these probably would include non-chemicals like radiation. We'd have to add something like that, so perhaps we could indeed use 'environmental agent' as the upper level, and this would include chemical entities from CHEBI plus those non-chemical agents. Then again, the real upper level would also have to include 'medical products', so 'environmental agent' would still be too restrictive.

<spends time looking up 'environmental agent'>

Okay, I found this: https://www.niehs.nih.gov/health/topics/agents/index.cfm

I'd say that it DOES include organisms, at least on the surface. Dust mites and mold are listed, for example. That means we'll need to craft a definition for an upper level term that allows for medical products, chemicals, and organisms (though we can get away with a definition that just says the upper level term includes medical products and environmental agents, and then define environmental agents separately from a definition of medical products).

Reeya123 commented 1 year ago

Feb 10, 2023

Dan:

I’m working on getting our discussions into a GitHub repo and I’ve added a comment to the figure legend about reference ranges.

For a formal representation of biomarkers, based on the FDA/NIH definition, I think we will need some tweaks to Darren’s model (below, and attached).

For example, some ‘biomarker’ (a measure) sampled_from some ‘biospecimen’ would have a different semantics than some ‘assessed entity’ (an object) sampled_from some ‘biospecimen’. I think we (and FDA/NIH) intend the meaning to be some , not some . And the FDA/NIH definition says nothing about a ‘biospecimen’ thing; so, I think it also would be a very important contextual annotation, but not a “core” element.

Also, some ‘biomarkers’ are an indicator of some biological process (of which ‘medical condition’ is_a child), while some ‘biomarkers’ are an indicator of response to some ‘agent’ (more general of course than ‘environmental agent’).

Some of these entities and relations may be too broad, which raises the question of scope; further modifications of the figure may be needed if the scope is to be limited to molecular biomarkers, for example.

image
Reeya123 commented 1 year ago

Feb 10, 2023

Darren:

A biomarker is not a measurement, and the FDA/NIH doesn't say that it is. That's why I changed that relation, as it implies such. Rather, it is what we learn from measurement. For example, blood pressure is a measurement; increased blood pressure is a biomarker. Glucose concentration is a measurement; increased glucose concentration is a biomarker. For the sampled_from relation, the connection was made to biomarker as opposed to assessed biomarker entity because the latter has no context. You can't assert glucose sampled_from blood, because there are many places glucose can be sampled from, including for purposes wholly unrelated to clinical measurements. That being said, I agree that saying biomarker sampled_from biospecimen sounds odd, likely because I used that as a kind of shorthand. I can think of two possible fixes, the first of which keeps the notion of connecting the assessed entity to where the sample came from; it is rather complicated and breaks some reasoning. The second fix basically keeps the original design with a small modification; it is simple to implement:

1) Incorporate the sample into the biomarker definition (see below for why this might need to be done). So, instead of saying " = biomarker and indicated_by_increased_level_of and sampled_from " (which, effectively, can be separated into two statements, both about ), we'd say " = biomarker and indicated_by_increased_level_of ( sampled_from ). Perhaps that is what you meant, but the figure wasn't capturing that (nor can I figure out a way to represent that pictorially, hence why I made the connection how I did). Doing this prevents us from reasoning that the biomarker was assessed by sampling blood (I just tried it). This brings me to what I think is a better and simpler fix:

2) Change the name of the relation. Instead of 'sampled_from', we use something like 'assessed_by_sampling'. Indeed, we can have both, one for connecting to the assessed entity and the other for connecting to the biomarker. I suspect having both will be mildly confusing and perhaps needlessly complicating, but with some work it might do what is needed.

I also realize that we might not be addressing the same purpose. I think you're addressing specifically what the FDA/NIH says about biomarkers, in which case, yes, they don't mention specimen and it would be non-core (though see this article of interest: https://www.ncbi.nlm.nih.gov/books/NBK566059/). As always, I'm thinking of the ontology and what would be needed to define biomarkers, in which case I'd say that the biospecimen is important. Indeed, in some cases it is absolutely essential (for example, a finding of white blood cells in urine has different indications than, say, increased WBC in blood).
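A small Python sketch of why the biospecimen can be essential to the definition, using the white blood cell example above. Everything here is illustrative: the tuple layout and the `index_by` helper are hypothetical, the "informs on" values are deliberate placeholders rather than clinical statements, and using the same indication for both samples is a simplification made so the collision is visible.

```python
# Each annotation: (assessed entity, indication, biospecimen, what it informs on).
# The 'informs on' values are placeholders, not curated clinical statements.
annotations = [
    ("white blood cell", "increased_level_of", "urine", "indication A"),
    ("white blood cell", "increased_level_of", "blood", "indication B"),
]


def index_by(records, use_biospecimen):
    """Group 'informs on' values by a biomarker key, with or without biospecimen."""
    index = {}
    for entity, indication, specimen, informs_on in records:
        if use_biospecimen:
            key = (entity, indication, specimen)
        else:
            key = (entity, indication)
        index.setdefault(key, set()).add(informs_on)
    return index


without_specimen = index_by(annotations, use_biospecimen=False)
with_specimen = index_by(annotations, use_biospecimen=True)

# Without the biospecimen, both findings collapse into a single key.
print(len(without_specimen), "key(s) without biospecimen")
# With it, each finding keeps its own entry.
print(len(with_specimen), "key(s) with biospecimen")
```

If the biospecimen is left out of the key, the two findings become indistinguishable, which is the situation the urine versus blood example warns about.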

Reeya123 commented 1 year ago

Feb 13, 2023

Dan:

Yes, right. A biomarker is not a measurement of an assessed entity. I was sloppy there. Would you agree that a biomarker (to paraphrase FNBWG) is an observable (measurable) different state (in a sample from a subject),

Darren:

For sure, yes. That is why the relations between the biomarker and assessed entity include directionality or presence/absence.

Dan:

So, in the ontology the directionality/presence/absence is expressed in the relation; in the data table, it's expressed in the data value.

Darren:

By 'data value' do you refer to the biomarker name? In the ontology it is given both in the name of the biomarker and in the relation. The relation is the more important of the two for ontology purposes, but in the end it doesn't matter where it's kept in the data table because the ontology can make use of it. Indeed, if desired, the data table technically can be just as useful without the biomarker name, as long as the directionality and the entity are given.

Dan:

Yes

Reeya123 commented 1 year ago

Feb 13, 2023

Dan:

compared to a (population) norm/reference value, of an assessed entity consistently associated with some particular circumstance/condition/process (e.g., disease)?

Darren:

As an approximation, yes. I say approximation because really one should compare against what is normal for self. This of course doesn't work for congenital issues. For all else a comparison to self would be the standard (even clinically I imagine), with the population average as a fall-back (for example, if a patient didn't have a baseline on record).

Dan:

Yes, but it may not necessarily work, I imagine, for an individual, strictly speaking; perhaps the "normal" state of an entity for an individual may change over time and differ at 60 yrs old vs. 25 yrs (e.g., blood pressure). This (ontology of biomarker) can get very complicated.

Darren:

Yep! Well, like I said, self-comparison is the gold standard in my opinion, fully aware it won't always be achievable. Not including these ranges removes all complications. In my initial modeling I went with a use case where the physician makes the judgment as to whether or not the assessed entity level is normal, using the ontology to help figure out the indication. I wasn't thinking that the physician would make a measurement and use the ontology to figure out if the measurement constituted a biomarker (above normal, below normal, present, absent).

Dan:

Agreed, I'm all for removing complications whenever possible. My concern is that, in some circumstances (not necessarily this one), information may be lost which might affect reasoning (good or bad). I raise the point because this seems to be a common comment from reviewers and others whenever talking about ontology modeling.

Darren:

As mentioned previously, we can include these ranges, though to me it seems more appropriate to capture these in the database since they can't be used in any way in the ontology.

Reeya123 commented 1 year ago

Feb 13, 2023

Dan:

This is one of the reasons why I (very mildly) objected to the notion of 'normal range'. I'm also wary of anything that can be used to make clinical interpretations. Including these ranges crosses that line, and we'd have to put disclaimers on every entity (this is what UniProt had to do). We might have to do that anyway once we connect biomarkers to disease.

Darren:

Agreed. I have heard some essentially suggest that a biomarker is not a biomarker if not used in a clinical setting; personally, I don't agree with that view.

Dan:

I don't have a strong opinion either way, but the wording of the FDA/NIH definition seems to agree that a clinical setting would be involved. I guess it comes down to what's considered a clinical setting. If 'clinical setting' is restricted to hospital or doctor's office or requires a doctor in some way, then I agree with you. I would consider an elevated temperature taken by my own thermometer just as valid as one used in the doctor's office.

Darren:

Agreed, but this idea of 'clinical setting' for a biomarker seems to come up a lot from others. A strict interpretation of clinical setting would exclude home tests and even bench research, for example.

Dan:

I personally think it makes more sense to talk about 'clinical use' as opposed to 'clinical setting'. Either way, I don't see how this would affect our work.

Reeya123 commented 1 year ago

Feb 13, 2023

Dan:

Investigators interpret/infer the observed different state to signify the potential or actual existence of some particular status/process. If so, generally, the relation of a biomarker-assessed entity is something like .

Darren:

I see where you're going with it, but that would be somewhat imprecise. In our treatment, 'biomarker' already has the notion of different-ness built in. A biomarker is indicated by the difference between a current observed state of some entity vs some previous observed state of that same entity (or, depending on the biomarker and as you point out above, between the current observed state vs some accepted standard). In all cases a biomarker is a comparative thing.

Dan:

Yes, always comparative. Although precisely defining the particular comparison can get tricky.

Reeya123 commented 1 year ago

Feb 13, 2023

Dan:

Although, I don’t think that’s very satisfactory; it doesn't really express the “significance” of a biomarker.

Darren:

Unclear what you mean by 'significance', unless you mean something like 'slightly above normal' vs 'greatly above normal'? That raises a point we haven't yet discussed: the possibility that slightly above and greatly above could have different indications (that is, point to different diseases). I hesitate to include such nuances because they are terribly difficult to define.

Dan:

I meant that asserting that the state of some entity differs from another state does not necessarily or inherently convey a notion of "biomarkerness".

Darren:

I assume here you mean that sometimes the difference is well within a normal range (whether population average or self average). In such cases these are still considered biomarkers though, per FDA definition "...measured as an indicator of normal biological processes...". I tend to think of biomarkers as anything that can be used--when abnormal--to indicate a potential medical issue, so if it is normal, then that's an indication that there's no issue.

Dan:

A relation observed_state_differs_from seems vague to me.

Darren:

We don't have such a relation currently, though it could be useful solely as an upper-level parent term for the more specific relations we currently have. It could be marked as 'do not use for annotation'; there are a number of relations marked as such in the Relation Ontology.
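The idea of a grouping parent that is flagged 'do not use for annotation' can be sketched in Python as follows. This is a minimal sketch, not the ontology's implementation: `observed_state_differs_from` is only the relation name proposed in this discussion and does not exist in the ontology, the child names other than `indicated_by_increased_level_of` are assumed by analogy with the shorthand used earlier in the thread, and the helper functions are hypothetical.

```python
# Hypothetical parent relation from the discussion above; it does not exist
# in the ontology at the time of writing.
PARENT = "observed_state_differs_from"

# Child relation -> parent relation. Child names other than the first are
# assumed by analogy with the 'inc, dec, pres, abs' shorthand.
RELATION_PARENT = {
    "indicated_by_increased_level_of": PARENT,
    "indicated_by_decreased_level_of": PARENT,
    "indicated_by_presence_of": PARENT,
    "indicated_by_absence_of": PARENT,
}

# Relations that exist only to group others.
DO_NOT_ANNOTATE = {PARENT}


def check_annotation(relation: str) -> str:
    """Return the relation if a curator may use it, otherwise raise."""
    if relation in DO_NOT_ANNOTATE:
        raise ValueError(f"'{relation}' is a grouping relation; use a more specific child")
    if relation not in RELATION_PARENT:
        raise ValueError(f"unknown relation: {relation}")
    return relation


def is_subrelation_of(relation: str, ancestor: str) -> bool:
    """Walk up the parent links to test whether `relation` falls under `ancestor`."""
    current = relation
    while current is not None:
        if current == ancestor:
            return True
        current = RELATION_PARENT.get(current)
    return False


print(check_annotation("indicated_by_presence_of"))
print(is_subrelation_of("indicated_by_absence_of", PARENT))
```

The parent stays available for queries ("find everything under the parent") while curation is forced to pick one of the specific relations.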

Reeya123 commented 1 year ago

Feb 13, 2023

Darren:

And yes, different states of a biomarker for different diseases is a highly likely possibility, I think. Agreed, very difficult to define.

Dan:

Yes, I see your point about biosamples; agreed. So, we follow and also extend FNBWG, I guess.

I wonder if there is a way to infer the location of (a specific biomarker) from its relation to an assessed entity, which is “sampled_in” a biospecimen?

Darren:

Can you elaborate? I don't think biomarkers have locations per se. That sampled_from (or, better, assessed_by_sampling) was intended as a kind of shortcut to say, "the biomarker (e.g., increased level of ) was determined by comparing levels of in samples from ". In any case, I don't think an inference can be made with respect to biospecimen; it is either known, or not. I suppose there could be cases where an assessed entity is ONLY found in some particular place, and even if not stated we'd know what that place is. Is that what you mean here?

Dan:

I gather you want to assert some relation of a biomarker with a biosample; more specifically, that the marker is observed/measured in some sample (an object obtained from some tissue/location for contextual knowledge)?

Darren:

Not necessarily for contextual knowledge. One of the use cases I can imagine is when a doctor has a vial of blood. What measurements can be taken from this sample? In another case, the same entity measured as abnormal in one sample might mean something different than that same entity measured as abnormal in a different sample.

Dan:

Interesting. Have we compiled a well-defined list of use cases yet? It might help bound the parameters (provide a common understanding of the imagined uses) of the ontology and its development.

Darren:

More work needs to be done in this area. Once I clear out some time-sensitive issues on my end, this is what I plan on tackling next.

Reeya123 commented 1 year ago

Dan:

I wonder if that knowledge can be inferred (perhaps by a rule?) by the logical chain that:

biomarker assessed entity assessed entity biosample therefore biomarker <some relation> biosample distinct relations

Darren:

That's what I tried the other day. Couldn't get it to work, but I might have done it incorrectly. Note that, even if it can be made to work, the chain must be within the context of the biomarker, not the assessed entity itself. That basically means that we can easily define a relation that directly connects between biomarker and biosample (for example, the aforementioned assessed_by_sampling) as meaning exactly that: biomarker assessed_by_sampling biospecimen means that there is some assessed biomarker entity that was sampled from the given biospecimen.
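The intended meaning of the shortcut can be illustrated with plain Python. This is not an OWL property chain and says nothing about whether a reasoner can be made to do the same; it only shows the reading described above, where the sample is stated within the context of one biomarker. The `BiomarkerDefinition` record and the `derive_assessed_by_sampling` helper are hypothetical names introduced for this sketch.

```python
from typing import NamedTuple, Optional


class BiomarkerDefinition(NamedTuple):
    """Hypothetical record: the sample is stated inside the biomarker's own definition."""

    name: str
    indication: str               # e.g. 'indicated_by_increased_level_of'
    assessed_entity: str
    sampled_from: Optional[str]   # scoped to this biomarker, not to the entity in general


def derive_assessed_by_sampling(definitions):
    """Emit (biomarker, 'assessed_by_sampling', biospecimen) triples.

    A biomarker is assessed by sampling a biospecimen if its assessed entity,
    in this biomarker's context, was sampled from that biospecimen.
    """
    triples = set()
    for d in definitions:
        if d.sampled_from is not None:
            triples.add((d.name, "assessed_by_sampling", d.sampled_from))
    return triples


definitions = [
    BiomarkerDefinition(
        "increased blood glucose", "indicated_by_increased_level_of", "glucose", "blood"
    ),
    BiomarkerDefinition(
        "increased glucose", "indicated_by_increased_level_of", "glucose", None
    ),
]

derived = derive_assessed_by_sampling(definitions)
print(derived)
```

Note that nothing of the form "glucose sampled_from blood" is ever produced: the statement about the sample is attached to one biomarker and never to the assessed entity on its own.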

Dan:

Hmm; it seems like the connection of a particular biomarker with a particular assessed entity is lost here (other than the notion that a biomarker (different state) was observed in a biospecimen for some unspecified entity and some condition). Does the reasoning work?

Darren:

I'm not sure what you mean here. We have a definite direct connection between biomarker and assessed entity. Do you mean that the connection between assessed entity and biospecimen is lost? In the modeling I have at the moment (subject to change as more examples are added), there is a connection between biomarker and assessed entity, between biomarker and biospecimen, and between biomarker and condition. The biomarker+entity connection is a matter of definition. In some cases, the definition might need to include the biospecimen (so, biomarker+entity+biospecimen). Under no circumstance would it be correct to make a connection between assessed entity+biospecimen outside the context of biomarker. That is to say, we can't say glucose sampled_from blood, because that would be asserting that glucose is found only in blood, which isn't true. We can, however, say 'blood glucose' sampled_from 'blood', and we can say 'blood glucose' is_a 'glucose', and 'glucose' is_a 'chemical entity'. I'll have to ruminate on the gains, losses, and potential complications of incorporating the sample into the biomarker definition.

Reeya123 commented 1 year ago

Feb 24, 2023

Dan:

Raja, I've revised the biomarker ER diagram and legend based on recent discussions. The attached file has two versions of the diagram; with entity examples and without. I'll send it to Darren today for comments/edits. Do you want to have a look before I send?

Raja:

Why is specimen type green? I am sure that OpenTarget and GWAS and ClinVar do not have specimens.

Dan:

I believe Darren argued that in some (not all) circumstances specimen type is 'core' in distinguishing biomarkers. I think his example was WBC in urine vs. blood. I colored specimen type green to avoid complicating the figure with nuance. Would you prefer some type of distinction for specimen type?