Open bcorrie opened 5 years ago
I vote for the more detailed definitions because that is the only way I think that the samples can be effectively used for a variety of meta-analyses.
@bcorrie what is the current status of this issue? We are planning to represent the study_group_description in our backend DB, but we are a bit puzzled, as we considered it to be a property of the Subject, not of the Diagnosis (where it is currently located).
@bussec this has not progressed. There seem to be two separate questions here:
We have come up with our own "semi-controlled vocabulary" for this field that we use in our curation process, so it is possible to find "Case/Control" as well as "Healthy" subjects if you know the controlled vocabulary. This is unsatisfactory. 8-)
I have little experience in study design, so I am not sure which AIRR object this belongs to... Subject seems a bit limiting to me, hence why I think maybe it is in diagnosis??? Can you conceive of a study where you had two different samples from the same subject and one was a "Control" and the other a "Case"? For example, healthy tissue being a Control and diseased tissue (e.g. a tumor receiving some treatment) a Case? Or two diseased tissue samples from a subject, with one tissue receiving some sort of intervention and the other not???
@bcorrie Some thoughts on this:
- Sample.disease_state_sample for :wink:
- disease_diagnosis describes the absolute state while study_group_description defines the relative position within the cohort.
- study_group_description=Control is not helpful when you are looking for healthy controls. This can only be captured in disease_diagnosis.
Subject.healthy
- As DOID does not seem to contain a concept for "no apparent disease", we could either:
- make a term request (maybe it just never occurred to the maintainers) or
- introduce a boolean property Subject.healthy
I've asked IEDB how they handle this as it might provide some guidance.
Should a healthy field be at the Subject level? What about a sample from healthy tissue versus disease tissue? Should this be subject.diagnosis.healthy instead?
And I wouldn't think that subject.diagnosis.study_group_description == Control and subject.healthy == true (or even subject.diagnosis.healthy == true) would necessarily mean a healthy control, would it? You could certainly have that state when the study did not have a Control (Healthy) study group.
It kind of feels to me like study_group_description could use some refinement. Almost like we need an additional field (or two) that describes the details of the study groups. study_group_description could be a controlled vocabulary (Case, Control) but then maybe we need a field (e.g. in subject.diagnosis) that states a qualifier/keyword to Case/Control that explicitly says that the sample belongs to a study design subgroup. For example, subject.diagnosis.study_design_keywords = [Healthy] or subject.diagnosis.study_design_keywords = [Healthy, Vaccinated]
If we have some controlled vocabulary terms (e.g. Healthy) for the keywords, but allow researchers to add their own, that would cover most of the bases and in particular allow us to look for healthy controls (subject.diagnosis.study_group_description == Control and subject.diagnosis.study_design_keywords = [Healthy])
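To make the proposal concrete, here is a minimal Python sketch of what such a query could look like. It assumes the study_design_keywords field suggested above (it is not part of the current AIRR schema) and treats each subject as a plain dict with a diagnosis list; the helper names and the example subjects are made up for illustration.

```python
from typing import Any


def is_healthy_control(diagnosis: dict[str, Any]) -> bool:
    """Return True if a diagnosis entry is a Control tagged with the Healthy keyword.

    Assumption: "study_design_keywords" is the hypothetical field proposed in
    this thread, holding a list of keyword strings.
    """
    group = diagnosis.get("study_group_description")
    keywords = diagnosis.get("study_design_keywords") or []
    return group == "Control" and "Healthy" in keywords


def find_healthy_controls(subjects: list[dict[str, Any]]) -> list[str]:
    """Return the subject_id of every subject with at least one healthy-control diagnosis."""
    return [
        s["subject_id"]
        for s in subjects
        if any(is_healthy_control(d) for d in s.get("diagnosis", []))
    ]


# Illustrative (invented) subjects, not real data.
subjects = [
    {"subject_id": "S1", "diagnosis": [
        {"study_group_description": "Control", "study_design_keywords": ["Healthy"]}]},
    {"subject_id": "S2", "diagnosis": [
        {"study_group_description": "Control", "study_design_keywords": ["Healthy", "Vaccinated"]}]},
    {"subject_id": "S3", "diagnosis": [
        {"study_group_description": "Control", "study_design_keywords": ["Untreated"]}]},
    {"subject_id": "S4", "diagnosis": [
        {"study_group_description": "Case", "study_design_keywords": ["Healthy", "Vaccinated"]}]},
]

healthy_controls = find_healthy_controls(subjects)
print(healthy_controls)
```

Note that S3 is a Control but not a healthy one, and S4 carries the Healthy keyword but is a Case, so neither matches. That is the distinction study_group_description alone cannot express.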
As DOID does not seem to contain a concept for "no apparent disease", we could either:
- make a term request (maybe it just never occurred to the maintainers) or
- introduce a boolean property Subject.healthy
I've asked IEDB how they handle this as it might provide some guidance.
From Randi @ IEDB:
we use an internal identifier that we coined healthy ONTIE [ONTIE:0003423] we use "host health status" as the highest node and integrate disease ontology terms, healthy, infection without disease, and animal models of disease into a single owl file/tree view
Should a healthy field be at the Subject level? What about a sample from healthy tissue versus disease tissue? Should this be subject.diagnosis.healthy instead?
Yes, exactly ;-D It all depends on what you mean by "healthy control", which is ambiguous and may differ depending on the analysis being performed. It is certainly reasonable that a subject designated as "healthy" would let you consider all samples from that subject as potential healthy controls.
However, it's become very common in cancer studies to collect a tumor sample but also collect an adjacent healthy tissue sample for comparative analysis. In this case, the subject is not healthy (as they have cancer), but that adjacent tissue is considered a healthy control for analysis purposes.
Which is all quite different from a clinical trial with one set of subjects designated as "Case" and given a treatment, and another set designated as "Control" without treatment, but in both sets the subjects are not "healthy".
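A rough Python sketch of the three situations described above may help show why one flag at one level is not enough. Everything here is an assumption for illustration: the boolean subject "healthy" property is only a proposal from this thread, the disease_state_sample values ("healthy", "tumor", "diseased") are invented free-text labels, and the repertoires are made up.

```python
from typing import Any


def subject_is_healthy(repertoire: dict[str, Any]) -> bool:
    """Subject-level view. Assumes the hypothetical boolean subject["healthy"]."""
    return bool(repertoire["subject"].get("healthy", False))


def healthy_sample_ids(repertoire: dict[str, Any]) -> list[str]:
    """Sample-level view. Assumes disease_state_sample uses the label "healthy"."""
    return [
        s["sample_id"]
        for s in repertoire["sample"]
        if s.get("disease_state_sample") == "healthy"
    ]


# 1. Healthy donor: healthy at both the subject and the sample level.
healthy_donor = {
    "subject": {"subject_id": "D1", "healthy": True},
    "sample": [{"sample_id": "D1_blood", "disease_state_sample": "healthy"}],
}

# 2. Cancer patient: subject is not healthy, but the adjacent tissue sample is.
cancer_patient = {
    "subject": {"subject_id": "P1", "healthy": False},
    "sample": [
        {"sample_id": "P1_tumor", "disease_state_sample": "tumor"},
        {"sample_id": "P1_adjacent", "disease_state_sample": "healthy"},
    ],
}

# 3. Clinical trial control arm: a "Control" that is healthy at neither level.
trial_control = {
    "subject": {
        "subject_id": "T1",
        "healthy": False,
        "diagnosis": [{"study_group_description": "Control"}],
    },
    "sample": [{"sample_id": "T1_blood", "disease_state_sample": "diseased"}],
}

for rep in (healthy_donor, cancer_patient, trial_control):
    print(rep["subject"]["subject_id"], subject_is_healthy(rep), healthy_sample_ids(rep))
```

The second case is only found by the sample-level check, and the third is a Control that neither check should report as healthy, which is why "Control" and "healthy" probably need to be recorded separately.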
Link to ONTIE: https://ontology.iedb.org/ontology
Note: high overlap with #516
@javh I think this should be an AIRR 2.0 issue no? The limitation is that there is no mechanism in the AIRR Spec to designate a healthy control.
Again, from Emily, sparked by our discussion at the Vocab/Ontology meeting...
We have been treating it solely as case and control, but never actively defined "case". Generally the studies we work with have a single case that can be described in diagnosis, etc. Essentially, should we change this and define case and simply refer to a control as a control? Or should we alter it so that we define the two more effectively? E.g. in a study on SLE, the case is patients with flares and the control is patients without flares. Would we want this defined specifically in study_group_description?
Maybe this is a discussion for the Vocab/Ontology diagnosis group?