airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Question around study_group_description #164

Open bcorrie opened 5 years ago

bcorrie commented 5 years ago

Again, from Emily, sparked by our discussion at the Vocab/Ontology meeting...

We have been treating it solely as case and control, but never actively defined "case". Generally, the studies we work with have a single case that can be described in diagnosis, etc. Essentially, should we change this and define case and simply refer to a control as a control? Or should we alter it so that we define the two more effectively? E.g., in a study on SLE, the case is patients with flares and the control is patients without flares. Would we want this defined specifically in study_group_description?

Maybe this is a discussion for the Vocab/Ontology diagnosis group?

lgcowell commented 5 years ago

I vote for the more detailed definitions because that is the only way I think that the samples can be effectively used for a variety of meta-analyses.

bussec commented 2 years ago

@bcorrie what is the current status of this issue? We are planning to represent the study_group_description in our backend DB, but we are a bit puzzled, as we considered it to be a property of the Subject, not of the Diagnosis (where it is currently located).

bcorrie commented 2 years ago

@bussec this has not progressed. There seem to be two separate questions here:

  1. We have come up with our own "semi-controlled vocabulary" for this field that we use in our curation process, so it is possible to find "Case/Control" as well as "Healthy" subjects if you know the controlled vocabulary. This is unsatisfactory. 8-)

  2. I have little experience in study design, so I am not sure which AIRR object this belongs to... Subject seems a bit limiting to me, which is why I think maybe it belongs in diagnosis??? Can you conceive of a study where you had two different samples from the same subject and one was a "Control" and the other a "Case"? For example, healthy tissue being a Control and diseased tissue (e.g., a tumor receiving some treatment) a Case? Or two diseased tissue samples from a subject, with one tissue receiving some sort of intervention and the other not???

bussec commented 2 years ago

@bcorrie Some thoughts on this:

  1. There are situations conceivable in which two samples taken from the same subject at the same time point would belong to different "case"/"control" groups, e.g., radiation or embolization protocols. But for immunology these things will be rare, if they exist at all. Also, this is what we have Sample.disease_state_sample for :wink: .
  2. It seems to me like this is a "relative-absolute" problem and in the end we need both types of information:
    • disease_diagnosis describes the absolute state while
    • study_group_description defines the relative position within the cohort.
  3. Therefore, as you already wrote, study_group_description=Control is not helpful when you are looking for healthy controls. This can only be captured in disease_diagnosis.
  4. As DOID does not seem to contain a concept for "no apparent disease", we could either:
    • make a term request (maybe it just never occurred to the maintainers) or
    • introduce a boolean property Subject.healthy
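
To illustrate the relative/absolute point, here is a minimal sketch in Python. The records are hypothetical and flattened for readability (in the AIRR schema these fields sit under the subject's diagnosis, and disease_diagnosis is an ontology object rather than a plain string). It only shows that selecting on study_group_description alone returns controls that are not healthy, and that a missing diagnosis is ambiguous:

```python
# Hypothetical, simplified records; not real study data.
subjects = [
    {"subject_id": "S1", "study_group_description": "Case",
     "disease_diagnosis": "systemic lupus erythematosus"},  # SLE with flares
    {"subject_id": "S2", "study_group_description": "Control",
     "disease_diagnosis": "systemic lupus erythematosus"},  # SLE without flares
    {"subject_id": "S3", "study_group_description": "Control",
     "disease_diagnosis": None},  # healthy, or simply not recorded?
]

# Relative position within the cohort: who is a control?
controls = [s["subject_id"] for s in subjects
            if s["study_group_description"] == "Control"]

# Absolute state: which controls carry no diagnosis at all?
undiagnosed_controls = [s["subject_id"] for s in subjects
                        if s["study_group_description"] == "Control"
                        and s["disease_diagnosis"] is None]

print(controls)              # ['S2', 'S3']
print(undiagnosed_controls)  # ['S3']
```

S2 is a control but not healthy, and S3 can only be assumed healthy because nothing was recorded, which is why an explicit "no apparent disease" term or flag would help.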
schristley commented 2 years ago

>   4. As DOID does not seem to contain a concept for "no apparent disease", we could either:
>     • make a term request (maybe it just never occurred to the maintainers) or
>     • introduce a boolean property Subject.healthy

I've asked IEDB how they handle this as it might provide some guidance.

bcorrie commented 2 years ago

Should a healthy field be at the Subject level? What about a sample from healthy tissue versus diseased tissue? Should this be subject.diagnosis.healthy instead?

And I wouldn't think that subject.diagnosis.study_group_description == Control and subject.healthy == true (or even subject.diagnosis.healthy == true) would necessarily mean a healthy control, would it? You could certainly have that state when the study did not have a Control (Healthy) study group.

It kind of feels to me like study_group_description could use some refinement. Almost like we need an additional field (or two) that describes the details of the study groups. study_group_description could be a controlled vocabulary (Case, Control), but then maybe we need a field (e.g. in subject.diagnosis) that adds a qualifier/keyword to Case/Control and explicitly states that the sample belongs to a study design subgroup. For example, subject.diagnosis.study_design_keywords = [Healthy] or subject.diagnosis.study_design_keywords = [Healthy, Vaccinated]

If we have some controlled vocabulary terms (e.g. Healthy) for the keywords, but allow researchers to add their own, that would cover most of the bases and in particular allow us to look for healthy controls (subject.diagnosis.study_group_description == Control and subject.diagnosis.study_design_keywords = [Healthy])
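
A rough sketch of how such a query could look in Python, assuming the proposed field existed. Note that study_design_keywords is only the field suggested in this comment, not part of the current AIRR schema, and the records and helper names below are hypothetical:

```python
from typing import Dict, Iterable, List


def is_healthy_control(diagnosis: Dict) -> bool:
    """True if a diagnosis entry is a Control carrying the Healthy keyword."""
    # study_design_keywords is the field proposed above, not an existing one.
    keywords = diagnosis.get("study_design_keywords") or []
    return (diagnosis.get("study_group_description") == "Control"
            and "Healthy" in keywords)


def healthy_control_subjects(repertoires: Iterable[Dict]) -> List[str]:
    """Return subject IDs with at least one healthy-control diagnosis entry."""
    return [
        rep["subject"]["subject_id"]
        for rep in repertoires
        if any(is_healthy_control(d)
               for d in rep["subject"].get("diagnosis", []))
    ]


# Hypothetical repertoire metadata, reduced to the fields used here.
repertoires = [
    {"subject": {"subject_id": "S1", "diagnosis": [
        {"study_group_description": "Case", "study_design_keywords": []}]}},
    {"subject": {"subject_id": "S2", "diagnosis": [
        {"study_group_description": "Control",
         "study_design_keywords": ["Healthy", "Vaccinated"]}]}},
    {"subject": {"subject_id": "S3", "diagnosis": [
        {"study_group_description": "Control"}]}},  # control, not marked healthy
]

print(healthy_control_subjects(repertoires))  # ['S2']
```

Researcher-defined keywords such as Vaccinated pass through untouched; only the controlled term Healthy is interpreted.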

schristley commented 2 years ago

>   4. As DOID does not seem to contain a concept for "no apparent disease", we could either:
>     • make a term request (maybe it just never occurred to the maintainers) or
>     • introduce a boolean property Subject.healthy
>
> I've asked IEDB how they handle this as it might provide some guidance.

From Randi @ IEDB:

we use an internal identifier that we coined healthy ONTIE [ONTIE:0003423] we use "host health status" as the highest node and integrate disease ontology terms, healthy, infection without disease, and animal models of disease into a single owl file/tree view

schristley commented 2 years ago

> Should a healthy field be at the Subject level? What about a sample from healthy tissue versus diseased tissue? Should this be subject.diagnosis.healthy instead?

Yes, exactly ;-D It all depends on what you mean by "healthy control", which is ambiguous and may differ based upon the analysis being performed. It is certainly reasonable that a subject designated as "healthy" would let you consider all samples from that subject as potential healthy controls.

However, it's become very common in cancer studies to collect a tumor sample but also collect an adjacent healthy tissue sample for comparative analysis. In this case, the subject is not healthy (as they have cancer), but that adjacent tissue is considered a healthy control for analysis purposes.

Which is all quite different from a clinical trial with one set of subjects designated as "Case" and given a treatment, and another set designated as "Control" without treatment, but in both sets the subjects are not "healthy".
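
The three situations can be sketched as a small Python helper. This is only one possible reading, under stated assumptions: a hypothetical subject-level healthy flag (the proposed Subject.healthy), the existing free-text Sample.disease_state_sample assumed to hold the literal value "healthy" for healthy tissue, and a sample-level annotation taking precedence over the subject-level flag. None of this is decided in the spec:

```python
from typing import Optional


def usable_as_healthy_control(subject_healthy: Optional[bool],
                              disease_state_sample: Optional[str]) -> bool:
    """Decide whether a sample could serve as a healthy control.

    Assumption: a sample-level annotation, when present, overrides the
    subject-level flag; otherwise the subject-level flag is used.
    """
    if disease_state_sample is not None:
        return disease_state_sample.strip().lower() == "healthy"
    return bool(subject_healthy)


# Healthy subject, no sample-level annotation: all samples qualify.
print(usable_as_healthy_control(True, None))        # True
# Cancer patient, tumor sample.
print(usable_as_healthy_control(False, "tumor"))    # False
# Same cancer patient, adjacent healthy tissue.
print(usable_as_healthy_control(False, "healthy"))  # True
# Clinical trial "Control" subject who is not healthy, no annotation.
print(usable_as_healthy_control(False, None))       # False
```

In the last case the subject's study group is Control, yet the sample is not a healthy control, which is exactly the distinction study_group_description cannot express on its own.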

bussec commented 2 years ago

Link to ONTIE: https://ontology.iedb.org/ontology

scharch commented 1 year ago

Note: high overlap with #516

bcorrie commented 8 months ago

@javh I think this should be an AIRR 2.0 issue, no? The limitation is that there is no mechanism in the AIRR Spec to designate a healthy control.