Use of [phenotype] and possible alternatives

I see quite a range of uses for [phenotype] in the annotations that do not really seem to fit the definition. The definition is:

The observable form taken by some character (or group of characters) in an individual or an organism, excluding pathology and disease. The detectable outward manifestations of a specific genotype.

The column is used to characterize both the disease (stage, metastases, etc.) and treatments. In practice, the existing annotations almost never use the [phenotype] column to describe an actual phenotype.

It has been briefly discussed in #92, where @anjaf said:

"characteristics[phenotype]: sample treated with drug A"

As a curation side-note: "phenotype" is not a good term to describe that attribute. Better annotation is "compound: drug A" and "compound: none" (for the control). The second reason why this is better is that "drug A" can be mapped to the ontology term for drug A, while "sample treated with drug A" is not an ontology term.

After that, the experimental design page was updated, and the example looks somewhat like this:

source name	characteristics[compound]	characteristics[phenotype]	factor value[phenotype]
sample_treat	necrotic tissue	compound: drug A	necrotic tissue
sample_control	normal	compound: none	normal

My questions are these:

Is this really what @anjaf meant to suggest? Compounds are still under [phenotype] here.
Are the columns accidentally switched in this example, by any chance? Some annotated datasets follow it literally and have drugs in the phenotype column.
How to accommodate other data? What would be the right terms for: compound; disease stage; response to treatment; tumor size; any other terms describing the pathology, or treatment, or their relation? The standard says that the column names SHOULD be terms from EFO, but EFO doesn't even have compound. Here is an example of metadata available for one of the projects on PRIDE:

Age at surgery	Initial Tumor Primary/Recurrence	WHO Grade	Tumor Location	Post Surgery Progression	Time to Reccurence or Last Follow up	Max Tumor Size	History of Radiation
56	Primary	2	Convexity	Progression Free	8.2	6.4	No

How do I fit all of this in SDRF?

Regarding the experimental design example: I think the columns were accidentally switched in this case. I will fix it.
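For reference, here is a minimal sketch of what the example might look like once the columns are switched back. It assumes that the compound values belong under characteristics[compound] as plain values ("drug A", "none") rather than "compound: drug A", which is my reading of the comment in #92 and not a confirmed fix. The helper name `to_sdrf_tsv` is hypothetical and only used for illustration.

```python
import csv
import io

# Assumed corrected layout (not confirmed by the maintainers):
# compound values sit under characteristics[compound],
# the observed phenotype under characteristics[phenotype].
HEADER = [
    "source name",
    "characteristics[compound]",
    "characteristics[phenotype]",
    "factor value[phenotype]",
]
ROWS = [
    ["sample_treat", "drug A", "necrotic tissue", "necrotic tissue"],
    ["sample_control", "none", "normal", "normal"],
]


def to_sdrf_tsv(header, rows):
    """Render a header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(to_sdrf_tsv(HEADER, ROWS))
```

With this layout, "drug A" can be mapped to an ontology term on its own, which was the second reason given in #92 for preferring a compound column.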

@anjaf, can you help us with the PRIDE metadata example above?

Regarding "compound" not being in EFO: we actually map it to the term "chemical entity" (CHEBI_24431). EFO has gone through a few rounds of changes in the past, mostly dropping terms and replacing them with terms imported from other ontologies. To stay consistent with previous curation, we usually kept referring to the category by its original term. Another example is "cell line", which is now in EFO under "cultured cell" (CL_0000010). As a result, some terms are hard to find in EFO, and others are not in EFO at all.
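To make the idea concrete, here is a small sketch of how a legacy category label could be resolved to the term it is mapped to. Only the two mappings mentioned above are included; the table name `LEGACY_CATEGORY_TERMS` and the function `resolve_category` are hypothetical and not part of any existing tool.

```python
import re

# Category labels kept for curation consistency, mapped to the
# (label, accession) pairs named in this thread. Nothing else is assumed.
LEGACY_CATEGORY_TERMS = {
    "compound": ("chemical entity", "CHEBI_24431"),
    "cell line": ("cultured cell", "CL_0000010"),
}

# Matches headers such as "characteristics[compound]" or "factor value[cell line]".
_BRACKETED = re.compile(
    r"^\s*(?:characteristics|factor value)\s*\[(.+)\]\s*$", re.IGNORECASE
)


def resolve_category(column_name):
    """Return the (label, accession) pair for a column header, or None if unmapped."""
    match = _BRACKETED.match(column_name)
    if match is None:
        return None
    category = match.group(1).strip().lower()
    return LEGACY_CATEGORY_TERMS.get(category)


print(resolve_category("characteristics[compound]"))
print(resolve_category("characteristics[phenotype]"))
```

Returning None for anything outside the table keeps the sketch honest: categories that were never discussed here are left unresolved rather than guessed.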

For the medical terms, I think it is tricky to find an ontology term for each and every category because there are so many different types of measurements. I don't know of a good ontology that has terms for all of them. (NCIt probably comes closest for describing tumour samples, but we don't use NCIt terms in Expression Atlas.) There is a bit you can do with EFO, but certainly not at the level of detail you have here:

tumor grading
tumor stage
tumor size
tumor mass
There are a few suitable (but slightly less specific) terms under clinical temporal measurement: age at diagnosis, disease recurrence, last follow up, alive at endpoint, survival time (plus children).
A few other clinical terms we commonly use: clinical history, disease staging.
For tumour location, you could put this under "biopsy site" or "organism part", but neither expresses quite the same thing. There is a term called "cancer site" in EFO, but I don't think it is commonly used.
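As a rough illustration, the PRIDE record from the question could be relabelled with the EFO terms listed above along these lines. The pairing of each original column with a term is my own guess and would need a curator's review; in particular, I left "Post Surgery Progression" unmapped because none of the terms mentioned seems to cover it. The names `SUGGESTED_COLUMNS` and `to_characteristics` are hypothetical.

```python
# Hypothetical pairing of the original PRIDE column names (spelled as in the
# source table) with EFO labels mentioned in this thread. None means no
# suitable term was suggested.
SUGGESTED_COLUMNS = {
    "Age at surgery": "age at diagnosis",  # less specific than the original
    "Initial Tumor Primary/Recurrence": "disease recurrence",
    "WHO Grade": "tumor grading",
    "Tumor Location": "biopsy site",  # "organism part" is the alternative
    "Post Surgery Progression": None,
    "Time to Reccurence or Last Follow up": "last follow up",
    "Max Tumor Size": "tumor size",
    "History of Radiation": "clinical history",
}


def to_characteristics(record):
    """Split a record into SDRF-style characteristics columns and unmapped names."""
    mapped, unmapped = {}, []
    for name, value in record.items():
        label = SUGGESTED_COLUMNS.get(name)
        if label is None:
            unmapped.append(name)
        else:
            mapped[f"characteristics[{label}]"] = value
    return mapped, unmapped


record = {
    "Age at surgery": "56",
    "Initial Tumor Primary/Recurrence": "Primary",
    "WHO Grade": "2",
    "Tumor Location": "Convexity",
    "Post Surgery Progression": "Progression Free",
    "Time to Reccurence or Last Follow up": "8.2",
    "Max Tumor Size": "6.4",
    "History of Radiation": "No",
}

mapped, unmapped = to_characteristics(record)
for column, value in mapped.items():
    print(f"{column}\t{value}")
print("unmapped:", unmapped)
```

Values are kept as strings because the units for the time and size columns are not stated in the source table.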

bigbio / proteomics-sample-metadata, issue #225