Proposal for segregating marker axioms in CL

I think we all agree that having a means to record markers used in FACS sorting 
of cells is important and useful.  But there is an ongoing issue in CL with 
species specificity and these axioms.  There is also an issue with reconciling 
the principle of minimal commitment (what are a minimal set of criteria we can 
use to identify cell type X?) with the desire to collect long lists of markers 
that are useful in identifying cells of a particular type in the lab.  The long 
lists of marker clauses currently recorded in many equivalence axioms are not 
minimal - and are contingent on the sets of markers that experimenters happen 
to have found useful up to now in identifying and sorting cell types.

Long lists of marker axioms are not much used for automating classification in 
the ontology at the moment - and when used with manual classification, it is 
hard to keep track of their inheritance and so easy to end up with errors 
stemming from inheritance. Minimal commitment is much more likely to provide 
lists of properties useful in auto-classification, which in turn is essential 
to making CL maintainable.

Up to now, we have been trying to deal with the species specificity issue by 
making more general classes that are not specific to mouse/human - where these 
lists of markers have been defined.  Here are a couple of proposals for 
splitting identification criteria out completely from minimal sets of defining 
properties.

Proposal 1 - use prototype individuals:

Define a prototypical individual for each class.
This individual should be a member of the class whose markers are being defined.
The individual will also have some relationship to the class for which it is 
prototypical (has_prototype? exemplar_of?).
The individual has a set of typing class axioms that specify species and all of 
the relevant markers.

DL queries can return classes via specification of markers using the 
has_prototype relation.
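
As a rough illustration of Proposal 1 (a sketch only, not part of the proposal 
and not OWL): the snippet below models prototype individuals as plain Python 
records and answers a marker query by returning the classes they exemplify. 
The Prototype record, the classes_with_markers function, and the class and 
marker names in the sample data are all hypothetical placeholders; a real 
implementation would use a DL query over the has_prototype / exemplar_of 
relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prototype:
    """Hypothetical prototype individual for one cell class in one species."""
    exemplar_of: str                          # class this individual is a prototype for
    species: str
    has_markers: frozenset = frozenset()      # markers asserted present
    lacks_markers: frozenset = frozenset()    # markers asserted absent


def classes_with_markers(prototypes, species, present=(), absent=()):
    """Return the names of classes whose prototype for `species` has every
    marker in `present` and lacks every marker in `absent`."""
    present, absent = set(present), set(absent)
    return sorted({
        p.exemplar_of
        for p in prototypes
        if p.species == species
        and present <= p.has_markers
        and absent <= p.lacks_markers
    })


# Illustrative data only: class and marker names are placeholders.
PROTOTYPES = [
    Prototype("cell type A", "Mus musculus", frozenset({"X", "Y"}), frozenset({"Z"})),
    Prototype("cell type A", "Homo sapiens", frozenset({"X", "W"}), frozenset({"Z"})),
    Prototype("cell type B", "Mus musculus", frozenset({"Y", "Z"}), frozenset({"X"})),
]

print(classes_with_markers(PROTOTYPES, "Mus musculus", present={"Y"}))
print(classes_with_markers(PROTOTYPES, "Mus musculus", present={"X", "Y"}, absent={"Z"}))
```

Because the markers hang off the prototype rather than the class, nothing here 
is inherited by subclasses, which is the property that makes this approach 
weaker (and safer) than equivalence axioms.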

Proposal 2 - make GCI equivalence axioms that include all axioms and name the 
species:
(mast cell progenitor that part_of some 'Mus musculus') EquivalentTo (… and 
has_membrane_part some X and has_membrane_part some Y and lacks_membrane_part* 
some Z…)
- A simple variant on this proposal would be to name the species-specific class 
and have two separate equivalence axioms on this class - one minimal and one 
identifying.
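
To make the consequence of Proposal 2 concrete, here is a minimal Python 
sketch (not OWL, and not a reasoner) of how a species-scoped identifying axiom 
would let individuals be classified from their markers. MarkerGCI, infer_types 
and the "GENUS" parent class are hypothetical names introduced for 
illustration; "GENUS" merely stands in for the elided part of the axiom above, 
and X, Y, Z are the placeholder markers from that axiom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerGCI:
    """Hypothetical species-scoped identifying axiom:
    (target that part_of some species) EquivalentTo (parent and markers)."""
    target: str
    species: str
    parent: str                      # placeholder for the elided genus
    has_membrane_part: frozenset
    lacks_membrane_part: frozenset

    def matches(self, species, asserted_types, present, absent):
        """True when an individual satisfies the marker side in this species.
        Absence must be asserted explicitly; an unmeasured marker does not
        count as lacking."""
        return (
            species == self.species
            and self.parent in asserted_types
            and self.has_membrane_part <= set(present)
            and self.lacks_membrane_part <= set(absent)
        )


def infer_types(gcis, species, asserted_types, present=(), absent=()):
    """Return the target classes an individual is inferred to belong to."""
    return sorted(
        g.target for g in gcis
        if g.matches(species, asserted_types, present, absent)
    )


GCIS = [
    MarkerGCI("mast cell progenitor", "Mus musculus", "GENUS",
              frozenset({"X", "Y"}), frozenset({"Z"})),
]

print(infer_types(GCIS, "Mus musculus", {"GENUS"}, present={"X", "Y"}, absent={"Z"}))
print(infer_types(GCIS, "Homo sapiens", {"GENUS"}, present={"X", "Y"}, absent={"Z"}))
```

The second call returns nothing because the axiom is scoped to mouse; that 
scoping is what keeps mouse marker lists from leaking into the general class.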

*For both proposals, where the definition of a cell type is inextricably linked 
with expression of some marker (e.g. CD4 positive T helper cell), this should 
clearly stay in the minimal equivalent class definition.*

Original issue reported on code.google.com by dosu...@gmail.com on 15 Jul 2014 at 6:04

I would welcome the move to either

Pattern 2 is used frequently in Uberon, we call them taxon-GCIs or 
evolutionary-variable-GCIs:
https://github.com/obophenotype/uberon/wiki/Evolutionary-variability-GCIs

Pattern 2 (GCIs) is stronger than pattern 1 (individuals), and in many cases it 
can be too strong, as the characteristics of the class may not perfectly 
stratify with species (even canonical members of).

For markers in CL, I feel the prototype/individuals approach (proposal 1) is 
the most appropriate.

Original comment by cmung...@gmail.com on 15 Jul 2014 at 6:43

I think we need to better gauge the intent of recording marker lists.  Within a 
single species, is inheritance of these properties the expectation?  If so, it 
really needs to be managed & have error checking in place - i.e. we should use 
GCIs or multiple equivalence axioms per term.  This also has the advantage of 
allowing straightforward classification of individuals characterised via marker 
expression. If not, we should use the prototype approach.

Having said that - I suspect that mapping between marker-based definitions and 
minimal commitment definitions will never be perfect.  The prototype-based 
system is potentially safer in cases of divergence.

Original comment by dosu...@gmail.com on 16 Jul 2014 at 8:17

From correspondence with Holden Maecker, going into some detail about his 
paper: Standardizing immunophenotyping for the Human Immunology Project. 
2,5,6,7 are relevant to this issue.

--

Based on the conversation we just had with Holden, I'm going forward with the 
following hypotheses (until proven otherwise)

1. For each node with a cell type name in the diagram (i.e. not named solely by 
markers; it also has a label that is marker only), there is a set S of CL cell 
types such that one of the set is a parent P to all the others, AND a large 
majority of the cells (say 85-100% as a straw man) that are identified using 
the markers shown in the diagram should properly be understood to be of type P.

2. It will additionally be true that each protein used to discriminate a 
population in the assay, or stated to be a necessary membrane protein by the 
CL, will be present at the level described in a large majority of cells of 
type P.

3. Not all protein levels used to discriminate populations in the assay should 
be listed in necessary conditions in the corresponding CL type, since some are 
known to be proxies only.

4. In the T-cell assay, the bottom layer "activated" CD38+ HLA-DR- should be 
understood to be a distinct subtype of each of the four types of cell in the 
layer above.
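
Hypothesis 1 can be read as two checks, sketched below in Python under loud 
assumptions: the is_a hierarchy is simplified to a single parent per term 
(CL itself allows several), and the term names, cell counts and helper 
functions (ancestors, common_parent, fraction_of_type) are made up for 
illustration. The 0.85 threshold is the straw-man figure from hypothesis 1.

```python
def ancestors(term, parent_of):
    """All strict ancestors of `term` in a single-parent is_a hierarchy."""
    seen = []
    while term in parent_of:
        term = parent_of[term]
        if term in seen:        # guard against accidental cycles
            break
        seen.append(term)
    return set(seen)


def common_parent(candidates, parent_of):
    """Return the member P of `candidates` that is an ancestor of every
    other member, or None if no such member exists."""
    for p in candidates:
        if all(p in ancestors(c, parent_of) for c in candidates if c != p):
            return p
    return None


def fraction_of_type(gated_cell_types, p, parent_of):
    """Fraction of gated cells whose type is P or a subtype of P."""
    if not gated_cell_types:
        return 0.0
    hits = sum(
        1 for t in gated_cell_types
        if t == p or p in ancestors(t, parent_of)
    )
    return hits / len(gated_cell_types)


# Placeholder hierarchy: P1 and P2 are subtypes of P.
PARENT_OF = {"P1": "P", "P2": "P", "P": "root", "other": "root"}
S = ["P1", "P", "P2"]

# Placeholder gated population: 9 cells under P, 1 contaminant.
GATED = ["P1"] * 5 + ["P2"] * 3 + ["P"] + ["other"]

P = common_parent(S, PARENT_OF)
print(P, fraction_of_type(GATED, P, PARENT_OF) >= 0.85)
```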

---

In addition, I have three other hypotheses that may be harder to swallow, but 
which I can also defend:

5. Cell Universals (called types elsewhere in this note) should be defined 
primarily (maybe exclusively) by function, and developmentally, in the ideal 
case. 

6. Protein membrane levels should only be used in necessary conditions when 
defining cell types when the level of the protein is essential for the cell to 
have the defining characteristics in (5).

7. All other marker information, including marker sets that together can be 
used to identify populations of cells that are mostly of a single type (in the 
sense of (1) above) should be recorded as part of assay representations. In 
particular the CL should stop using combinations of solely protein levels as 
necessary and sufficient conditions to be a cell type. In most cases even 
adding a general parent type to the N&S conditions should be considered 
dangerous. 

None of the above yet considers a question we have previously discussed, namely 
the question of how or whether to discriminate as universals cells that are 
said to have states such as "active".

Original comment by alanruttenberg@gmail.com on 8 Sep 2014 at 6:58

Here are some questions about the note in the previous comment,  from Melanie 
Courtot,  and my response to them. 

Re: Point 1)
[MC]
I find the formulation of the above statement confusing.
Let's consider the "Naive CD8+ T cell" (top right in the diagram): this is a 
node with a cell type name. What is the corresponding set S in CL? Which in 
this set is a parent to all the others?
Same question with the "T cells" node.

[AR]
For Naive CD8+ T cell I am guessing it would be naive thymus-derived 
CD8-positive, alpha-beta T cell.
For the T cells node it would be the set T cell and all its subclasses.

The formulation is based on my discussion with Holden and several of his 
assertions (or my best understandings of them) -  

1) that it was the intention, in the design of the assays, that the set of 
markers for named cell types were sufficient to discriminate the named cell 
type from currently known other cell types, where cell type here means 
characterized or known-to-be-interesting cell populations and,

2) That the choices for the assay were constrained - that only surface markers 
could be used, and that there could be at most 8 markers in each assay. So 
there are known trade-offs in that the discriminating markers are known to not 
necessarily be definitional (e.g. IIRC, foxp3 expression is definitional for 
T-reg cells but is not accessible as a surface marker, likewise IL17 for Th17 
cells; and CCR7 is a better marker than CD62L (but perhaps not perfect), which 
was sometimes used instead in assays intending to discriminate the same 
cells) and 

3) That there is an assumption that other markers associated with cell types 
are present, but no verification in these assays. In the one case we 
discussed, CCR7+ for T-reg cells, he had some data that suggested 85-100% of 
the cells in the population also had CCR7 when tested (the assay does not check 
CCR7 status). Note that the definition of T-reg cells in CL includes CCR7+.

So the purpose of these assays is to isolate a distinct cell type 
(hypothesized to have some shared functional characteristic) with the 
understanding that the population will not necessarily have all or only the 
cells of the type targeted.

The condition that there be a set in CL, but that there be a single superclass 
in the set for all of them, is to capture the idea that there is a single type, 
but with an additional assumption - that as we learn more about the population 
we may further discriminate subtypes. 

Re: Point 2)
[MC]
I think it is expected/desired that the proteins used to discriminate 
populations are those that are declared as necessary in CL. I suspect the 
"large majority of cells of type P" comes from the fact that we can expect that 
some of the cells of type P will have lost their marker, or it won't be exposed 
on the membrane, or something similar?

[AR]
Well, my investigation was aimed at trying to understand what is and should be 
in the CL and in the representation of these assays, and I think that the 
expectation you identify should be disavowed. Holden was clear that some of the 
markers were best estimates of how to identify a population mostly of a given 
type, but that, because there are compromises, that may not be achieved. So no, 
my understanding is not that the assumption is that they will have lost their 
marker or it won't be expressed on the surface but rather that the assay 
attempts to isolate a highly enriched population of the type with the tools at 
hand, but those tools (these markers) may not hold in all cases.

Since there is a viable alternative to recording the fib as true (by elevating 
the marker set to N&S) the CL should take the more conservative approach. It 
would be even more conservative, and perhaps justifiable, to go the further 
step and weaken things so that the markers aren't even necessary conditions, in 
some cases.

Re: Point 3)
[MC]
Do you have an example? I *think* I understand, but a specific case would help.

[AR]
This is a consequence of the purpose of the assay. The markers are only 
known to isolate a highly enriched population of the type. The condition for a 
marker combination to be necessary and sufficient is that the cells with the 
presence of the combination are all and only the cells of the given type. 
That's a different condition. 

As an example consider the discussions of the choices of which markers to use 
in the assay - CCR7/CD62L or CD45RA+/- vs CD45R0-/+ . Assays that intended to 
find populations of target cell types have used one or the other of these. 
There are ongoing scientific arguments as to which is better, although it seems 
most would agree that the isolated population satisfies the highly enriched 
criterion. To say that either is definitional would be premature at this point, 
AFAIK.
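
The difference between "highly enriched" and "necessary and sufficient" in 
the response above can be spelled out with sets. This is only a sketch: the 
cell identifiers and counts are invented, compare_gate_to_type is a 
hypothetical helper, and 0.85 is the straw-man lower bound mentioned earlier 
in the thread.

```python
def compare_gate_to_type(gated, of_type, threshold=0.85):
    """Compare the cells selected by a marker combination (`gated`) with
    the cells that truly are of the target type (`of_type`)."""
    gated, of_type = set(gated), set(of_type)
    purity = len(gated & of_type) / len(gated) if gated else 0.0
    return {
        "sufficient": gated <= of_type,   # only cells of the type are gated
        "necessary": of_type <= gated,    # all cells of the type are gated
        "purity": purity,                 # share of gated cells of the type
        "highly_enriched": purity >= threshold,
    }


# Invented example: 18 cells are of the type; the gate picks up 17 of
# them plus 3 cells of other types.
OF_TYPE = set(range(1, 19))
GATED = set(range(1, 18)) | {101, 102, 103}

RESULT = compare_gate_to_type(GATED, OF_TYPE)
print(RESULT)
```

In this example the gate is highly enriched but neither necessary nor 
sufficient, so recording the marker combination as an N&S condition would 
assert something false about 3 gated cells and 1 ungated cell.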

Re: Point 4)
[MC]
Wouldn't that be equivalent to saying (for example for the activated CD8+T 
cells) that they are CD3+CD8+CD38+HLA-DR+? Being subtypes, they are CCR7 + or - 
and CD45RA+ or- as well, so we can ignore those markers in trying to identify 
them.

[AR]
No, not according to Holden (please correct me if I am wrong). The intended 
sense of the diagram was that each of the 4 subtypes - naive, central memory, 
effector and effector memory, could have (interesting) subtypes that were 
activated. The diagram is ambiguous so the only way to know this is to have 
asked.

Re: Point 5)
[MC]
I believe Richard (who I am cc'ing here) was advocating in that direction as 
well, but mentioned that this was very hard to achieve.

[AR]
That I can believe. However I think it is good to have a clear compass and 
policy about what we are trying to achieve as it helps us make representation 
decisions and to help decide in various cases which of the alternative options 
to choose. It also defines how to shape the CL as we learn more about the cell 
types. For example, my assessment that these are the definitional criteria 
informs my proposal on the role of surface markers in CL definitions.

Re: Point 6)

[MC]
I am not sure this is true. Couldn't some cells have some sort of markers that 
are not associated with a known function and yet are discriminating for that 
specific population? See for example http://en.wikipedia.org/wiki/CD133

[AR]
They could, although even in the article you cite CD133 is suggested to be 
present on both cancer stem cells as well as classes of non-pathological cell 
types. So perhaps it doesn't discriminate as well as one would hope. 

While it is possible that such accidental correlations exist, my sense is that 
in many cases the functional/developmental characterization would win over the 
marker if there was a disagreement. If we follow (6) a disagreement would be 
recognized only if we found that the marker wasn't necessary for the defining 
function. That is a more solid way to resolve a conflict.

Re: Point 7)
[MC]
It would be helpful if you could expand on this. Why only use assay 
representations? Why are the combination of presence /absence of protein levels 
as N&S conditions an issue? I'm not disagreeing, but I'd like a bit more 
justification than a statement.

[AR]
Because we try to say only true things in ontologies, and it appears that the 
marker combinations can be '85% true' to be acceptable as discriminators. The 
scare quotes around '85% true' are to distinguish it as different from true. 

Re: the question of how or whether to discriminate as universals cells that are 
said to have states such as "active"

[MC]
Wouldn't those "activated" cells have some sort of change (in terms of markers 
or else) making them capable of having a specific function for example? There 
has to be a response to the external stimulus for the cell to be "activated" 
(see also http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0001775). If you state in 
the above that cell types are defined by their function, then by the same 
reasoning shouldn't they indeed be considered as universals?

[AR]
Maybe. There are other considerations such as what, if anything, is always true 
of a cell type. The discussion is related to conversations in the BFO2 thread 
about whether universals are rigid in some sense and that their change is by 
change in qualities rather than change in type (but classes of cells defined in 
part by qualities are certainly fine). It is like the (open) question of 
whether professor is a universal, or only professor role. 

As a practical consideration it impacts our expectations of what happens to 
cell particulars as we follow them around. Do they change type from time to 
time, or are changes in type only expected in the cells that derive from 
existing cells by, e.g., merger or division?

Original comment by alanruttenberg@gmail.com on 8 Sep 2014 at 7:09

Hi,
One of the main characteristics of immune cells is their ability to respond 
to external stimuli and shape themselves to the microenvironment. Recycling 
their receptors in response to external stimuli is a common thing; this doesn't 
make the cell a different type, but it can modify or add some functions of the 
cell. So for immune cells, not everything is true all the time. I think this 
is something that needs discussion.

Original comment by AnnaM.Ma...@gmail.com on 8 Sep 2014 at 8:48

"5. Cell Universals (called types elsewhere in this note) should be defined 
primarily (maybe exclusively) by function, and developmentally, in the ideal 
case. "

Odd suggestion - structural properties are essential to defining cell types in 
very many cases.

Original comment by dosu...@gmail.com on 21 Oct 2014 at 4:43
