airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

CellExpression needs a property_type #699

Closed bcorrie closed 10 months ago

bcorrie commented 1 year ago

In the CellExpression object, the property can hold different kinds of values depending on the pipeline that produced the data (e.g. 10X).

                Name of the property observed, typically a gene or antibody identifier (and its label) from a 
                canonical resource such as Ensembl (e.g. ENSG00000275747, IGHV3-79) or 
                Antibody Registry (ABREG:1236456, Purified anti-mouse/rat/human CD27 antibody).

So the property is a gene or antibody identifier. In the 10X case, the counts being captured are either Gene Expression or Antibody Capture. Further, within Antibody Capture you might be measuring protein expression (ABREG:1236456), some sort of Dextramer epitope specificity, or possibly an antibody hash barcode for partitioning data.

When you capture the CellExpression for a cell from a 10X study, it could be any of the above (we are currently working on a study that has all of these). The CellExpression object currently has no way to differentiate the type of property being counted, other than the rather painful, costly, and not very rigorous mechanism of parsing the CURIE prefix of property.id and inferring the type from it.
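To make the point concrete, here is a minimal sketch of what that prefix-based inference looks like. The prefix-to-type map, the type labels, and the function name are illustrative assumptions, not anything defined by the AIRR schema.

```python
# Hypothetical mapping from CURIE prefix to an inferred property type.
# Both the prefixes chosen and the type labels are assumptions for illustration.
PREFIX_TO_TYPE = {
    "ENSG": "gene_expression",
    "ABREG": "antibody_capture",
}


def infer_property_type(property_id: str) -> str:
    """Guess the kind of property from the CURIE prefix of property.id."""
    prefix, sep, _local_id = property_id.partition(":")
    if not sep:
        # No "prefix:" part at all, so there is nothing to infer from.
        return "unknown"
    return PREFIX_TO_TYPE.get(prefix.upper(), "unknown")


print(infer_property_type("ABREG:1236456"))    # antibody_capture
print(infer_property_type("ENSG00000275747"))  # unknown (bare id, no CURIE prefix)
```

Even when the prefix is present, an ABREG identifier alone cannot say whether the antibody was used for protein expression or as a hash barcode, which is why this approach is not rigorous.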

This seems very problematic. I am suggesting we add a property_type field to CellExpression with a controlled vocabulary (or at least a strongly suggested one) so that we can tell properties of these types apart.
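As a rough sketch of how a consumer could use such a field, the example below groups CellExpression-like records by property_type. The record layout (cell_id, property, value), the example values, and the vocabulary terms are assumptions for illustration; the actual vocabulary is still to be discussed.

```python
from collections import defaultdict

# Hypothetical CellExpression-like records; field layout and
# property_type terms are assumptions, not the agreed vocabulary.
EXAMPLE_RECORDS = [
    {"cell_id": "cell-1", "property_type": "gene_expression",
     "property": {"id": "ENSG:ENSG00000275747", "label": "IGHV3-79"}, "value": 3},
    {"cell_id": "cell-1", "property_type": "protein_expression",
     "property": {"id": "ABREG:1236456", "label": "CD27 antibody"}, "value": 12},
    {"cell_id": "cell-1", "property_type": "cell_hashing",
     "property": {"id": "ABREG:1236456", "label": "hash barcode"}, "value": 250},
]


def group_by_property_type(records):
    """Partition records by their declared property_type, with no id parsing."""
    groups = defaultdict(list)
    for record in records:
        # Records without the field fall into an explicit "unspecified" bucket.
        groups[record.get("property_type", "unspecified")].append(record)
    return dict(groups)


groups = group_by_property_type(EXAMPLE_RECORDS)
print(sorted(groups))
```

Note that the two ABREG records share the same kind of identifier yet end up in different groups, which is exactly the distinction that cannot be recovered from property.id alone.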

I will create a pull request to this effect for discussion.

bcorrie commented 1 year ago

This is a simple extension that adds one field. Without it, CellExpression data beyond the simplest form is very difficult to use.

scharch commented 1 year ago

From call: start with a string now, maybe change to an enum for 2.0

bcorrie commented 10 months ago

This is implemented as a string now: https://github.com/airr-community/airr-standards/blob/c06de0a088c207c517f2c532f389ef5a3e5c67e2/specs/airr-schema-openapi3.yaml#L4718

I am marking this as closed and creating a separate issue for AIRR 2.0 about changing this to an enum.