Closed bcorrie closed 11 months ago
We have a 10X study we are working on, and it has the following types of counts in the features.tsv file.
C0258 FB_hash8 Antibody Capture
C0259 FB_hash9 Antibody Capture
C0260 FB_hash10 Antibody Capture
C0531 FB_dex31 Antibody Capture
C0532 FB_dex32 Antibody Capture
C0533 FB_dex33 Antibody Capture
C0063 FB_CD45RA Antibody Capture
C0148 FB_CCR7 Antibody Capture
C0034 FB_CD3 Antibody Capture
ENSG00000243485 MIR1302-2HG Gene Expression
ENSG00000237613 FAM138A Gene Expression
ENSG00000186092 OR4F5 Gene Expression
This is the first study we have processed with some sort of feature barcoding, and it includes what we consider "normal" in the studies we have loaded, which is 10X "Gene Expression". This study also has feature barcodes for cell phenotype, samples using hashtag feature barcodes, and epitope specificity using dextramers.
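For anyone wanting to check which feature types a given features.tsv contains, here is a minimal sketch. It assumes the usual 10X layout of three tab-separated columns (feature id, feature name, feature type); the sample rows are copied from the listing above, and the function name `count_feature_types` is made up for illustration.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the listing above. A real 10X features.tsv is
# assumed here to be tab-separated: feature id, feature name, feature type.
SAMPLE = (
    "C0258\tFB_hash8\tAntibody Capture\n"
    "C0063\tFB_CD45RA\tAntibody Capture\n"
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "ENSG00000237613\tFAM138A\tGene Expression\n"
    "ENSG00000186092\tOR4F5\tGene Expression\n"
)


def count_feature_types(handle):
    """Count rows per feature type (the third column)."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank or short rows rather than failing on them.
    return Counter(row[2] for row in reader if len(row) >= 3)


counts = count_feature_types(io.StringIO(SAMPLE))
print(dict(counts))
```

For a file on disk, the same function could be called with an open file handle (or `gzip.open(path, "rt")` for a compressed features.tsv.gz).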
The Gene Expression features are processed correctly, resulting in:
$ curl -d '{"size":1}' https://repository-staging.ireceptor.org/airr/v1/expression
{"Info":{
[Stuff Deleted]
}, "CellExpression":[
{
"expression_id": "6494c7c178fea0c15161aacb",
"cell_id": "648ced310556ffe55e55beef",
"repertoire_id": "PRJNA744851-B3_VAX2_INF_CELL",
"data_processing_id": "PRJNA744851-B3_VAX2_INF",
"property": {
"label": "PRIM1",
"id": "ENSG:ENSG00000198056"
},
"value": 1,
"adc_annotation_cell_id": "AAAGTAGGTCTGCAAT-5",
"ir_annotation_set_metadata_id_expression": "648cd4655f86d976c84729bd",
"sample_processing_id": "PRJNA744851-B3_VAX2_INF_CELL",
"ir_created_at_expression": "2023-06-22T22:14:24.640405+00:00",
"ir_updated_at_expression": "2023-06-22T22:14:24.640405+00:00"
}]}
Whereas a feature barcode looks like this currently:
$ curl -d '{"filters":{"op":"=","content":{"field":"property.id","value":"C0063"}}}' https://repository-staging.ireceptor.org/airr/v1/expression
[Stuff Deleted]
{
"cell_id": "648cefb5af95bc2a945a4792",
"property": {
"label": "FB_CD45RA",
"id": "C0063"
},
"value": 32,
"ir_annotation_set_metadata_id_expression": "648cd4715f86d976c84729e8",
"adc_annotation_cell_id": "AGTTGGTGTCCTCCAT-5",
"repertoire_id": "PRJNA744851-R12_INF_VAX2_CELL",
"data_processing_id": "PRJNA744851-R12_INF_VAX2",
"sample_processing_id": "PRJNA744851-R12_INF_VAX2_CELL",
"ir_created_at_expression": "2023-06-23T00:02:47.998733+00:00",
"ir_updated_at_expression": "2023-06-23T00:02:47.998733+00:00",
"expression_id": "6494e129a9a6417ffa684350"
}
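The request body used in the second curl command can also be built programmatically. This is only a sketch of the filter shape shown above; the helper name `build_equals_filter` is hypothetical and no request is actually sent.

```python
import json


def build_equals_filter(field, value, size=None):
    """Build a request body with a single "=" filter, mirroring the curl example."""
    body = {"filters": {"op": "=", "content": {"field": field, "value": value}}}
    if size is not None:
        # "size" limits the number of returned records, as in the first curl call.
        body["size"] = size
    return body


payload = json.dumps(build_equals_filter("property.id", "C0063"))
print(payload)
```

The resulting string could be passed as the POST body to the /expression endpoint with any HTTP client.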
The problem is that there is no way to tell whether a given CellExpression property is a "Gene Expression" property or an "Antibody Capture" property that is being used to determine Cell Phenotype, Cell Specificity, or some other feature...
We do not yet have a mapping to the ABREG registry for these antibodies either (so no CURIEs yet in the property.id), but we think that is relatively easy.
We are of course not sure what exactly makes sense in the enum for property_type, so we are open to suggestions. The four cases currently listed reflect the three uses of feature barcoding that we have in this study + gene expression. I am sure there are others.
We would suggest having something like:
"property_type": "gene_expression",
"property": {
"label": "PRIM1",
"id": "ENSG:ENSG00000198056"
},
and
"property_type": "surface_protein_expression",
"property": {
"label": "FB_CD45RA",
"id": "C0063"
},
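To illustrate what the proposed field would enable on the client side, here is a rough sketch that groups records by property_type. The two records reuse the values from the examples above; the "unknown" bucket for records without the field and the function name are assumptions made for this illustration only.

```python
from collections import defaultdict

# Two records shaped like the proposal above (other fields omitted).
records = [
    {
        "property_type": "gene_expression",
        "property": {"label": "PRIM1", "id": "ENSG:ENSG00000198056"},
        "value": 1,
    },
    {
        "property_type": "surface_protein_expression",
        "property": {"label": "FB_CD45RA", "id": "C0063"},
        "value": 32,
    },
]


def group_labels_by_property_type(rows):
    """Map each property_type to the list of property labels seen for it."""
    groups = defaultdict(list)
    for row in rows:
        # Records without the field (or with null) go into an "unknown" bucket.
        key = row.get("property_type") or "unknown"
        groups[key].append(row["property"]["label"])
    return dict(groups)


grouped = group_labels_by_property_type(records)
print(grouped)
```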
@bussec you are the obvious one to ping on this, but other input is of course welcome.
Not sure if we need to update expression_study_method as well:
@bcorrie I agree with having the field in general, but I have some issues with some of the currently proposed values: gene_expression and surface_protein_expression are fine, but hashtag_expression is a subtype of surface_protein_expression (using another detection technology) and dextramer_expression would be even more specific than this (in addition, "Dextramer" is a trademark, so we should avoid using it).
So we either introduce a property_detection_method field or change the values to something like fluorescence_based_protein_expression, dna_tag_based_protein_expression, etc.
I think a property_detection_method field makes sense. In theory, ICS or FISH are gene_expression measurements that are made by fluorescence...
I am ok with hashtag_expression being a separate entry in the enum despite it technically being a subset of surface_protein_expression. Ditto for dextramer_expression, but to avoid copyright issues, maybe we could combine dextramer barcoding and variants of LIBRASeq as something like antigen_specific_receptor_expression? (last edited per call)
I think it will be nigh-impossible to enumerate single-cell modalities. The field is evolving pretty rapidly. I think we'd need an "other" value if we want to go the enum route.
From call:
I now have:
property_detection_method:
    type: string
    description: >
        Keyword describing the detection method used to measure the property value. The following keywords
        are recommended if considered appropriate but custom methods can be specified: "gene_expression",
        "surface_protein_expression", "antigen_specific_receptor_expression", "hashtag_expression"
    x-airr:
        miairr: defined
        nullable: true
        adc-api-optional: true
@javh @scharch @bussec @kira-neller does this cover it?
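As a rough illustration of the "recommended keywords, but custom values allowed" semantics in the draft above, a client-side check could look like the following. This is a sketch only: the function name and the three return labels are invented here and are not part of the spec.

```python
# Recommended keywords taken from the draft description above.
RECOMMENDED_KEYWORDS = frozenset({
    "gene_expression",
    "surface_protein_expression",
    "antigen_specific_receptor_expression",
    "hashtag_expression",
})


def classify_keyword(value):
    """Return "null", "recommended", or "custom" for a field value.

    The field is a nullable string, so None is accepted. Any other
    non-empty string is allowed; it is only flagged as custom when it
    is not one of the recommended keywords.
    """
    if value is None:
        return "null"
    if not isinstance(value, str) or not value.strip():
        raise ValueError("value must be null or a non-empty string")
    return "recommended" if value in RECOMMENDED_KEYWORDS else "custom"


print(classify_keyword("hashtag_expression"))
print(classify_keyword("fluorescence_intensity"))
```

Treating unknown strings as "custom" rather than as errors matches the open vocabulary in the description, which is the main difference from a strict enum.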
@bcorrie I think we decided to have property_type and property_detection_method, but currently we only have the latter (although its description still sounds more like property_type). Am I wrong about this?
@bussec I could not remember, and reading the above notes from our meeting did not make it clear to me. The way I interpreted the above was that we wanted the field but didn't like the field name or the values. So I changed the field name to property_detection_method and added the "keywords" from the issue above to the string field's description.
What do the different fields (property_type and property_detection_method) represent? I think I need someone to provide some clarity. I can change the field name back to property_type, but then I don't know what property_detection_method is. I need some guidance 8-) If someone can give some specifics I will add them to the spec.
@bcorrie
property_type describes the biological property that is measured: gene_expression, surface_protein_expression or antigen_specific_receptor_expression are types of properties of a cell.
property_detection_method describes the way this measurement is performed, e.g., hashtag_expression, fluorescence_intensity or read_count.
Happy to help with the terms, I just wasn't sure anymore whether we agreed on one or two keys.
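To make the two-key idea concrete, here is a sketch of what a record fragment might look like if both keys were present. The pairing of values (for example "read_count" for a sequencing-based gene expression measurement) is an assumption for illustration, as is the helper name; nothing here reflects a decided spec.

```python
def make_property_fields(label, prop_id, property_type, detection_method=None):
    """Build the property-related part of a CellExpression-like record.

    property_type says WHAT is measured; property_detection_method says
    HOW it is measured. The second key is left as None when unknown.
    """
    return {
        "property_type": property_type,
        "property_detection_method": detection_method,
        "property": {"label": label, "id": prop_id},
    }


# Assumed pairing: gene expression measured by sequencing read counts.
prim1 = make_property_fields(
    "PRIM1", "ENSG:ENSG00000198056", "gene_expression", "read_count"
)
# Detection method deliberately left unspecified for this one.
cd45ra = make_property_fields("FB_CD45RA", "C0063", "surface_protein_expression")
print(prim1)
print(cd45ra)
```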
@javh @scharch can you add your thoughts/recollections?
what @bussec said
I'm not sure we need two fields. The method seems like something that would be captured earlier in cell/sample processing.
From the call:
One field (property_type) for v1.5 and revisit the two-field concept in v2.0.
property_type field added.
Closing this pull request without merge - new branch with pull request: #719
Add a property_type so we can differentiate between the types of properties that exist for a specific Cell
Closes #699