ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Biosample sample characteristics #711

Closed david4096 closed 7 years ago

david4096 commented 8 years ago

Reopened from #710 to remove extraneous commits. @mbaudis, if you're happy with it, you can close the other one.

This PR addresses the need for a different structure and naming of the Biosample.disease attribute:

- There has been consensus in the metadata task team that the attribute name disease is misleading in the context of Biosample and should be reserved for the Individual object level.
- A Biosample can be described with terms from a number of ontologies, e.g. for patho-histology, anatomic location, tissue type, etc.; see also the discussion at #707.
- The renaming to samplecharacteristics is in line with usage at e.g. GEO; however, the sample prefix may not be strictly necessary. Alternatives welcome.
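To make the difference concrete, here is a minimal Python sketch that uses plain dataclasses as stand-ins for the schema messages. The class names, the helper functions, and the Python spelling sample_characteristics are hypothetical illustrations, not the actual GA4GH definitions; the two example terms are taken from the structure proposed later in this thread. A single disease slot holds one term, while a repeated characteristics list can hold terms from several ontologies at once.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OntologyTerm:
    # Simplified: an OBO PURL identifier plus a human-readable label.
    id: str
    term: str


@dataclass
class BiosampleWithDisease:
    # Current shape (hypothetical stand-in): a single term per sample.
    id: str
    disease: Optional[OntologyTerm] = None


@dataclass
class BiosampleWithCharacteristics:
    # Proposed shape (hypothetical stand-in): any number of terms, drawn from
    # different ontologies (pathology, anatomic location, tissue type, ...).
    id: str
    sample_characteristics: List[OntologyTerm] = field(default_factory=list)


def ontology_prefix(term_id: str) -> str:
    """Return the ontology prefix of an OBO PURL, e.g. 'DOID' for .../obo/DOID_0050865."""
    return term_id.rsplit("/", 1)[-1].split("_", 1)[0]


def terms_from(sample: BiosampleWithCharacteristics, prefix: str) -> List[OntologyTerm]:
    """Select the characteristics whose identifier comes from one ontology."""
    return [t for t in sample.sample_characteristics if ontology_prefix(t.id) == prefix]


sample = BiosampleWithCharacteristics(
    id="sample-1",
    sample_characteristics=[
        OntologyTerm("http://purl.obolibrary.org/obo/DOID_0050865",
                     "tongue squamous cell carcinoma"),
        OntologyTerm("http://purl.obolibrary.org/obo/UBERON_0010033",
                     "posterior part of tongue"),
    ],
)
print([t.term for t in terms_from(sample, "UBERON")])  # ['posterior part of tongue']
```

Filtering by ontology prefix is only one possible way to recover "which kind" of characteristic a term is; it relies on OBO-style PURLs and illustrates why an explicit type or name may be wanted.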

david4096 commented 8 years ago

Copying response from #710:

This makes sense: a tag-bag field that holds only Ontology Terms, although the semantic context of those terms can be lost if no name is provided.

Please also consider allowing tagging via Ontology Terms (and other value types) in a generic attributes field, by upgrading the info field. This would let a data curator define named tag bags on a biosample, giving more context without adding new named fields to the message. It would replace the info field, and the two approaches are not mutually exclusive. PR for this feature here.
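A rough Python sketch of the "named tag bags" idea, assuming the attributes field behaves like a map from a name to a list of values, where a value may be a plain string or an ontology term. The class names, the tag and ontology_terms helpers, and the tag name tissue_type are hypothetical; the actual message layout is defined in the linked PR, not here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class OntologyTerm:
    id: str
    term: str


# Hypothetical: an attribute value is either free text or an ontology term.
AttributeValue = Union[str, OntologyTerm]


@dataclass
class Biosample:
    id: str
    # Named tag bags: the key supplies the semantic context for its values.
    attributes: Dict[str, List[AttributeValue]] = field(default_factory=dict)

    def tag(self, name: str, value: AttributeValue) -> None:
        """Append a value to the named tag bag, creating the bag if needed."""
        self.attributes.setdefault(name, []).append(value)

    def ontology_terms(self, name: str) -> List[OntologyTerm]:
        """Return only the OntologyTerm values stored under the given name."""
        return [v for v in self.attributes.get(name, []) if isinstance(v, OntologyTerm)]


sample = Biosample(id="sample-1")
sample.tag("tissue_type",
           OntologyTerm("http://purl.obolibrary.org/obo/UBERON_0006919",
                        "tongue squamous epithelium"))
sample.tag("tissue_type", "free-text note from the submitter")
print([t.term for t in sample.ontology_terms("tissue_type")])  # ['tongue squamous epithelium']
```

Because the bag name carries the context, the same structure could hold tissue type, disease, or any other curator-defined grouping without a schema change, which is the trade-off against a dedicated, ontology-only field.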

I'm +1 for this approach as it solves the immediate issue of specifying tissue type for TCGA data.

mdmiller53 commented 7 years ago

+1 for the merge

sarahhunt commented 7 years ago

+1 Looks good to me.

david4096 commented 7 years ago

Seconding my +1. When loading the TCGA biosamples, it appears some are labeled with more than one disease, which makes the single-valued disease field not especially helpful!

mbaudis commented 7 years ago

So, merging depends on the integration team: @david4096, @kozbo? (Provided there are no objections to the attribute's name.)

mcourtot commented 7 years ago

Hi @david4096,

We talked further with @mbaudis and would like to propose replacing the samplecharacteristics attribute with a characteristics object with the following structure:

characteristics: [
  {
    description: "squamous cell carcinoma, base of tongue, stage 2",
    type: "phenotype",  // could also be organism, disease, ...
    ontologyTerms: [  // repeated OntologyTerm
      {
        ontologyId: "http://purl.obolibrary.org/obo/DOID_0050865",
        term: "tongue squamous cell carcinoma"
      },
      {
        ontologyId: "http://purl.obolibrary.org/obo/UBERON_0006919",
        term: "tongue squamous epithelium"
      },
      {
        ontologyId: "http://purl.obolibrary.org/obo/UBERON_0010033",
        term: "posterior part of tongue"
      }
    ]
  }
]

where each ontology term is simplified above; in practice it would be the full OntologyTerm structure we agreed on at https://github.com/ga4gh/schemas/pull/694/files (including version and source).
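A minimal Python sketch of the proposed characteristics object, parsing the example structure above into typed objects. The class names are hypothetical; the allowed type values (phenotype, organism, disease) are assumed from the example's comment rather than a fixed vocabulary; and the OntologyTerm field names source and version are assumptions standing in for the full structure agreed in #694. The dictionary keys (description, type, ontologyTerms, ontologyId, term) follow the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class CharacteristicType(Enum):
    # Assumed controlled vocabulary, taken from the example; not a fixed list.
    PHENOTYPE = "phenotype"
    ORGANISM = "organism"
    DISEASE = "disease"


@dataclass
class OntologyTerm:
    ontology_id: str
    term: str
    # Assumed names for the "version and source" fields agreed in #694.
    source: str = ""
    version: str = ""


@dataclass
class Characteristic:
    description: str
    type: CharacteristicType
    ontology_terms: List[OntologyTerm] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Characteristic":
        """Build a Characteristic from a JSON-like dict; an unknown type raises ValueError."""
        return cls(
            description=data["description"],
            type=CharacteristicType(data["type"]),
            ontology_terms=[
                OntologyTerm(
                    ontology_id=t["ontologyId"],
                    term=t["term"],
                    source=t.get("source", ""),
                    version=t.get("version", ""),
                )
                for t in data.get("ontologyTerms", [])
            ],
        )


example = {
    "description": "squamous cell carcinoma, base of tongue, stage 2",
    "type": "phenotype",
    "ontologyTerms": [
        {"ontologyId": "http://purl.obolibrary.org/obo/DOID_0050865",
         "term": "tongue squamous cell carcinoma"},
        {"ontologyId": "http://purl.obolibrary.org/obo/UBERON_0006919",
         "term": "tongue squamous epithelium"},
        {"ontologyId": "http://purl.obolibrary.org/obo/UBERON_0010033",
         "term": "posterior part of tongue"},
    ],
}

characteristic = Characteristic.from_dict(example)
print(characteristic.type.value, [t.term for t in characteristic.ontology_terms])
```

Mapping type onto an Enum is one way to enforce the controlled vocabulary at parse time: a value outside the list is rejected immediately instead of being stored as free text.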

david4096 commented 7 years ago

Neat! That's certainly a more flexible way of describing characteristics! Could you take a look at https://github.com/ga4gh/schemas/pull/700? I believe it attempts to provide a similar facility that would apply across the API. From what I can tell, your additions restrict the characteristics to ontology terms and provide a description and a controlled vocabulary for the type.

mbaudis commented 7 years ago

Following the discussions at Vancouver: closing this in favour of https://github.com/ga4gh/schemas/pull/725.