NAL-i5K / tripal_eutils

ncbi loader via the eutils interface
GNU General Public License v3.0

Standardizing controlled vocabulary mapping vs. allowing flexibility for unknown tags and sites changing mappings #8

Open bradfordcondon opened 5 years ago

bradfordcondon commented 5 years ago

to discuss further with @mpoelchau and @childers

Problem:

NCBI doesn't provide ontology mappings for attributes. Monica has done a lot of work going through all the attributes we are interested in. Now we need to assign them to terms. Our broad options are to create a custom NCBI ontology or to map terms to existing ontologies. I'm always a fan of using existing terms where possible, as that's Tripal's approach... although since we're talking about NCBI, maybe we should be communicating with them.

Assuming we go ahead with mapping terms, we then have to consider how this module will associate the XML attributes with cvterms for properties.

Possible implementation: tag terms as associated with an NCBI XML tag?

We could use cvtermprop, or just a custom table, to associate XML tags with cvterms. We then let users update the mappings themselves and/or provide an interface to do so.
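
To make the custom-table option concrete, here is a minimal sketch in Python with an in-memory SQLite database. The table name `tag_cvterm_map`, its columns, and the helper functions are all hypothetical and are not the module's actual schema; the only real identifier is OBI:0001941, the OBI contig N50 term discussed later in this thread.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
SCHEMA = """
CREATE TABLE tag_cvterm_map (
    ncbi_db   TEXT NOT NULL,   -- e.g. 'assembly', 'biosample'
    xml_tag   TEXT NOT NULL,   -- tag or attribute name as found in the XML
    accession TEXT NOT NULL,   -- ontology term accession, e.g. 'OBI:0001941'
    PRIMARY KEY (ncbi_db, xml_tag)
)
"""


def set_mapping(conn, ncbi_db, xml_tag, accession):
    """Insert a mapping, or overwrite it when a site changes its choice."""
    conn.execute(
        "INSERT OR REPLACE INTO tag_cvterm_map (ncbi_db, xml_tag, accession) "
        "VALUES (?, ?, ?)",
        (ncbi_db, xml_tag.lower(), accession),
    )


def lookup(conn, ncbi_db, xml_tag):
    """Return the mapped accession, or None for a tag nobody has mapped yet."""
    row = conn.execute(
        "SELECT accession FROM tag_cvterm_map WHERE ncbi_db = ? AND xml_tag = ?",
        (ncbi_db, xml_tag.lower()),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Default mapping shipped with the module (assumed), which a site may override.
set_mapping(conn, "assembly", "contign50", "OBI:0001941")
```

Keying on the pair (database, tag) lets the same tag name map differently per NCBI database, and returning `None` for unknown tags leaves the caller free to skip, log, or prompt for a mapping.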

bradfordcondon commented 5 years ago

Resources that already exist for metadata/ontology mapping for NCBI:

CEDAR ? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977712/

bradfordcondon commented 5 years ago

I drafted this message:

We are building a tool for importing metadata from NCBI and storing it in Chado, the community-standardized database schema. Doing so requires us to map each attribute to an ontology term, so we will be mapping the XML attributes available through the eutils API to ontology terms.

Rather than do these mappings ourselves in isolation, we want to work with the NCBI, perhaps even as part of a broader initiative to set internal metadata standards.

We're broadly interested in all the NCBI databases, but for now we are focusing on:

Assembly, BioSample, BioProject

What we are wondering is, for these metadata tags:

Are they standardized? Are they already mapped to ontologies?

If so, are these mappings publicly available?

If the tags aren’t standardized or mapped to ontologies, can we work together and with the broader community to do so?

Take for example the N50 tag: OBI has a set of terms describing different N50 types.

NCBI Assembly defines this tag: contign50. Does it, or could it, map to the OBI contig N50 term? https://www.ebi.ac.uk/ols/ontologies/obi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0001941

I think the ideal outcome would be for NCBI to produce and publish its own ontology, with terms such as this one imported from other ontologies, so that each attribute downloaded from NCBI could be linked to an existing ontology term found on the EBI Ontology Lookup Service.

bradfordcondon commented 5 years ago

We really have two cases. The first is the more stable XML types, for example <Organization>. For these, the cvterm mappings are generally already taken care of by the way they are stored in Chado: an organization becomes a Chado contact, with a type, which has a term, etc.

The second is the attributes, for example the <Attributes><Attribute type=tissue> leaf</Attribute> tag in biosamples. Each attribute has a name that needs to be mapped to a term, because its value is stored as a prop. It is the handling of these attributes that really needs to be robust and flexible.
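
A rough sketch of how the second case could be handled, in Python. It mirrors the `<Attribute type=tissue>` example above with the attribute value quoted so that it parses; the real eutils response may use different attribute names. The mapping dict, the placeholder accession `EXAMPLE:0000001`, the `some_new_tag` attribute, and the `split_attributes` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Toy document shaped like the example above (assumed structure).
SAMPLE = """
<BioSample>
  <Attributes>
    <Attribute type="tissue"> leaf</Attribute>
    <Attribute type="some_new_tag">42</Attribute>
  </Attributes>
</BioSample>
"""

# Hypothetical mapping; the accession is a placeholder, not a real term.
TAG_TO_TERM = {"tissue": "EXAMPLE:0000001"}


def split_attributes(xml_text, tag_to_term):
    """Separate attributes with a known term from those still unmapped."""
    mapped, unmapped = [], []
    root = ET.fromstring(xml_text)
    for attr in root.iter("Attribute"):
        name = (attr.get("type") or "").strip().lower()
        value = (attr.text or "").strip()
        term = tag_to_term.get(name)
        if term is None:
            # Unknown tag: keep it so it can be reported or mapped later.
            unmapped.append((name, value))
        else:
            mapped.append((term, value))
    return mapped, unmapped


mapped, unmapped = split_attributes(SAMPLE, TAG_TO_TERM)
```

Collecting unmapped attributes instead of raising an error is one way to stay flexible when NCBI introduces a tag the module has not seen before.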

storage options: