FoodOntology / foodon

The core repository for the FOODON food ontology project. This holds the key classes of the ontology; larger files and the results of text-mining projects will be stored in other repos.
Creative Commons Attribution 4.0 International

Langual to FOODON export #17

Closed Public-Health-Bioinformatics closed 7 years ago

Public-Health-Bioinformatics commented 8 years ago

So I'm working on the http://langual.org to Foodon import now. Trying to do a script that will be sensitive to changes in Langual's annual XML release file. Gives me an appreciation for the kind of challenges WikiData could face in trying automatic synchronization to GO, etc.

So I intend to import the following Langual facets:

B. FOOD SOURCE [B1564]
F. EXTENT OF HEAT TREATMENT [F0011]
G. COOKING METHOD [G0002]
J. PRESERVATION METHOD [J0107]
K. PACKING MEDIUM [K0020]
M. CONTAINER OR WRAPPING [M0100]

Now that leaves a number of them out but they can come in later. Facet C. "Part of Plant or Animal" would be attractive to our food sampling users I think but I'd like lots of terms in there to be mapped to a proper anatomy ontology, so that's a mini project in itself to try later.

F, G, J, K and M are all relatively straightforward hierarchies of terms that I think would have a very small # of equivalent term ids already existing within the OBI family of ontologies, so would mostly pop into FOODON without hassle.

B: FOOD SOURCE items: Plant, animal, fungi, bacteria - they'll be getting mapped to their equivalent NCBI entities, if any; there are over 2,300 terms that have an ITIS taxon code associated with them. I intend for the import script to create an Ontofox import file specification of all the appropriate NCBITaxon_123456 items, including their NCBI has_exact_synonym and has_related_synonym, and all parent nodes.

The script would create a second OWL ontology file that would augment the NCBI imported ontology with all the Langual synonyms and hasDbXref codes, and would provide a separate "is a" Langual-oriented hierarchy for them that groups food source items in human-centric ways (e.g. freshwater fish vs saltwater fish) that would have FOODON_1234567 ontology ids. All this subject to iterative design after delivery!

Now FOOD SOURCE also lists many food additives.
http://www.langual.org/langual_thesaurus.asp?termid=B2972&haschildren=True&owner=B1041&openstr=B3412_00000_B1564_B1041 Shall I go ahead and try to add them as CHEBI import items? Does anyone know of a bulk lookup system for CHEBI such that I could throw at it a long list of chemicals and get back most likely match for each one?

ALSO, Robert / Chris / Pier, should I be targeting a different service than Ontofox for creation of an import file of terms? I know how to make an Ontofox import text file (e.g. http://ontofox.hegroup.org/format.txt) but haven't used ontofox as a command line service before.
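The Ontofox import specification described above could be generated from the list of taxon ids along these lines. This is a minimal sketch, not the project's actual script: the section headings follow the template at http://ontofox.hegroup.org/format.txt as best recalled and should be checked against it, and the function name, the output URI and the choice of NCBITaxon_1 (the NCBI Taxonomy root) as top-level term are assumptions made for illustration.

```python
NCBITAXON = "http://purl.obolibrary.org/obo/NCBITaxon_"
OIO = "http://www.geneontology.org/formats/oboInOwl#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def ontofox_spec(taxon_ids, output_uri, top_level_id="1"):
    """Build Ontofox input text for a set of NCBITaxon ids (hypothetical helper).

    Duplicate ids are collapsed; "includeAllIntermediates" asks Ontofox to
    keep every parent node between each term and the top-level term.
    """
    low_level = sorted({NCBITAXON + str(t) for t in taxon_ids})
    sections = [
        ("[URI of the OWL(RDF/XML) output file]", [output_uri]),
        ("[Source ontology]", ["NCBITaxon"]),
        ("[Low level source term URIs]", low_level),
        ("[Top level source term URIs and target direct superclass URIs]",
         [NCBITAXON + str(top_level_id)]),
        ("[Source term retrieval setting]", ["includeAllIntermediates"]),
        ("[Source annotation URIs]",
         [RDFS_LABEL, OIO + "hasExactSynonym", OIO + "hasRelatedSynonym"]),
        ("[Source annotation URIs to be excluded]", []),
    ]
    return "\n\n".join("\n".join([head] + body) for head, body in sections) + "\n"


if __name__ == "__main__":
    # The output URI is a placeholder, not a real FOODON import location.
    print(ontofox_spec([6725, 643729], "http://example.org/langual_ncbitaxon.owl"))
```

The resulting text can be pasted into the Ontofox web form; whether Ontofox can be driven the same way from the command line is the open question raised above.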

leechuck commented 8 years ago

Have you looked at https://www.ebi.ac.uk/chebi/webServices.do ?

Public-Health-Bioinformatics commented 8 years ago

Ok, that's a start. Have you used it before - and particularly do you guys have chebi api code examples in python, or mainly some other language? I'm doing everything server side in python these days.

leechuck commented 8 years ago

I have not used the API myself, will ask my group if they have sample code. There is also https://github.com/libChEBI/libChEBIpy and some code at https://pythonhosted.org/bioservices/_modules/bioservices/chebi.html

Public-Health-Bioinformatics commented 8 years ago

Sweet, thanks! Will try it out in the next week.
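For the bulk lookup asked about earlier, one possible shape is a small ranking loop that is independent of the client library. The sketch below assumes a `search` callable that returns `(id, name)` candidates for one query; it could wrap either of the libraries linked above, but no specific call from them is assumed here. The function name, the stub catalogue and its ids are all made up for illustration, and `difflib` string similarity is only a stand-in for a proper chemical name matcher.

```python
from difflib import SequenceMatcher


def best_matches(names, search, min_score=0.0):
    """Map each input name to its most similar candidate, or None.

    `search(name)` is assumed to return an iterable of (id, name) pairs.
    Each result is a tuple (id, matched_name, similarity in [0, 1]).
    """
    results = {}
    for name in names:
        best = None
        for cand_id, cand_name in search(name):
            score = SequenceMatcher(None, name.lower(), cand_name.lower()).ratio()
            if best is None or score > best[2]:
                best = (cand_id, cand_name, score)
        results[name] = best if best is not None and best[2] >= min_score else None
    return results


if __name__ == "__main__":
    # Stub standing in for a real ChEBI client; the ids are placeholders.
    catalogue = [("FAKE:1", "citric acid"), ("FAKE:2", "citrate"), ("FAKE:3", "sorbic acid")]

    def stub_search(name):
        first_word = name.lower().split()[0]
        return [c for c in catalogue if first_word[:4] in c[1]]

    for query, hit in best_matches(["Citric Acid", "Sorbic acid", "xanthan gum"], stub_search).items():
        print(query, "->", hit)
```

Low-scoring matches would still need manual review, so keeping the score in the result is deliberate.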

Public-Health-Bioinformatics commented 8 years ago

Should the FOODON import files be in OBO format, or is the OWL format preserving info any better? (import files will probably be named langual_chebi.obo, langual.obo, langual_ncbitaxon.obo)

leechuck commented 8 years ago

OWL format will preserve all the information, OBO format is a subset of OWL and some information may be lost. Using the OBO/OWL tools (by @cmungall ), it should be quite easy to generate OBO files from the OWL files, the other way around may lose some information.

Public-Health-Bioinformatics commented 8 years ago

Ok. I'm going through an OBO file spec at ftp://ftp.geneontology.org/pub/go/www/GO.format.obo-1_4.shtml#S.2.2.3 which seems to have all I'd need. Given that Langual doesn't actually have any logic other than an is_a hierarchy, I will try avoiding the OWL format for now.

leechuck commented 8 years ago

That should be fine. Using OWLAPI, the OBO file can then be converted to OWL format if required.

cmungall commented 8 years ago

@Public-Health-Bioinformatics - the actual spec is here: http://owlcollab.github.io/oboformat/doc/obo-syntax.html

But you shouldn't need this. Even if it's just an is_a hierarchy, it may be simpler to generate owl/ttl as a first pass. Whatever you use, it should interconvert just fine between obo and other owl syntaxes

cmungall commented 8 years ago

C would presumably go to a mixture of PO, Uberon (and a single FAO class?)

Public-Health-Bioinformatics commented 8 years ago

Well, I confess it's a bit more than an is_a hierarchy. I am capturing various things from what is originally XML: `

B1819 BLACK CRAPPIE B1409 <SCIFAM>Centrarchidae [ITIS 168093] <SCINAM>Pomoxis nigromaculatus (Lesueur in Cuvier and Valenciennes, 1829) [ITIS 168167] <SCINAM>Pomoxis nigromaculatus (Lesueur, 1829) [Fishbase 2004 3388] <SCINAM>Pomoxis nigromaculatus (Lesueur, 1829) [FAO ASFIS PXG] <SCINAM>Pomoxis nigromaculatus (Lesueur, 1829) [CEC 1993 597] <SCINAM>Pomoxis nigromaculatus [2010 FDA Seafood List] <SCINAM>Pomoxis nigromaculatus (Lesueur, 1829) <DICTION> The black crappie, Pomoxis nigromaculatus (Lesueur, 1829), is very similar to P. annularis in size, shape, and habits, except that it is darker, with a pattern of black spots. The black crappie has 7-8 spines on its dorsal fin. The number of spines on the dorsal fin, is occasionally the only way to differeniate between a juvenile black crappie and a white crappie. The black crappie tends to prefer clearer water than the white crappie does. Its native range is uncertain, since it has been so widely transplanted, but it is presumed to be similar to the white crappie's. The black crappie is also known as the strawberry bass or Oswego bass. (http://en.wikipedia.org/wiki/White_crappie) crappie, black pomoxis nigromaculatus True 2011-09-01

`

Into an intermediary json database that lets us control what elements should get in on an ongoing basis - so a periodic import can occur:

"FOODON_3411409": {
  "status": "import",
  "definition": {
    "import": true,
    "changed": true,
    "locked": false,
    "value": "The species of this genus are known as crappies and are extremely popular game fish. The genus has two species the white and black crappie. Crappie of both species are sometimes referred to as papermouths, calico bass, and strawberry bass. Both species of crappie feed on minnows as adults. Both species spawn in the early spring when the water temperature nears 64 to 68 degrees. Crappie create a nest in fine silt or gravel, and the nests are often congregated in very high densities in shallow waters. "
  },
  "taxonomy": [ "ITIS:168093" ],
  "label": { "import": true, "changed": false, "locked": false, "value": "CRAPPIE" },
  "is_a": [],
  "parent_id": { "import": true, "changed": false, "locked": false, "value": "B1818" },
  "database_id": "B1409",
  "active": { "import": true, "changed": false, "locked": false, "value": "True" },
  "ontology_id": "FOODON_3411409",
  "definition source": "http://en.wikipedia.org/wiki/White_crappie"
},

and this leads to an OBO or OWL output file (WORK IN PROGRESS):

...
[Term]
id: FOODON:3411409
name: CRAPPIE
is_a: FOODON:3411818
def: "The species of this genus are known as crappies and are extremely popular game fish. The genus has two species the white and black crappie. Crappie of both species are sometimes referred to as papermouths, calico bass, and strawberry bass. Both species of crappie feed on minnows as adults. Both species spawn in the early spring when the water temperature nears 64 to 68 degrees. Crappie create a nest in fine silt or gravel, and the nests are often congregated in very high densities in shallow waters. " []
...
etc...

... entries will be cross referenced to NCBI taxon etc.; synonyms and replaced_by ID/is_archaic flags will be set.

I'll check out ttl too.
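The JSON-to-OBO step described in this comment might look roughly like the sketch below. It is not the code from the import script. The id offset is inferred only from the examples in this thread (B1409 becomes FOODON_3411409, B1818 becomes FOODON:3411818) and may not hold for other facets; the reading of the `locked` and `import` flags is likewise an assumption, and all function names are hypothetical.

```python
import re

FOODON_OFFSET = 3410000  # assumption: inferred from B1409 -> FOODON_3411409


def langual_to_foodon(langual_id):
    """Map a LanguaL facet-B id such as 'B1409' to 'FOODON:3411409'."""
    match = re.fullmatch(r"B(\d{4})", langual_id)
    if match is None:
        raise ValueError("unsupported LanguaL id: %r" % langual_id)
    return "FOODON:%07d" % (FOODON_OFFSET + int(match.group(1)))


def merge_field(field, new_value):
    """Apply a value from a new LanguaL release unless the field is locked."""
    if field["locked"]:
        return field
    return dict(field, value=new_value, changed=(new_value != field["value"]))


def obo_escape(text):
    """Escape backslashes, double quotes and newlines for an OBO quoted string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_obo_stanza(record):
    """Render one intermediate-database record as an OBO [Term] stanza."""
    lines = ["[Term]",
             "id: " + langual_to_foodon(record["database_id"]),
             "name: " + record["label"]["value"]]
    definition = record.get("definition")
    if definition and definition["import"]:
        lines.append('def: "%s" []' % obo_escape(definition["value"].strip()))
    lines.append("xref: Langual:" + record["database_id"])
    lines.extend("xref: " + code for code in record.get("taxonomy", []))
    if record["parent_id"]["value"]:
        lines.append("is_a: " + langual_to_foodon(record["parent_id"]["value"]))
    return "\n".join(lines)


if __name__ == "__main__":
    flags = {"import": True, "changed": False, "locked": False}
    record = {
        "database_id": "B1409",
        "label": dict(flags, value="CRAPPIE"),
        "parent_id": dict(flags, value="B1818"),
        "definition": dict(flags, value="The species of this genus are known as crappies."),
        "taxonomy": ["ITIS:168093"],
    }
    print(to_obo_stanza(record))
```

Keeping the lock check in one place means a periodic re-import can overwrite unlocked fields while leaving curated values alone.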

Public-Health-Bioinformatics commented 8 years ago

Chris, about Facet C, yes that sounds right.

Public-Health-Bioinformatics commented 8 years ago

Here are a few more preliminary OBO import file entries. Question: are scientific name (latin) synonyms considered "english"? Should I be making the synonym "EXACT" type the default? (There is a way to override both synonym type and language in import spec for each term import).

[Term]
id: FOODON:3414530
name: "INDIAN GOOSEBERRY"@en
is_a: FOODON:3413387 ! TROPICAL OR SUBTROPICAL FRUIT - EDIBLE PEEL
def: "Phyllanthus emblica (syn. Emblica officinalis), the Indian gooseberry ... is a deciduous tree of the family Phyllanthaceae. It is known for its edible fruit of the same name. [WIKIPEDIA:Phyllanthus_emblica]
synonym: "aonla"@en EXACT
synonym: "emblic"@en EXACT
synonym: "emblic myrobalan"@en EXACT
synonym: "indian-gooseberry"@en EXACT
synonym: "phyllanthus emblica"@en EXACT
xref: Langual:B4530
xref: ITIS:845434

[Term]
id: FOODON:3414531
name: "ATLANTIC BOBTAIL"@en
is_a: FOODON:3414532 ! BOBTAIL SQUID
def: "Sepiola atlantica, also known as the Atlantic bobtail, is a species of bobtail squid native to the northeastern Atlantic Ocean (65ºN to 35ºN), from Iceland, the Faroe Islands and western Norway to the Moroccan coast. There is a single record of this species from the Mediterranean Sea. [WIKIPEDIA:Sepiola_atlantica]
synonym: "little cuttle"@en EXACT
synonym: "sepiola atlantica"@en EXACT
xref: Langual:B4531
xref: ITIS:82335

cmungall commented 8 years ago

If you want to do languages, go straight to rdf/owl

The syntax you have below is incorrect (try running it through robot or owltools)

We sometimes do this

synonym: "aonla@en" EXACT

but it's convention, needs to be converted post-hoc

Just go straight to a format with clear language semantics. You can still use the OIO vocabulary of course.

For uberon we have a LATIN synonym type (note: type, not scope)

https://github.com/obophenotype/uberon/wiki/Using-uberon-for-text-mining

But this is still from an english speaking POV. I guess the latin is really a formal name that is intended to span languages

In reply to Damion Dooley's comment of 25 Aug 2016, 15:40 (the preliminary OBO import file entries in the previous comment).
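Going straight to an RDF syntax, as suggested in this comment, makes the language tag part of the literal itself. The sketch below writes Turtle by hand with the standard library only, using the oboInOwl (OIO) `hasExactSynonym` property; the function names are hypothetical, and tagging the scientific name with `la` (the ISO 639-1 code for Latin) is just one possible answer to the open question about Latin names, not a decision recorded in this thread.

```python
PREFIXES = (
    "@prefix obo: <http://purl.obolibrary.org/obo/> .\n"
    "@prefix oio: <http://www.geneontology.org/formats/oboInOwl#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def literal(text, lang=None):
    """Format a Turtle string literal, with an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped + ("@" + lang if lang else "")


def term_turtle(term_id, label, synonyms):
    """Emit one class with an English label and (text, lang) exact synonyms."""
    parts = ["obo:%s a owl:Class" % term_id,
             "    rdfs:label %s" % literal(label, "en")]
    for text, lang in synonyms:
        parts.append("    oio:hasExactSynonym %s" % literal(text, lang))
    return " ;\n".join(parts) + " .\n"


if __name__ == "__main__":
    print(PREFIXES)
    print(term_turtle("FOODON_3414530", "INDIAN GOOSEBERRY",
                      [("aonla", "en"), ("Phyllanthus emblica", "la")]))
```

A file written this way can then be converted to other OWL syntaxes, or to OBO, with the tools already mentioned in this thread.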

Public-Health-Bioinformatics commented 8 years ago

Ok, good, I will produce OWL format then.

An interesting issue has come up: Many food source items have links not just to taxonomic ITIS codes at species rank level, but to higher level family rank codes.

Now, on the topic of human colloquial categorization vs. formal taxonomic categorization:

Some Langual terms are more general and only have higher level codes: The "Crayfish or crawfish" term (which we would rename to just "Crayfish", offering "crawfish" as a synonym) is a useful general category for epidemiologists since, for example, people can often remember only that level of detail in a food interview. Wikipedia mentions that "taxonomically, they are members of the superfamilies Astacoidea and Parastacoidea". Under it are European (Astacidae family) and American (Cambaridae family) crayfish family categories with specific species items listed as subclasses (e.g. Noble crayfish, Astacus astacus). Wikipedia's third family, Parastacidae, is not mentioned, so this is an example of incomplete data.

Another example: "Shellfish and crustaceans" is an arbitrary conjunction taxonomically; it has crustacea subphylum on the one hand, and on the other, a "shellfish" placeholder which could be merged with a subclass called mollusks (tied to the general phylum mollusca). Again, for food industry classification, and for epidemiologists this is an understandable grouping.

First: does everyone see the utility of having food source items that have a kind of upper and lower bound on their taxonomic classification (rank) - with corresponding ITIS / NCBI taxonomic codes? As far as I can tell with existing ontologies this use case isn't well covered. To recap, often a food source item is cast as being associated with (in a membership way) a higher level taxonomic rank; and any subordinate rank items given in Langual are conveyed as taxonomic examples of the given term, often more than one example at hand.

Capturing the semantics of this would mean having something like:

Crayfish "is member of" NCBITaxon_6724 (Astacoidea) OR NCBITaxon_29961 (Parastacoidea)
Crayfish "imported from" langual.org

American Crayfish family "subclass of" Crayfish
American Crayfish family "is member of" NCBITaxon_6725 (Cambaridae)
American Crayfish family "imported from" langual.org

FLORIDA CRAYFISH "subclass of" American Crayfish family
FLORIDA CRAYFISH "imported from" langual.org
FLORIDA CRAYFISH "is member of" NCBITaxon_6725 (Cambaridae)   (could inherit this)
FLORIDA CRAYFISH "has example taxonomy" NCBITaxon_643729 (Procambarus alleni)
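The upper and lower taxonomic bounds in the statements above can be prototyped with a plain dictionary before committing to relation names. This is only a sketch: the field names `member_of` and `examples` stand in for the proposed "is member of" and "has example taxonomy" relations, the helper is hypothetical, and the taxon ids are copied from the statements above without further checking.

```python
TERMS = {
    "Crayfish": {
        "parent": None,
        "member_of": ["NCBITaxon:6724", "NCBITaxon:29961"],
        "examples": [],
    },
    "American Crayfish family": {
        "parent": "Crayfish",
        "member_of": ["NCBITaxon:6725"],
        "examples": [],
    },
    "FLORIDA CRAYFISH": {
        "parent": "American Crayfish family",
        "member_of": [],  # left empty so it is inherited from the parent
        "examples": ["NCBITaxon:643729"],
    },
}


def taxonomic_bounds(name, terms):
    """Return (upper bound taxa, example taxa) for a food source term.

    The upper bound is the nearest "member_of" found by walking up the
    subclass chain; examples are never inherited, because an example
    species of one term says nothing about its siblings or children.
    """
    node = name
    while node is not None:
        if terms[node]["member_of"]:
            return terms[node]["member_of"], terms[name]["examples"]
        node = terms[node]["parent"]
    return [], terms[name]["examples"]


if __name__ == "__main__":
    print(taxonomic_bounds("FLORIDA CRAYFISH", TERMS))
    # -> (['NCBITaxon:6725'], ['NCBITaxon:643729'])
```

Treating examples as non-inheritable matches the edibility caveat raised further down in this comment.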

Do we need to introduce the "has example taxonomy" relation?

The "is member of" is perhaps too strong? Should it be "is taxonomic member of" ?

Crayfish could inherit a "plays food role" with respect to other creatures, but shall I put off the discussion of what could/should be done there until later?

We could import taxonomic rank data directly from NCBITaxon if desired; but I don't see use of including that in FOODON at moment.

I should note that Langual's particular examples of a food item in question aren't comprehensive. I think food safety issues potentially arise if users assume for a given item and its taxonomic lower rank, that all sibling or subordinate relatives are edible for example. If a nut family is given, potentially not all of the genus or species level varieties are edible. So we'll have to have a proviso for this.

On another topic, Langual has two terms for food source, "freshwater fish" and "marine fish" which are just used as keyword descriptors, i.e. they don't have any subclasses. The terms suggest that all animals and plants could be categorized by a "has environment" relation to an ENVO biome. Langual conveys this only through the plain text definitions; but perhaps other sources like EOL could be employed later to layer this information on for full food-biome analysis.

Onward,... and feedback appreciated...

d.

cmungall commented 8 years ago

On 29 Aug 2016, at 12:05, Damion Dooley wrote:

Ok, good, I will produce OWL format then. 

Depending on how you're generating things you may want to look at a generator framework - e.g. ROBOT templates or dead-simple-design-patterns

Public-Health-Bioinformatics commented 8 years ago

Chris, I've taken a closer look at how (the FOODON import of) UBERON makes taxonomic references, and think a similar approach could work?

synonym: "organa sensuum" EXACT LATIN [FMA:75259, FMA:TA]
synonym: "pharynx" BROAD SENSU [FMA:46688]
synonym: "gastrodermis" EXACT SENSU [BGEE:ANN, NCBITaxon:6073]

So an example taxonomy spec could look like (pseudocode):

FLORIDA CRAYFISH 
  synonym: "Procambarus alleni" NARROW LATIN [NCBITaxon:643729]
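Synonym lines of the shape shown above (quoted text, a scope, an optional synonym type, then a bracketed xref list) can be checked with a small parser before they are written to a file. The regular expression below is a simplified sketch of that shape and not a full OBO parser; the function name is hypothetical.

```python
import re

SYNONYM_RE = re.compile(
    r'^synonym:\s+"(?P<text>(?:[^"\\]|\\.)*)"\s+'   # quoted synonym text
    r'(?P<scope>EXACT|BROAD|NARROW|RELATED)'        # scope keyword
    r'(?:\s+(?P<type>[^\s\[]+))?'                   # optional synonym type
    r'\s+\[(?P<xrefs>[^\]]*)\]\s*$'                 # xref list, may be empty
)


def parse_synonym(line):
    """Split one OBO synonym line into text, scope, type and xrefs."""
    match = SYNONYM_RE.match(line.strip())
    if match is None:
        raise ValueError("not a well-formed synonym line: %r" % line)
    xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
    return {
        "text": match.group("text"),
        "scope": match.group("scope"),
        "type": match.group("type"),  # None when no type is given
        "xrefs": xrefs,
    }


if __name__ == "__main__":
    print(parse_synonym('synonym: "Procambarus alleni" NARROW LATIN [NCBITaxon:643729]'))
    print(parse_synonym('synonym: "organa sensuum" EXACT LATIN [FMA:75259, FMA:TA]'))
```

A line such as `synonym: "aonla"@en EXACT` is rejected by this pattern, both for the tag after the closing quote and for the missing bracket list.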

I checked out ROBOT, and dead-simple-design-patterns, thanks - I can see utility of that work and will look into that more; What looked appealing for my script to use (at https://github.com/Public-Health-Bioinformatics/foodon-langual , save_ontology() function) was this P4 item you mentioned in "Using UBERON for text mining":

AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NIF_GrossAnatomy:birnlex_2703"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "olfactory membrane"^^xsd:string)

What tool / library converts that function call into OWL file content?

footnote: I see now that AnnotationAssertion(...) etc. is the OWL Functional-Style Syntax.
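Since this is OWL Functional-Style Syntax, a script can emit such axioms as plain text and leave conversion to a tool that reads the syntax (the OWL API mentioned earlier, or ROBOT). The helper below is a hypothetical sketch of that idea; it writes full IRIs in angle brackets and assumes the enclosing ontology document declares the `xsd:` prefix.

```python
OIO = "http://www.geneontology.org/formats/oboInOwl#"
OBO = "http://purl.obolibrary.org/obo/"


def quote(text):
    """Format an xsd:string literal, escaping backslashes and double quotes."""
    return '"%s"^^xsd:string' % text.replace("\\", "\\\\").replace('"', '\\"')


def synonym_axiom(term_id, synonym, xref=None, prop="hasExactSynonym"):
    """Build an AnnotationAssertion for a synonym, optionally carrying a hasDbXref."""
    annotation = ""
    if xref:
        annotation = "Annotation(<%shasDbXref> %s) " % (OIO, quote(xref))
    return "AnnotationAssertion(%s<%s%s> <%s%s> %s)" % (
        annotation, OIO, prop, OBO, term_id, quote(synonym))


if __name__ == "__main__":
    print(synonym_axiom("UBERON_0001997", "olfactory membrane",
                        xref="NIF_GrossAnatomy:birnlex_2703"))
```

The printed line has the same structure as the UBERON example quoted above; other oboInOwl synonym properties can be selected through the `prop` argument.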

mateolan commented 8 years ago

Stoked you are making such rapid progress...yet your struggles with LanguaL ontologization raise several interesting issues (a few of which I alluded to in my talk)--but which now I think need bearing out, as they will continue to raise their heads:

  1. We need a way to describe how/where/when the food was grown/produced/procured.
    • It is not enough to say it is a freshwater fish or marine fish--salmon is an example of an organism that requires a more robust characterization that coincides with season and life-stage--this will make a difference as to its composition.
    • Another example is Coturnism, an illness known since ancient times, that results in rhabdomyolysis (muscle cell breakdown) from eating quail that have fed on poisonous plants. The quail are not noticeably different by either taste or visual senses, but Sinai hunters familiar with their migration patterns generally know WHEN the quail are safe.
  2. We need a way to describe parts of animals consumed. These don't usually match directly with anatomical parts, and butchering techniques (examples one http://www.thirdculturemama.com/french-vs-american-butchery/#.V8YkuGhhma4, two http://eatingchile.blogspot.com/2009/12/eating-chilean-beef.html, three http://www.noordinaryhomestead.com/butcher-shop-cheet-sheet-auf-deutsch/) differ widely across cultures.
    • Do we have a way for talking about portions of muscles? Whether the muscle has been pulled/ripped, cut across the grain, or has a portion of bone left in?
    • Modelling the separation of things like whey protein from milk would be another example.
  3. In addition to the above examples, as you mention, we need to be able to model food names to species/higher taxonomic orders. This gets tricky because often the same names for foods refer to different species--depending on locale and culture. This means that these are not just strict translation issues, but ethnic and cultural colloquialisms that should be modeled in from the beginning. Your coverage of crayfish is a good example--there is decent synonymy in LanguaL for crayfish<=>crawfish, but only one reference for "crawdad"--which is the word I used when I was growing up. Even more puzzling is the use of the word "Shrimp" as food...admittedly LanguaL does a decent job trying to parse this out--but if one only has "shrimp" in a recipe, it would be quite difficult to determine what is actually meant.
  4. Yet another issue to think about (last one on this thread, promise) revolves around the notion that we need to define foods differently depending on the context in which they occur. A simple example of this is the tomato https://en.wikipedia.org/wiki/Nix_v._Hedden. Botanically, it is a fruit, yet for legal purposes in the United States and elsewhere, it is considered a vegetable. The reason for this is that fruits--those sweet fruiting bodies from a plant--were generally considered a delicacy, and taxed at a higher rate. Tomatoes were considered to be eaten with the meal, and therefore not taxed as heavily.

While I am seriously delighted that you are progressing with ontologization of LanguaL, I think the issues you are uncovering, as well as those mentioned above, highlight the need for us to consider that this is only a very primitive step. If we can capture some of these points early on in the modeling process, I think it will save us a lot of work and headache down the road.

Perhaps we could have a conference call in the relative near future?

~Matthew

In reply to Damion Dooley's comment of Aug 29, 2016, 12:05 PM (beginning "Ok, good, I will produce OWL format then.").

Public-Health-Bioinformatics commented 8 years ago

Great to get your feedback, and to have you in the fray. I'd like to create an issues thread for each of those substantial points, many of which have to do with making a globally pertinent food ontology, which I'm totally dedicated to. I'll add ideas about how to frame some of those items and I'm sure others will have example needs and hopefully solutions. In terms of this Langual import directly, I think I could put a hold on importing a few facets (e.g. E Physical State, Shape or Form) that I know we would want to reorganize, until we have a chance to discuss what the end product should be like.

I welcome a teleconference any time that is convenient for others. I'll send out an email about that; for me Sept 8 or 9 early morning could work.

leechuck commented 7 years ago

  1. We need a way to describe parts of animals consumed. These don't usually match directly with anatomical parts, and butchering techniques (examples one http://www.thirdculturemama.com/french-vs-american-butchery/#.V8YkuGhhma4, two http://eatingchile.blogspot.com/2009/12/eating-chilean-beef.html, three http://www.noordinaryhomestead.com/butcher-shop-cheet-sheet-auf-deutsch/) differ widely across cultures.
    • Do we have a way for talking about portions of muscles? Whether the muscle has been pulled/ripped, cut across the grain, or has a portion of bone left in?
    • Modelling the separation of things like whey protein from milk would be another example.

Anatomical parts could be described using UBERON, by combining multiple classes (e.g., a piece of meat that has some muscle as part and has some bone as part and does not have blood as part and ...). We might think about a pre-composed branch of FOODON, based on UBERON, in which common butchering techniques are described and how they relate to UBERON classes. A portion of muscle can be described by parthood relations, UBERON classes, and, if required, PATO or a biospatial ontology (for lateral, right, left, etc...).

  1. In addition to the above examples, as you mention, we need to be able to model food names to species/higher taxonomic orders. This gets tricky because often the same names for foods refer to different species--depending on locale and culture. This means that these are not just strict translation issues, but ethnic and cultural colloquialisms that should be modeled in from the beginning. Your coverage of crayfish is a good example--there is decent synonymy in LanguaL for crayfish<=>crawfish, but only one reference for "crawdad"--which is the word I used when I was growing up. Even more puzzling is the use of the word "Shrimp" as food...admittedly LanguaL does a decent job trying to parse this out--but if one only has "shrimp" in a recipe, it would be quite difficult to determine what is actually meant.

Maybe the following might work: define "shrimp (sensu X)" classes for each use of shrimp. Then define a general "shrimp" class as the union of all the individual classes (or even as a superclass of the union, so to allow the union to be incomplete).

R
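The "shrimp (sensu X)" suggestion above corresponds to two standard OWL axiom shapes: an equivalence to the union when the list of senses is taken to be complete, or a subclass axiom from the union to the general class when it may be incomplete. The sketch below prints both in OWL Functional-Style Syntax; the class names are placeholders rather than real FOODON terms, and the helper is hypothetical.

```python
def sensu_union_axiom(general, sensu_classes, closed=False):
    """Relate a general food class to its regional or cultural senses.

    closed=True  -> the general class is exactly the union of the senses.
    closed=False -> the union is only a subclass of the general class,
                    so further senses can be added later.
    """
    if len(sensu_classes) < 2:
        raise ValueError("ObjectUnionOf needs at least two class expressions")
    union = "ObjectUnionOf(%s)" % " ".join(sensu_classes)
    if closed:
        return "EquivalentClasses(%s %s)" % (general, union)
    return "SubClassOf(%s %s)" % (union, general)


if __name__ == "__main__":
    # Placeholder names; real classes would carry FOODON ids.
    senses = [":shrimp_sensu_A", ":shrimp_sensu_B"]
    print(sensu_union_axiom(":shrimp", senses))
    print(sensu_union_axiom(":shrimp", senses, closed=True))
```

The open form is the safer default while the senses are still being collected.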

Public-Health-Bioinformatics commented 7 years ago

I've moved issue #3 over to https://github.com/FoodOntology/foodon/issues/20, ok?

Public-Health-Bioinformatics commented 7 years ago

Update: LanguaL to FoodOn import is complete.

We'll use [term name (sensu ...)] pattern to differentiate homonyms that have cultural/regional differences in use.

We will want to augment FoodOn using UBERON term class combinations for cuts; I've done a light pass introducing UBERON anatomy parts in the FoodOn "part of plant or animal" class, but this should be extended. I'm going to have to leave that as a separate task for those who need it for their culinary or food inspection work. I didn't do any work on the plant side of things here.