EnvironmentOntology / envo

A community-driven ontology for the representation of environments
http://www.environmentontology.org
Creative Commons Zero v1.0 Universal

ROBOT import routines fail on special characters #521

Closed by pbuttigieg 7 years ago

pbuttigieg commented 7 years ago

After merging https://github.com/EnvironmentOntology/envo/pull/519 and running make imports/bfo_import.owl, the following errors related to encoding were encountered:

2017-07-04 14:39:49,361 INFO  (OWLGraphWrapperBasic:227) Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/envo/modules/entity_attribute.owl>))) [Axioms: 79 Logical Axioms: 12]
sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘http://purl.obolibrary.org/obo/ENVO_01000750\tMohorovi\304?ić discontinuity\tThe Mohorovi\304?ić discontinuity, usually referred to as the Moho, is the boundary between the Earth's crust and the mantle, indicating a change in composition.’ and ‘http://purl.obolibrary.org/obo/ENVO_00002042\tsurface water\tnull’.
Makefile:114: recipe for target 'imports/seed.tsv' failed
make: *** [imports/seed.tsv] Error 2

This was fixed by removing the special characters from the label (moving them into a synonym). However, there are similar errors with imported labels:

2017-07-04 14:45:23,658 INFO  (OWLGraphWrapperBasic:227) Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/envo/modules/entity_attribute.owl>))) [Axioms: 79 Logical Axioms: 12]
sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘http://purl.obolibrary.org/obo/UBERON_0013485\tcrypt of Lieberkuhn of colon\tAn intestinal crypt that is located in the colon. The colonic crypts of Lieberk\374hn are straight and unbranched and lined largely with goblet cells.’ and ‘http://purl.obolibrary.org/obo/UBERON_0012152\tskeleton of pedal digitopodium\tA subdivision of the pes skeleton consisting of both pedal acropodial skeleton and metatarsal skeleton, but excluding the tarsal skeleton.’.
Makefile:114: recipe for target 'imports/seed.tsv' failed
make: *** [imports/seed.tsv] Error 2
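For reference, the workaround that sort itself suggests can be sketched as below. The \374 in the log is the byte 0xFC, the Latin-1 encoding of "ü", which is not a valid UTF-8 sequence, so a UTF-8 locale cannot collate it; under LC_ALL=C, sort compares raw bytes instead. The two-row file here is a hypothetical stand-in for imports/seed.tsv, and the actual recipe at Makefile:114 is not shown in this thread, so this is only a sketch of the idea, not the fix applied in ENVO.

```shell
#!/bin/sh
# Hypothetical miniature of imports/seed.tsv: IRI <TAB> label <TAB> definition.
# The first row carries a raw 0xFC byte (octal \374), which is invalid as UTF-8.
tmp=$(mktemp)
printf 'http://purl.obolibrary.org/obo/UBERON_0013485\tcrypt of Lieberkuhn of colon\tThe colonic crypts of Lieberk\374hn\n' > "$tmp"
printf 'http://purl.obolibrary.org/obo/UBERON_0012152\tskeleton of pedal digitopodium\tnull\n' >> "$tmp"

# In the C locale, sort compares raw bytes and never has to decode characters,
# so the invalid sequence no longer makes the comparison fail.
first_iri=$(LC_ALL=C sort "$tmp" | head -n 1 | cut -f 1)
rm -f "$tmp"

echo "$first_iri"
```

Byte order is not language-aware, so the result can differ from the ordering a UTF-8 locale would give; that should not matter if the sort only serves to group or deduplicate rows.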

cmungall commented 7 years ago

One thing I noticed that may be causing issues (via a circuitous route) is definitions left over from the old OBO-Edit days, like:

MERGED DEFINITION:
TARGET DEFINITION: An oceanographic feature that involves wind-driven motion of dense, cooler, and usually nutrient-rich water towards the ocean surface, replacing the warmer, usually nutrient-deplete s
urface water.
--------------------
SOURCE DEFINITION: A marine upwelling is a net flow of marine water to the surface of the water column from deeper regions. This is often a result of surface water displacement off continental coasts by wind action. Localised upwellings may also occur along divergent fronts around eddies and along some of the major oceanographic features. Deeper waters often have higher nutrient content; consequently blooms of primary producers, such as planktonic algae, are generally observed around upwelling zones.

Newlines in definitions can break some things.
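
One way to guard a line-oriented file such as a TSV against this is to flatten each definition before it is written out. The sketch below is an assumption about how that could look, not something taken from the ENVO build: the definition text is a shortened, hypothetical version of the example above, and EXAMPLE_ID is a placeholder identifier.

```shell
#!/bin/sh
# Hypothetical multi-line definition, modelled on the merged definition above.
definition=$(printf 'MERGED DEFINITION:\nTARGET DEFINITION: An oceanographic feature.\n--------------------\nSOURCE DEFINITION: A marine upwelling.')

# Map CR, LF and TAB to spaces, then squeeze runs of spaces into one,
# so the value occupies a single field on a single TSV row.
# (This also collapses any double spaces already present in the text.)
flat=$(printf '%s' "$definition" | tr -s '\r\n\t' '   ')

# EXAMPLE_ID is a placeholder, not a real term identifier.
printf 'EXAMPLE_ID\tmarine upwelling\t%s\n' "$flat"
```

Cleaning the definition at its source in the ontology would avoid the problem for every downstream tool; flattening at export time only protects this one file.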