ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

ISSUE-163 Populate all sweet prefixes throughout ontology suite #165

Closed lewismc closed 4 years ago

lewismc commented 4 years ago

First batch of updates to address #163

This only updates state* and sweetAll

lewismc commented 4 years ago

Due to existing prefixes such as pstate, sstate and state, I did not find a way to automate this without introducing more errors. This was done manually!
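For illustration only, the overlap problem can be sketched in Python. A plain substring replace of `state:` also rewrites `pstate:` and `sstate:`, whereas a pattern that refuses to match when the prefix is preceded by another name character leaves them alone. The target name `sostate`, the helper `rename_prefix`, and the sample line are assumptions made for this sketch, not taken from the repository.

```python
import re

def rename_prefix(text: str, old: str, new: str) -> str:
    """Rename the Turtle prefix `old` to `new` without touching longer
    prefixes that merely end in `old` (e.g. `pstate` when old is `state`)."""
    # (?<![\w-]) means: not preceded by a name character, so `pstate:` and
    # `sstate:` do not match when we are renaming `state:`.
    pattern = re.compile(r"(?<![\w-])" + re.escape(old) + r":")
    return pattern.sub(new + ":", text)

# Hypothetical sample line; `sostate` is an assumed target prefix name.
line = "pstate:Solid rdfs:subClassOf state:PhysicalState , sstate:SolidState ."

naive = line.replace("state:", "sostate:")          # also corrupts pstate:/sstate:
careful = rename_prefix(line, "state", "sostate")   # only touches state:

print(naive)
print(careful)
```

This is a text-level sketch, not a Turtle parser: it would also rewrite a matching token inside a string literal, so any automated pass along these lines would still need the kind of review described in this thread.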

lewismc commented 4 years ago

I'll update this PR soon. I'm about 50% through! yikes.

lewismc commented 4 years ago

This PR is a beast but it is not complex. It just requires thorough peer review. It's not yet finished. I will indicate here once it is ready for review.

brandonnodnarb commented 4 years ago

I'm adding this here so I don't forget.

I have noticed the past several files have the NS declaration for /rela/ changed, but the actual NS in the file remains 'rela:'.

Also, /relaTime/ has the NS declaration changed to 'sorelt:', but the namespace in the file is 'tsorel:'. Will need to re-check the 50 or so files I already 'verified' as I didn't notice that until now.
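A mismatch like this (a prefix declared under one name but used under another) can be caught mechanically. The following is a rough Python sketch that compares declared prefixes with the prefixes actually used in a Turtle file. The function name `check_prefixes`, the regular expressions, and the sample content (including the `hasDuration` term) are assumptions for illustration. It works on text only, so multi-line literals and other edge cases are not handled.

```python
import re

# Prefix declarations in either form: "@prefix foo: <...>" or "PREFIX foo: <...>".
DECL = re.compile(r"^\s*(?:@prefix|PREFIX)\s+([\w-]*):\s*<[^>]*>",
                  re.IGNORECASE | re.MULTILINE)
# The prefix part of a prefixed name such as "owl:Class".
USE = re.compile(r"(?<![\w<:/#-])([A-Za-z][\w-]*):(?=[A-Za-z_])")

def check_prefixes(turtle: str):
    """Return (used_but_undeclared, declared_but_unused) prefix names."""
    declared = set(DECL.findall(turtle))
    # Remove declarations, IRIs and single-line string literals before
    # scanning, so that "http:" and literal text are not mistaken for prefixes.
    body = DECL.sub("", turtle)
    body = re.sub(r"<[^>]*>", "", body)
    body = re.sub(r'"(?:[^"\\]|\\.)*"', "", body)
    used = set(USE.findall(body))
    return sorted(used - declared), sorted(declared - used)

# Hypothetical file content reproducing the sorelt/tsorel mismatch.
sample = """@prefix sorelt: <http://sweetontology.net/relaTime/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

tsorel:hasDuration a owl:ObjectProperty .
"""

undefined, unused = check_prefixes(sample)
print("used but not declared:", undefined)
print("declared but not used:", unused)
```

On the sample, `tsorel` is reported as used but not declared and `sorelt` as declared but not used, which is the pattern described above.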

lewismc commented 4 years ago

@brandonnodnarb what is the current status here? It looks like there are still quite a few files to be addressed... is this a shared understanding?

Thanks for all of the contributions. Really appreciated.

brandonnodnarb commented 4 years ago

@lewismc I believe I've checked all /matr and /state files. These should be good to go.

I've been working in /phen* and should finish tomorrow (late).

I checked /human*, but I'll need to go through those again (hopefully quickly) to make sure I accounted for a few random bits I hadn't checked for initially.

I should be able to get through at least another section/topic by end of week.

brandonnodnarb commented 4 years ago

I think I've sorted automating the remainder. I'll test on a local copy in the morning.

lewismc commented 4 years ago

Hi @brandonnodnarb I've taken a pass through the entire ontology suite twice. My most recent commit reflects a few bugs I found, removal of unused prefixes from some files and updates of every prefix to reflect what we have in the SHACL file. What a task... thank you very much for tackling a huge portion of this as well. That took just over one week to do. Please let me know if and when you are happy with this. Honestly, I think we would be better off inviting @ESIPFed/semtech to review it as well.

brandonnodnarb commented 4 years ago

The latest push resolved the following:

- anim in matrAnimal.ttl
- biol in humanKnowledgeDomain.ttl
- biol in matrAnimal.ttl
- biol in matrBiomass.ttl
- com in phenHydro.ttl
- com in humanEnvirConservation.ttl
- com in humanEnvirControl.ttl
- con in realmGeolConstituent.ttl
- con in humanEnvirControl.ttl
- dir in reprSpaceDirection.ttl
- dir in reprSpaceCoordinate.ttl
- graph in humanDecision.ttl
- hum in matrEquipment.ttl
- hum in humanJurisdiction.ttl
- human in matrWater.ttl
- human in humanEnvirConservation.ttl
- human in humanDecision.ttl
- jur in humanEnvirStandards.ttl
- jur in humanJurisdiction.ttl
- land in humanEnvirStandards.ttl
- land in realmLandOrographic.ttl
- land in realmLandGlacial.ttl
- land in realmLandTectonic.ttl
- land in realmLandCoastal.ttl
- land in realmLandAeolian.ttl
- land in realmLandFluvial.ttl
- land in realmLandform.ttl
- matr in realmGeolContinental.ttl
- oper in humanDecision.ttl
- phen in phenFluidTransport.ttl
- realm in matrWater.ttl
- realm in phenEnvirImpact.ttl
- realm in matrNaturalResource.ttl
- realm in phenHydro.ttl
- realm in matrEquipment.ttl
- realm in humanEnvirConservation.ttl
- realm in realm.ttl
- rela: (23 ttl files)
- repr in repr.ttl
- repr in humanResearch.ttl
- repr in humanTechReadiness.ttl
- res in humanEnvirAssessment.ttl
- res in humanResearch.ttl
- res in reprDataProduct.ttl
- srela2 in reprSpaceCoordinate.ttl
- time in realmClimateZone.ttl
- trela: in humanKnowledgeDomain.ttl
- xten2 in matrWater.ttl

lewismc commented 4 years ago

I’m also +1 on merging this. Let’s wait another 72 hrs minimum before moving ahead. Thanks Brandon

brandonnodnarb commented 4 years ago

@carueda this file may help: SWEET-filename_oldns_uri_newns.txt

carueda commented 4 years ago

Re sweetAll.ttl (just starting with the easiest one for now ;)

Nothing critical, but:

dr-shorthair commented 4 years ago

- xml: is used in only two places: in catalog-v001.xml, and for a language tag in sweet.owl. Not needed as a general prefix.
- xsd: is used in a bunch of places for datatypes. Leave it be.
- soall: prefix is not needed.

lewismc commented 4 years ago

@dr-shorthair let's address this in a separate ticket. xml: is used in hundreds of places.

dr-shorthair commented 4 years ago

(re xml: all except two are merely the PREFIX: declaration)

carueda commented 4 years ago

With the help of check_isomorphic.sc (which I just added to sweet-tools), I ran a comparison between the files under master branch (under src/ below in my local machine) and corresponding ones from lewismc:ISSUE-163 (under src_branch/ below).

The comparison is based on Jena's isIsomorphicWith method.

As you can see in the report below, four files fail the isIsomorphicWith check, and six others cannot be loaded due to errors. I haven't actually reviewed the affected files per se yet, but hope this helps in the meantime.

$ ./check_isomorphic.sc ../../sweet/src ../../sweet/src_branch

- human.ttl √
- humanAgriculture.ttl √
- humanCommerce.ttl √
- humanDecision.ttl
  ERROR: src_branch/humanDecision.ttl: [line: 227, col: 1 ] Undefined prefix: prop
- humanEnvirAssessment.ttl √
- humanEnvirConservation.ttl
  ERROR: src_branch/humanEnvirConservation.ttl: [line: 104, col: 1 ] Undefined prefix: prop
- humanEnvirControl.ttl √
- humanEnvirStandards.ttl √
- humanJurisdiction.ttl √
- humanKnowledgeDomain.ttl √
- humanResearch.ttl √
- humanTechReadiness.ttl √
- humanTransportation.ttl √
- matr.ttl √
- matrAerosol.ttl √
- matrAnimal.ttl √
- matrBiomass.ttl √
- matrCompound.ttl √
- matrElement.ttl √
- matrElementalMolecule.ttl √
- matrEnergy.ttl √
- matrEquipment.ttl √
- matrFacility.ttl √
- matrIndustrial.ttl √
- matrInstrument.ttl √
- matrIon.ttl √
- matrIsotope.ttl √
- matrMicrobiota.ttl √
- matrMineral.ttl √
- matrNaturalResource.ttl √
- matrOrganicCompound.ttl √
- matrParticle.ttl √
- matrPlant.ttl √
- matrRock.ttl √
- matrRockIgneous.ttl √
- matrSediment.ttl √
- matrWater.ttl √
- phen.ttl √
- phenAtmo.ttl √
- phenAtmoCloud.ttl √
- phenAtmoFog.ttl √
- phenAtmoFront.ttl √
- phenAtmoLightning.ttl √
- phenAtmoPrecipitation.ttl √
- phenAtmoPressure.ttl √
- phenAtmoSky.ttl √
- phenAtmoTransport.ttl √
- phenAtmoWind.ttl √
- phenAtmoWindMesoscale.ttl √
- phenBiol.ttl √
- phenCryo.ttl √
- phenCycle.ttl √
- phenCycleMaterial.ttl √
- phenEcology.ttl √
- phenElecMag.ttl √
- phenEnergy.ttl √
- phenEnvirImpact.ttl √
- phenFluidDynamics.ttl √
- phenFluidInstability.ttl √
- phenFluidTransport.ttl √
- phenGeol.ttl √
- phenGeolFault.ttl √
- phenGeolGeomorphology.ttl √
- phenGeolSeismicity.ttl √
- phenGeolTectonic.ttl √
- phenGeolVolcano.ttl √
- phenHelio.ttl √
- phenHydro.ttl NOT ISOMORPHIC
- phenMixing.ttl √
- phenOcean.ttl √
- phenOceanCoastal.ttl √
- phenOceanDynamics.ttl √
- phenPlanetClimate.ttl √
- phenReaction.ttl √
- phenSolid.ttl √
- phenStar.ttl √
- phenSystem.ttl √
- phenSystemComplexity.ttl √
- phenWave.ttl √
- phenWaveNoise.ttl √
- proc.ttl √
- procChemical.ttl √
- procPhysical.ttl √
- procStateChange.ttl √
- procWave.ttl √
- prop.ttl √
- propBinary.ttl √
- propCapacity.ttl √
- propCategorical.ttl √
- propCharge.ttl √
- propChemical.ttl √
- propConductivity.ttl √
- propCount.ttl √
- propDifference.ttl √
- propDiffusivity.ttl √
- propDimensionlessRatio.ttl √
- propEnergy.ttl √
- propEnergyFlux.ttl √
- propFraction.ttl √
- propFunction.ttl √
- propIndex.ttl √
- propMass.ttl √
- propMassFlux.ttl √
- propOrdinal.ttl √
- propPressure.ttl √
- propQuantity.ttl √
- propRotation.ttl
  ERROR: src_branch/propRotation.ttl: [line: 8, col: 9 ] @prefix or PREFIX requires a prefix (found '[KEYWORD:soproptf]')
- propSpace.ttl √
- propSpaceDirection.ttl √
- propSpaceDistance.ttl √
- propSpaceHeight.ttl √
- propSpaceLocation.ttl √
- propSpaceMultidimensional.ttl √
- propSpaceThickness.ttl √
- propSpeed.ttl √
- propTemperature.ttl √
- propTemperatureGradient.ttl √
- propTime.ttl √
- propTimeFrequency.ttl √
- realm.ttl √
- realmAstroBody.ttl √
- realmAstroHelio.ttl √
- realmAstroStar.ttl √
- realmAtmo.ttl √
- realmAtmoBoundaryLayer.ttl √
- realmAtmoWeather.ttl √
- realmBiolBiome.ttl √
- realmClimateZone.ttl
  ERROR: src_branch/realmClimateZone.ttl: [line: 535, col: 1 ] Undefined prefix: sorept
- realmCryo.ttl √
- realmEarthReference.ttl √
- realmGeol.ttl √
- realmGeolBasin.ttl √
- realmGeolConstituent.ttl
  ERROR: src_branch/realmGeolConstituent.ttl: [line: 27, col: 1 ] Undefined prefix: soreagcons
- realmGeolContinental.ttl √
- realmGeolOceanic.ttl √
- realmGeolOrogen.ttl √
- realmHydro.ttl √
- realmHydroBody.ttl √
- realmLandAeolian.ttl √
- realmLandCoastal.ttl √
- realmLandFluvial.ttl √
- realmLandGlacial.ttl √
- realmLandOrographic.ttl √
- realmLandProtected.ttl √
- realmLandTectonic.ttl √
- realmLandVolcanic.ttl √
- realmLandform.ttl √
- realmOcean.ttl √
- realmOceanFeature.ttl √
- realmOceanFloor.ttl √
- realmRegion.ttl √
- realmSoil.ttl NOT ISOMORPHIC
- rela.ttl √
- relaChemical.ttl √
- relaClimate.ttl √
- relaHuman.ttl √
- relaMath.ttl
  ERROR: src_branch/relaMath.ttl: [line: 35, col: 1 ] Undefined prefix: sorelm
- relaPhysical.ttl √
- relaProvenance.ttl √
- relaSci.ttl √
- relaSpace.ttl √
- relaTime.ttl √
- repr.ttl NOT ISOMORPHIC
- reprDataFormat.ttl √
- reprDataModel.ttl √
- reprDataProduct.ttl √
- reprDataService.ttl √
- reprDataServiceAnalysis.ttl √
- reprDataServiceGeospatial.ttl √
- reprDataServiceReduction.ttl √
- reprDataServiceValidation.ttl √
- reprMath.ttl √
- reprMathFunction.ttl √
- reprMathFunctionOrthogonal.ttl √
- reprMathGraph.ttl √
- reprMathOperation.ttl √
- reprMathSolution.ttl √
- reprMathStatistics.ttl √
- reprSciComponent.ttl √
- reprSciFunction.ttl √
- reprSciLaw.ttl √
- reprSciMethodology.ttl √
- reprSciModel.ttl √
- reprSciProvenance.ttl √
- reprSciUnits.ttl √
- reprSpace.ttl √
- reprSpaceCoordinate.ttl √
- reprSpaceDirection.ttl NOT ISOMORPHIC
- reprSpaceGeometry.ttl √
- reprSpaceGeometry3D.ttl √
- reprSpaceReferenceSystem.ttl √
- reprTime.ttl √
- reprTimeDay.ttl √
- reprTimeSeason.ttl √
- state.ttl √
- stateBiological.ttl √
- stateChemical.ttl √
- stateDataProcessing.ttl √
- stateEnergyFlux.ttl √
- stateFluid.ttl √
- stateOrdinal.ttl √
- statePhysical.ttl √
- stateRealm.ttl √
- stateRole.ttl √
- stateRoleBiological.ttl √
- stateRoleChemical.ttl √
- stateRoleGeographic.ttl √
- stateRoleImpact.ttl √
- stateRoleRepresentative.ttl √
- stateRoleTrust.ttl √
- stateSolid.ttl √
- stateSpace.ttl √
- stateSpaceConfiguration.ttl √
- stateSpaceScale.ttl √
- stateSpectralBand.ttl √
- stateSpectralLine.ttl √
- stateStorm.ttl √
- stateSystem.ttl √
- stateThermodynamic.ttl √
- stateTime.ttl √
- stateTimeCycle.ttl √
- stateTimeFrequency.ttl √
- stateTimeGeologic.ttl √
- stateVisibility.ttl √
- sweetAll.ttl √
215 isomorphic files out of 225

lewismc commented 4 years ago

This is beautiful @carueda I'll go in and make the necessary changes and push an update.

lewismc commented 4 years ago

Current issues reduced to 3

lmcgibbn@MT-207576 ~/Downloads/sweet-tools/sc(master) $ ./check_isomorphic.sc ../../sweet/src/ ../../sweet_orig/src/
...
- phenHydro.ttl NOT ISOMORPHIC
- realmSoil.ttl NOT ISOMORPHIC
- reprSpaceDirection.ttl NOT ISOMORPHIC
...
222 isomorphic files out of 225

I tried to manually check phenHydro.ttl and realmSoil.ttl with no luck. It turns out that reprSpaceDirection.ttl has had some unused prefixes removed which seems to have screwed with things as well. Not long to go now.

carueda commented 4 years ago

@lewismc Looks like those are false negatives .. EDIT: scratch that! I had a typo in my script!

I just included a "diff" report based on the n-triples version of the model for each failed isomorphic check.

$ ./check_isomorphic.sc ../../sweet/src ../../sweet/src_branch | grep 'NOT ISO'
- phenHydro.ttl NOT ISOMORPHIC, see phenHydro.ttl.diff
- realmSoil.ttl NOT ISOMORPHIC, see realmSoil.ttl.diff
- reprSpaceDirection.ttl NOT ISOMORPHIC, see reprSpaceDirection.ttl.diff

Each .diff will show the triples that are not common between the compared models.

Example, phenHydro.ttl.diff:

2 triples in 1st model but not in the 2nd:
  <http://sweetontology.net/realmHydro/Aquifer> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
  <http://sweetontology.net/realmHydro/UndergroundWater> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

2 triples in 2nd model but not in the 1st:
  <http://sweetontology.net/realm/Aquifer> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
  <http://sweetontology.net/realm/UndergroundWater> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
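
As a rough illustration of how such a report can be produced, here is a minimal Python sketch that treats each N-Triples document as a set of lines and takes the set differences. It is not the logic of check_isomorphic.sc or of Jena; the function `ntriples_diff` and its `ignore_bnodes` option are assumptions for this sketch. The four class triples are the ones shown above, while the two blank-node triples are hypothetical and are included only to show why a line-based comparison over-reports when blank-node labels differ.

```python
def ntriples_diff(first: str, second: str, ignore_bnodes: bool = False):
    """Return (only_in_first, only_in_second) for two N-Triples documents,
    comparing them line by line as sets of triples."""
    def load(doc: str) -> set:
        triples = set()
        for raw in doc.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Crude blank-node test: any whitespace-separated token that
            # starts with "_:" (may misfire on literals containing " _:").
            if ignore_bnodes and any(tok.startswith("_:") for tok in line.split()):
                continue
            triples.add(line)
        return triples

    a, b = load(first), load(second)
    return sorted(a - b), sorted(b - a)

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
OWL_CLASS = "<http://www.w3.org/2002/07/owl#Class>"
OWL_RESTRICTION = "<http://www.w3.org/2002/07/owl#Restriction>"

master = "\n".join([
    f"<http://sweetontology.net/realmHydro/Aquifer> {RDF_TYPE} {OWL_CLASS} .",
    f"<http://sweetontology.net/realmHydro/UndergroundWater> {RDF_TYPE} {OWL_CLASS} .",
    f"_:b0 {RDF_TYPE} {OWL_RESTRICTION} .",      # hypothetical blank-node triple
])
branch = "\n".join([
    f"<http://sweetontology.net/realm/Aquifer> {RDF_TYPE} {OWL_CLASS} .",
    f"<http://sweetontology.net/realm/UndergroundWater> {RDF_TYPE} {OWL_CLASS} .",
    f"_:genid1 {RDF_TYPE} {OWL_RESTRICTION} .",  # same node, different label
])

only_master, only_branch = ntriples_diff(master, branch, ignore_bnodes=True)
print(f"{len(only_master)} triples in 1st model but not in the 2nd:")
for triple in only_master:
    print("  " + triple)
print(f"{len(only_branch)} triples in 2nd model but not in the 1st:")
for triple in only_branch:
    print("  " + triple)
```

Because blank-node labels are arbitrary, two graphs can be isomorphic while their serialized lines differ, so a line-based diff like this one is a debugging aid for locating changed IRIs rather than a substitute for an isomorphism check.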

carueda commented 4 years ago

Re realmSoil.ttl

it seems that the diff is because of the blank nodes (which is not a surprise).

However, for triples not involving blank nodes I can only see this diff:

<http://sweetontology.net/realmSoil/SoilOrder> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://sweetontology.net/propCategorical/Classification> .
<http://sweetontology.net/realmSoil/SoilOrder> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://sweetontology.net/realmSoil/Classification> .

carueda commented 4 years ago

Re phenHydro.ttl: (again, ignoring blank nodes)

<http://sweetontology.net/realmHydro/Aquifer> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://sweetontology.net/realm/Aquifer> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

and

<http://sweetontology.net/realmHydro/UndergroundWater> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://sweetontology.net/realm/UndergroundWater> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .

carueda commented 4 years ago

So, reprSpaceDirection.ttl is already fixed and pushed to this PR.

EDIT: I mean, in terms of both versions being isomorphic (more concretely, identical in terms of the n-triples representation -- I haven't looked at the prefixes themselves.)

lewismc commented 4 years ago

Hi @carueda please pull most recent commit locally then re-run ./check_isomorphic.sc and post your diff result for realmSoil.ttl please. Thanks

carueda commented 4 years ago

realmSoil.ttl.diff.txt phenHydro.ttl.diff.txt

carueda commented 4 years ago

<http://sweetontology.net/realmSoil/SoilOrder> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://sweetontology.net/propCategorical/Classification> .
<http://sweetontology.net/realmSoil/SoilOrder> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://sweetontology.net/realmSoil/Classification> .

lewismc commented 4 years ago

OK I fixed the final issue. It looks like it was down to another grep replacement gone wrong... but that was to be expected.

I'm +1 on merging.

lewismc commented 4 years ago

Does anyone have further comments on this PR? We've not had any further peer review in around a week. I would like to get working on #169 and this is basically blocking that now.

lewismc commented 4 years ago

Thanks everyone for the review and contributions. This was pretty major!