Closed: amoeba closed this issue 3 months ago.
@samanthacsik Any update on this initial vocabulary work?
Currently focused on defining ASL terms + some anatomical structures + SASAP's faceted search terms. You can find those in the Google Sheet below:
I'm just about ready to begin moving these over to Protege/formally semanticising this initial batch of terms. I'll coordinate with @mpsaloha to make sure he has a chance to review definitions/the ontology design pattern.
That's great news @samanthacsik , thanks.
I'm just about ready to begin moving these over to Protege/formally semanticising this initial batch of terms.
When you do that, I'm curious how/whether you might automate that process and if we can capture that workflow somewhere so that it can be re-run and applied to other term lists like you've linked. Think CSV->OWL. Just a thought.
@samanthacsik we wrote a Python module, csvtotriples, for converting CSV files into OWL documents that I think could be pretty easily adapted to your case, although it might require some restructuring of your spreadsheet -- basically preprocess your spreadsheet into the CSV format required by csvtotriples, and then let it loose. See https://github.com/Semantic-Observations/obs-models/tree/master/examples/csvtotriples
We could consider pulling csvtotriples out of that repository and making it its own standalone repo and product -- it's a bit buried where it is, but it's a really useful tool.
And note the section on "Mappings" in the readme where we talk through some of the issues you've raised about pulling out controlled values that are embedded in columns in the data files....
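As a rough illustration of the CSV->OWL idea, here is a stdlib-only Python sketch that turns a term spreadsheet into OWL classes serialized as Turtle. This is not csvtotriples' interface. The column names (id, label, parent, definition), the ex: namespace, the use of skos:definition, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical namespace for illustration; the real namespace/prefix was still undecided.
NAMESPACE = "http://example.org/salmon#"

PREFIXES = [
    f"@prefix ex: <{NAMESPACE}> .",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
]


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_turtle(csv_text: str) -> str:
    """Turn rows with (assumed) columns id, label, parent, definition into OWL classes.

    An empty parent places the class directly under owl:Thing.
    """
    lines = PREFIXES + [""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(f"ex:{row['id']} a owl:Class ;")
        lines.append(f"    rdfs:label {turtle_literal(row['label'])} ;")
        if row["definition"]:
            lines.append(f"    skos:definition {turtle_literal(row['definition'])} ;")
        parent = f"ex:{row['parent']}" if row["parent"] else "owl:Thing"
        lines.append(f"    rdfs:subClassOf {parent} .")
        lines.append("")
    return "\n".join(lines)


# Hypothetical two-row term list.
SAMPLE = """id,label,parent,definition
salmon_0000001,Fishery type,,
salmon_0000002,Commercial fishery,salmon_0000001,A fishery whose harvest is sold.
"""

if __name__ == "__main__":
    print(csv_to_turtle(SAMPLE))
```

Keeping the conversion as a plain script means it can be re-run whenever the spreadsheet changes, which is the repeatable workflow asked about above.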
@mbjones @amoeba @mpsaloha I've added my salmon_ontology_v0.1.owl to a new branch called initial-salmon-onto. I was unsure of the protocol for defining a "version," but since this is still very much a work in progress, I went with 0.1 -- I can of course change it if that's not appropriate.
Right now, I've been primarily working on adding some of the faceted search terms (SASAP Regions, salmon species, affiliated organizations, working groups) and other project-specific information (e.g. people, affiliate organizations). So far I've been doing this manually -- it's been helpful mostly because I'm still so new at ontology construction, and I'm finding some things that I had overlooked while putting together my spreadsheet. That said, exploring how to bulk-upload a batch of terms and then review them and add missing info/axioms/etc. could be good at this point (and clearly helpful for future projects).
I had modeled my spreadsheet after ENVO's ROBOT template, which is accompanied by a nicely detailed workflow for batch term uploads, though I'm not at all opposed to restructuring for use with the csvtotriples module above.
I think that ROBOT template approach is great for now. csvtotriples might need further development work to make it work better for OWL files and ROBOT seems pretty mature by comparison.
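For the ROBOT route, a minimal sketch of producing a template file from a term list. ROBOT templates carry a human-readable header row followed by a row of template strings (ID, LABEL, "SC %" for subClassOf, "A <property>" for an annotation). The column choice, the IAO:0000115 definition property, the ex: CURIEs, and the term dict keys are assumptions for illustration, not the layout of the actual SASAP spreadsheet.

```python
import csv
import io

# Row 1: human-readable column names. Row 2: ROBOT template strings.
# ID = term CURIE/IRI, LABEL = rdfs:label, "SC %" = subClassOf the cell value,
# "A IAO:0000115" = annotation using the OBO "definition" property (assumed choice).
HEADERS = ["ID", "Label", "Parent", "Definition"]
TEMPLATE_STRINGS = ["ID", "LABEL", "SC %", "A IAO:0000115"]


def robot_template_tsv(terms):
    """Write term dicts (hypothetical keys: id, label, parent, definition) as a ROBOT template TSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    writer.writerow(TEMPLATE_STRINGS)
    for term in terms:
        writer.writerow([
            term["id"],
            term["label"],
            term.get("parent", ""),
            term.get("definition", ""),
        ])
    return buf.getvalue()


if __name__ == "__main__":
    terms = [  # hypothetical terms
        {"id": "ex:salmon_0000001", "label": "Fishery type"},
        {"id": "ex:salmon_0000002", "label": "Commercial fishery", "parent": "ex:salmon_0000001"},
    ]
    print(robot_template_tsv(terms), end="")
```

Saved as, say, salmon_terms.tsv, the file could then be passed to something like `robot template --template salmon_terms.tsv --prefix "ex: http://example.org/salmon#" --output salmon.owl`, with the prefix swapped for whatever namespace is finally chosen.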
Hey @samanthacsik, following up on our call today, I'm going to drop some comments in here about what I see. Some of these things I'm sure you are already aware of and already have on your list of todos but I felt like dropping them in here for completeness. All of these are questions we can answer as a group so it's not all on you:
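- I didn't see an annotation for the ontology name, other metadata, and provenance at the top level. What's the name? salmon, SALMO? Can we figure out a set of top-level annotations to make with more info? I think ECSO does this and maybe does it well?
- This ontology is supposed to be generally about salmon, right? If so, I think anything SASAP-specific needs to be nested. That'd mean moving "SASAP working group and project support" into a class hierarchy and possibly moving "SASAP Region" into a deeper subclass.
- Terms with "(?)" in their label. Do you need any help with these?
- I don't see terms for weight, sex, and age but I think we have data about those. Is there a plan to add terms for these things? I think it's really important.
- Do we need mappings in this tree to one or more OBOE/ECSO concepts, like "Fish measurement type" subClassOf ECSO:measurementType? I can't really tell whether these things are Measurement Types or Characteristics or something else.
- This sub-tree feels like it's mixing types between measurements, characteristics, and methods.
- "Fishery type" is missing "Personal use" and "Commercial" if we're going to make this comprehensive.
- All of the object properties look like they could use equivalentProperty mappings or wholesale replacement with properties from existing ontologies, e.g. change <Tobias Schworer> salmon_affiliateOf <ICER> to <Tobias Schworer> schemaorg:affiliation <ICER> and remove the object property entirely from the ontology.
- Some terms at the top level (under owl:Thing) could use nesting and/or mapping to other ontologies... "Data corpus", "Salmon common name..." "SASAP working group..." probably shouldn't be at the root. Things like "Fish collection method" and others need mapping.

Happy to hop on a call to elaborate on these.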
I didn't see an annotation for the ontology name, other metadata, and provenance at the top level. What's the name? salmon, SALMO? Can we figure out a set of top-level annotations to make with more info? I think ECSO does this and maybe does it well?
We haven't spent a ton of time discussing ontology names, though we decided against SALMO since that's the genus name for Atlantic salmon. Currently the prefix for all IDs is salmon_ like you've noted, though we can easily change that. If we want to go with something super generic like "The Salmon Ontology," salmon_ might be a fine namespace to keep, but this probably deserves a bit more thought and discussion.
I had also added a short ontology description (if you peek at the Active Ontology tab), which reads, "An ontology which represents knowledge about salmon, features of their habitats, salmon stakeholders, and related entities." I sort of just made this up on the fly, so input here would be great. I'm not quite sure what other metadata belongs at the top level, but I see ENVO has added properties like GitRepository, bug-database, default-namespace, license, and comment (containing numbers of axioms, etc.), which all might be beneficial to include. Let me know what other types of metadata you're thinking and I can add as well.
This ontology is supposed to be generally about salmon, right? If so, I think anything SASAP-specific needs to be nested. That'd mean moving "SASAP working group and project support" into a class hierarchy and possibly moving "SASAP Region" into a deeper subclass.
Yes, should be generally about salmon and I think this suggestion makes perfect sense. I need to think a bit more about how best to organize this still, but I'll do a first pass and then ask for feedback again.
Terms with "(?)" in their label. Do you need any help with these?
Realizing I may not have pushed the most updated version before our salmantics call... I was playing around with different ways to arrange Fish collection methods (e.g. location where a method is commonly used vs. gear type). I'm not wedded to the Hand collection and Netting categorizations I currently have, but settled on them for now.
Re: Organization > Data contributor > *
Yup, totally agree with this. I also plan to re-sort Researcher affiliation instances into appropriate subclasses within Organization and likely use some sort of object property to specify which organizations are "data contributors" vs. "researcher affiliations".
I don't see terms for weight, sex, and age but I think we have data about those. Is there a plan to add terms for these things? I think it's really important.
Yup, definitely in the works. I'm still trying to figure out whether it's best to directly import the terms that already exist in ECSO or map to them using something like skos:exactMatch. I think that we need at least all ASL measurement types included before publishing v0.1.
Do we need mappings in this tree to one or more OBOE/ECSO concepts, like "Fish measurement type" subClassOf ECSO:measurementType? I can't really tell whether these things are Measurement Types or Characteristics or something else.
Same as the point above re: imports vs. mappings. But one of those two will have to happen, and we'll definitely want to do something like Fish measurement type subClassOf ECSO:measurementType, since we'll also eventually need to add non-fish measurement types, e.g. environmental measurement types.
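To make the import-vs-mapping trade-off concrete, here is a toy stdlib sketch (not an OWL reasoner). A subClassOf link from Fish measurement type to an ECSO class lets everything beneath it, such as fork length, be found as an ECSO measurement type. A skos:exactMatch link is a SKOS mapping, not an OWL subclass axiom, so subclass traversal does not follow it. All class names are hypothetical; ecso:MeasurementType and ecso:Weight are placeholders, not real ECSO IRIs.

```python
from collections import defaultdict

# Hypothetical triples; "ecso:..." names are placeholders for the ECSO/OBOE targets.
TRIPLES = [
    ("ex:ForkLength", "rdfs:subClassOf", "ex:FishLengthMeasurementType"),
    ("ex:FishLengthMeasurementType", "rdfs:subClassOf", "ex:FishMeasurementType"),
    ("ex:FishMeasurementType", "rdfs:subClassOf", "ecso:MeasurementType"),
    # Mapping-only link: SKOS says "same concept", but it is not a subclass axiom.
    ("ex:FishWeight", "skos:exactMatch", "ecso:Weight"),
]


def superclasses(triples, cls):
    """Return every class reachable from cls by following rdfs:subClassOf transitively."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdfs:subClassOf":
            parents[subject].add(obj)
    found, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(superclasses(TRIPLES, "ex:ForkLength")))
    print(sorted(superclasses(TRIPLES, "ex:FishWeight")))
```

In practice one might use both: exactMatch for terms ECSO already defines, and a subClassOf axiom to hang new groupings like Fish measurement type under ECSO's hierarchy.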
This sub-tree feels like it's mixing types between measurements, characteristics, and methods
Thanks for picking up on this. I had originally called this Fish measurement protocols, but after some discussion with Mark, we decided to change it to Fish length measurement types, and that all subclasses (e.g. fork length) should reflect the physical measurements themselves and not the protocols. This might involve revisiting how we define these terms, but I'll have to do some more thinking on how best to clarify.
"Fishery type" is missing "Personal use" and "Commercial" if we're going to make this comprehensive.
Yup, Mark and I decided to change those instances to subclasses. I had deleted Commercial to re-add it as a subclass just before our meeting and didn't quite get to it, but I've made those updates now. TIL that Personal use fishery is a thing! Thanks for that. Looking at ADF&G's website now, I see the following fishery types, but let me know if I'm missing anything:
All of the object properties look like they could use equivalentProperty mappings or wholesale replacement with properties from existing ontologies, e.g. change salmon_affiliateOf to schemaorg:affiliation and remove the object property entirely from the ontology.
Yes 100%. This was largely my own naiveté when it comes to object properties...I just started making up my own (lol).
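The two options above, wholesale replacement or an equivalentProperty mapping, can be sketched on plain triples. The ex: IRIs and the replacement table are hypothetical; schema:affiliation refers to schema.org's "affiliation" property, assuming the usual schema: prefix.

```python
# Hypothetical local property -> external replacement (schema.org "affiliation").
REPLACEMENTS = {"ex:salmon_affiliateOf": "schema:affiliation"}


def replace_properties(triples, replacements):
    """Option 1: rewrite predicates that have an external replacement; leave others as-is."""
    return [(s, replacements.get(p, p), o) for s, p, o in triples]


def equivalence_axioms(replacements):
    """Option 2: keep the local property but declare it owl:equivalentProperty to the external one."""
    return [(local, "owl:equivalentProperty", external) for local, external in replacements.items()]


# The example from the thread, with hypothetical IRIs for the person and organization.
DATA = [("ex:TobiasSchworer", "ex:salmon_affiliateOf", "ex:ICER")]

if __name__ == "__main__":
    print(replace_properties(DATA, REPLACEMENTS))
    print(equivalence_axioms(REPLACEMENTS))
```

Replacement removes the local property entirely, as suggested; the equivalence route is gentler if existing annotations already use the local IRI.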
Some terms at the top level (under owl:Thing) could use nesting and/or mapping to other ontologies... "Data corpus", "Salmon common name..." "SASAP working group..." probably shouldn't be at the root. Things like "Fish collection method" and others need mapping.
Also yes and in the works :) I'll probably ask for input on some of these when I get there too.
Thanks again for all this, and I'll ping everyone once I have a next update ready.
Thanks for the detailed response to my comments @samanthacsik!
Re:
Let me know what other types of metadata you're thinking and I can add as well.
That's probably a great start. Name + authors + license + website/github is really pretty good. Maybe other ontologies like ENVO would be a good place for ideas.
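A minimal sketch of a top-level header carrying name, description, authors, license, repository, and version, emitted as Turtle. Using Dublin Core terms (dcterms:title, dcterms:creator, dcterms:license), rdfs:seeAlso, and owl:versionInfo is an assumption; ENVO and ECSO may use other properties. The ontology IRI, author names, and license IRI in the usage are placeholders.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def ontology_header(iri, title, description, creators, license_iri, repo_url, version):
    """Build a Turtle owl:Ontology header with name, description, authors, license, repo, version."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{iri}> a owl:Ontology ;",
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    dcterms:description {turtle_literal(description)} ;",
    ]
    lines += [f"    dcterms:creator {turtle_literal(name)} ;" for name in creators]
    lines += [
        f"    dcterms:license <{license_iri}> ;",
        f"    rdfs:seeAlso <{repo_url}> ;",
        f"    owl:versionInfo {turtle_literal(version)} .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ontology_header(
        iri="http://example.org/salmon",  # hypothetical ontology IRI
        title="The Salmon Ontology",  # one candidate name from this thread
        description=("An ontology which represents knowledge about salmon, features of "
                     "their habitats, salmon stakeholders, and related entities."),
        creators=["Author One", "Author Two"],  # placeholders
        license_iri="https://example.org/license-to-be-decided",  # placeholder
        repo_url="https://github.com/DataONEorg/sem-prov-ontologies",
        version="0.1",
    ))
```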
sport (would you say this is the same as recreational?)
I think so. And "personal use" is a special kind of "subsistence". People unfortunately use them interchangeably. It's safe to connect them but not to make them synonyms.
In Alaska, "Personal use" is a legally defined regulatory category of fishery. It is defined as "the taking, fishing for, or possession of finfish, shellfish, or other fishery resources, by Alaska residents for personal use and not for sale or barter, with gill or dip net, seine, fish wheel, long line, or other means defined by the Board of Fisheries."
Subsistence and Personal Use has more info. The short of it is that "subsistence" generally refers to federal regulations. Even ADF&G mixes them. Both are harvest that isn't for fun or for money.
With that complete overshare of information, we might look at the SASAP data to see if we have instances of both personal use and subsistence.
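A minimal sketch of that check, assuming a SASAP data file with a column named fishery_type (hypothetical; real column names vary by dataset). Values are lowercased and trimmed because, as noted above, these labels get mixed in practice.

```python
import csv
import io
from collections import Counter


def fishery_type_counts(csv_text, column="fishery_type"):
    """Count values in a (hypothetical) fishery-type column, normalizing case and whitespace."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip().lower()
        if value:  # skip blank cells
            counts[value] += 1
    return counts


def has_both(counts, first="personal use", second="subsistence"):
    """True if the data contain at least one record of each category."""
    return counts[first] > 0 and counts[second] > 0


if __name__ == "__main__":
    sample = "year,fishery_type\n2015,Subsistence\n2015,Personal Use\n"  # hypothetical rows
    counts = fishery_type_counts(sample)
    print(counts, has_both(counts))
```

If both categories show up, that would support modeling Personal use as a subclass of Subsistence rather than collapsing them into one term.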
@amoeba @samanthacsik @mpsaloha To follow the new proposed CONTRIBUTIONS model, I renamed the branch for the salmon ontology work to feature-49-salmon-ontology, which is currently identical to initial-salmon-onto. I would like to delete the initial-salmon-onto branch, but let's discuss whether that is warranted. I will also merge changes from develop into this branch to bring it up to parity.
Sounds good. We decided not to delete stale branches that are pushed to GitHub on our other projects, so we might stick with that here unless we have a good reason to deviate. Personally, I'm fine with the extras, but others may differ.
We're close on this and @mpsaloha is going to wrap up a few more changes before a first release. Beyond @mpsaloha 's work, we identified a few other things I'll work on:
@mpsaloha handed off his latest copy to me and I committed it in https://github.com/DataONEorg/sem-prov-ontologies/commit/7bd54f6d59a0b7953c7da801c17dbe01bc86e1b6. I'm going to work through the issues in the above comment and https://github.com/DataONEorg/sem-prov-ontologies/issues/122.
The issues above were all completed and released in salmon-0.3.2.
We need an ontology or set of ontologies to annotate SASAP stuff with. @mpsaloha is looking at building on the salmon ontology @csjx worked on, to see whether we would build from there or build on top of it. It doesn't need to be complete, just broad enough to let us demonstrate the value of semantics on the test corpus #48.
The vocabulary should be compatible with the approaches in ECSO and OBOE.
This task should release a first pass at the base vocabulary terms needed to annotate important variables in the SASAP datasets, recognizing that future releases under additional tickets can extend with additional terms. Open additional tickets for these future extensions as needed so we don't lose track of the directions for the vocabulary.