DataONEorg / sem-prov-ontologies

Ontologies focused on scientific observations and scientific workflow provenance.
https://ontologies.dataone.org

create data sensitivity ontology #113

Closed mbjones closed 2 years ago

mbjones commented 2 years ago

Across our repositories, we need a consistent way to represent data sensitivity classifications, and annotating data against a data sensitivity ontology has been proposed. In this task, create an initial ontology that captures the data sensitivity categories for repositories and relates them to existing sensitivity taxonomies, including DataTags, the Data Use Ontology (DUO), and the Informed Consent Ontology (ICO).

See also: Information about the categories we are using for the Arctic Data Center is in the sensitive data matrix.

Candidate names:

mbjones commented 2 years ago

Started this with an initial file in the branch feature-113-create-senso, at commit 4ae8a14.

This small ontology creates the three classes we agreed on for the Arctic Data Center (Non-sensitive data, De-identified data, and Sensitive data), with definitions. It also creates classes for the DataTags categories (https://datatags.org) and links those where they directly correspond. I also imported the DUO ontology and looked at where there might be equivalences, but I currently think the two are complementary and not directly related. However, it is possible that DUO's unrestricted data permissions apply to the non-sensitive data categories, etc. It would be good to discuss the relationship between sensitivity, confidentiality, and data use permissions as concepts in these ontologies.

@jeanetteclark @laurenwalker @amoeba could you take a look at this with the ADC annotations for sensitive data that we have been mocking up in mind? Thx.

mbjones commented 2 years ago

Issues to be resolved:

mbjones commented 2 years ago

Added modifications to try using owl:equivalentClass for alignment, and it produced the expected inferred graph. Seems like a good way to proceed. Bryce is testing how this will work with the indexer and search.
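For anyone following along, here is a rough, stdlib-only sketch of what the owl:equivalentClass alignment buys us in the inferred graph: anything typed with one class is also inferred to have the equivalent class. This is not the reasoner we actually ran, and every IRI in it (`ex:SensitiveData`, `ex:RestrictedTag`, `ex:dataset1`) is a made-up placeholder, not a real SENSO or DataTags term.

```python
# Minimal sketch of owl:equivalentClass inference over plain triples.
# All "ex:" names are hypothetical placeholders, not real ontology terms.
RDF_TYPE = "rdf:type"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"


def infer_types(triples):
    """Return the triples plus rdf:type statements implied by owl:equivalentClass."""
    inferred = set(triples)
    # owl:equivalentClass is symmetric, so record both directions.
    equivalents = {}
    for s, p, o in triples:
        if p == OWL_EQUIVALENT_CLASS:
            equivalents.setdefault(s, set()).add(o)
            equivalents.setdefault(o, set()).add(s)
    # Repeat until nothing new appears, which also covers chains (A = B = C).
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for other in equivalents.get(o, ()):
                new = (s, RDF_TYPE, other)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


graph = {
    ("ex:SensitiveData", OWL_EQUIVALENT_CLASS, "ex:RestrictedTag"),
    ("ex:dataset1", RDF_TYPE, "ex:SensitiveData"),
}
result = infer_types(graph)
print(("ex:dataset1", RDF_TYPE, "ex:RestrictedTag") in result)  # True
```

A real OWL reasoner does much more than this (subclass axioms, property restrictions, and so on); the sketch only shows the one entailment relevant to the alignment.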

amoeba commented 2 years ago

We had a great discussion on our salmantics call about the above points. We decided to skip the modeling and alignment issues for now and stay open to revisiting them later. The first version of SENSO will have just three subclasses and won't include terms for the DataTags categories or alignments to them. Not making the assertions now doesn't preclude us from making them in the future, but making assertions now that we might later change could cause a lot of trouble.

On the last issue,

[ ] Decide if SENSO should be its own ontology or incorporated within another

Some points were brought up:

Ultimately, we agreed to go with the modular approach and continue down the path of building SENSO out as mentioned above.

I'll make these changes and ping others for a second look.

amoeba commented 2 years ago

I made the edits we talked about (see commit message in b694039a91ab554e2c1b6f82cbea95042c1d7441). One thing we hadn't caught that I changed was that the terms were using URIs like http://purl.obolibrary.org/odo/SENSO_ instead of http://purl.dataone.org/odo/SENSO_. I opted to change them all to http://purl.dataone.org/odo/SENSO_ since that matches the ontology IRI.
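In case it helps anyone with existing references to the old URIs, the change amounts to a prefix swap. A minimal sketch follows, assuming term IRIs are available as plain strings; the helper name `fix_term_iri` is hypothetical, and this is not the script used for the actual edit.

```python
# Namespace prefixes taken from the comment above.
OLD_NS = "http://purl.obolibrary.org/odo/SENSO_"
NEW_NS = "http://purl.dataone.org/odo/SENSO_"


def fix_term_iri(iri):
    """Rewrite a term IRI from the old namespace to the new one; leave others untouched."""
    if iri.startswith(OLD_NS):
        return NEW_NS + iri[len(OLD_NS):]
    return iri


# The local ID below is only an illustrative input.
print(fix_term_iri("http://purl.obolibrary.org/odo/SENSO_00000005"))
```

Matching on the full prefix, not just the host, keeps unrelated purl.obolibrary.org terms (for example imported ones) from being rewritten by accident.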

@mbjones @mpsaloha please take a look.

mbjones commented 2 years ago

Looks good, @amoeba. I pushed another set of edits that fixed the odo prefix definition and reformatted subjects to use the short form in the TTL file. I checked off the boxes above for items on which I think we have made decisions. A few items remain, but I think they should probably be opened as new tickets. So, what do we need to do to add SENSO to the website listing too?

amoeba commented 2 years ago

Thanks @mbjones. New tickets sounds good, I can file those in a sec.

I'll add a commit on top of this branch here that'll enable it on the GitHub pages site when we merge to main. Once merged, I think we should get this off to BioPortal so landing page popovers can pull in definitions.

amoeba commented 2 years ago

Okay, that's all done. IMO this is ready for a merge to develop and a PR onto main after that. If you just want to give me a thumbs up I'll get those going.

amoeba commented 2 years ago

Hey @mbjones, I'm ready to move this forward into production but I did want a final 👍 from you before I did that. Let me know.

mbjones commented 2 years ago

+1 let's move forward with the PR. I reached out to @jeanetteclark for feedback, but I suspect she'll give a thumbs up on the PR.

jeanetteclark commented 2 years ago

@mbjones @amoeba sorry - missed the review here. I just had a look and looks good to me

amoeba commented 2 years ago

Great, thanks @mbjones, @jeanetteclark.

amoeba commented 2 years ago

Next/last steps are to include it in our DataONE and Metacat index components. Marching off to file those tickets now.

laurenwalker commented 2 years ago

What is the official ontology URI that we should use in EML annotations?

On Wed, Oct 13, 2021 at 8:15 PM Bryce Mecum wrote:

> Closed #113 https://github.com/DataONEorg/sem-prov-ontologies/issues/113.


amoeba commented 2 years ago

Hey @laurenwalker ,

The terms in SENSO the Arctic team will care about are the three subclasses under https://bioportal.bioontology.org/ontologies/SENSO/?p=classes&conceptid=root.

And the predicate that goes with any of those is http://purl.dataone.org/odo/SENSO_00000005.
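To make that concrete, here is a rough sketch of an EML-style annotation that pairs the predicate with a class. The element layout (`annotation` holding `propertyURI` and `valueURI`, each with a `label` attribute) reflects my understanding of EML 2.2 annotations. The class IRI `SENSO_XXXXXXXX`, both labels, and the helper `build_annotation` are placeholders I made up; look up the real class IRIs on BioPortal.

```python
import xml.etree.ElementTree as ET

# Predicate IRI from the comment above.
SENSITIVITY_PREDICATE = "http://purl.dataone.org/odo/SENSO_00000005"


def build_annotation(value_uri, value_label, property_label="sensitivity predicate"):
    """Build an <annotation> element; property_label is a placeholder, not the official label."""
    annotation = ET.Element("annotation")
    prop = ET.SubElement(annotation, "propertyURI", {"label": property_label})
    prop.text = SENSITIVITY_PREDICATE
    value = ET.SubElement(annotation, "valueURI", {"label": value_label})
    value.text = value_uri
    return annotation


# Hypothetical class IRI and label; substitute one of the three real subclasses.
example = build_annotation(
    "http://purl.dataone.org/odo/SENSO_XXXXXXXX", "Non-sensitive data"
)
print(ET.tostring(example, encoding="unicode"))
```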

Are there tickets anywhere tracking all of this data sensitivity work that's going on? I'm happy to chip in there with notes.

mbjones commented 2 years ago

https://github.com/NCEAS/metacatui/issues/1844

amoeba commented 2 years ago

Thanks @mbjones. I'd seen that ticket but it didn't seem like the appropriate place. I'll drop a note there.

laurenwalker commented 2 years ago

Hey @amoeba - I am getting a 404 for the purl.dataone.org links above. Are they not production ready yet?

amoeba commented 2 years ago

Not quite. I can pick up the remaining work on that, it was just at the bottom of my todo list.

amoeba commented 2 years ago

Okay @laurenwalker, SENSO redirects are in place so those URIs should also resolve now.

laurenwalker commented 2 years ago

> Okay @laurenwalker, SENSO redirects are in place so those URIs should also resolve now.

Great, thanks Bryce!