gbif / hosted-portals

Support material for establishing the GBIF Hosted Portals
Apache License 2.0

Feature request: implement local context notices #272

Open sformel-usgs opened 8 months ago

sformel-usgs commented 8 months ago

I spoke with a few folks at TDWG about the possibility of including Local Context notices on our portals as a way of beginning to indicate a commitment to equitable engagement with indigenous peoples, and potentially to identify data that might have accompanying cultural rights, protocols, and responsibilities.

I think everyone I spoke with would agree that we still need to educate ourselves more on what commitment looks like, and how to identify relevant data. However, I'm interested in learning more about how challenging it would be, from a technical standpoint, to implement the LC notices on the portals and/or individual datasets.

jbstatgen commented 8 months ago

Recently Edmund Schiller submitted the report of the SYNTHESYS+ task group 3.3 that had worked over the past years on permits and transactions to RIO. It provides a framework that might be of interest here, too.

We developed classifications (typologies) at two levels. The higher-level typology proposes a categorization of legal and contractual documents associated with collection objects. The lower-level typology proposes standardized definitions for the concrete rights, i.e. permissions, prohibitions, obligations and restrictions, stated within the documents. Currently the typologies focus on legal and contractual documents. They can be expanded to include ethical and social categories. We were aware of the need for ethical and social categories but had no time to develop them.

Regarding technical implementation, the approach that we explored used the Open Digital Rights Language W3C ontology (ODRL). We could show that it can be used in a loan interaction to inform participants about legal documents and rights attached to an object, and that the rights within such documents can be made machine-actionable using ODRL functionality. There are limits; we discuss these and point to some alternative standards. There doesn't seem to be a reason why ODRL couldn't be used similarly for ethical and social information.

For example, on a GBIF portal, when a visitor clicks on the digital representation of a collection object or observation, they could be informed about its IPLC context and the concrete contents of Local Context labels. It would even be possible to bring up a window informing users that certain information is sensitive and should be viewed only in certain contexts, leaving the user to decide if they will follow the proposed code of conduct or ignore it and open the information nevertheless. Certainly on the provider side (GBIF, the portal provider) the ODRL should be able to restrict access at different levels and in different ways to sensitive information.
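To make the idea of a "notice before viewing" concrete, here is a minimal Python sketch of an ODRL-style policy expressed as JSON-LD, plus a helper a portal could use to decide whether to show a notice. It is only an illustration under stated assumptions: the policy `uid` and the record URL are placeholders, the choice of the `display` and `reviewPolicy` actions is one possible reading of the scenario above, and the helper is not a full ODRL evaluator.

```python
import json

# Hypothetical identifiers: the policy uid and the record URL are placeholders.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "https://example.org/policy/sensitive-record-1",
    "permission": [
        {
            "target": "https://example.org/occurrence/12345",
            "action": "display",
            # Duty the portal asks the visitor to fulfil before viewing.
            "duty": [{"action": "reviewPolicy"}],
        }
    ],
}


def duties_for(policy: dict, target: str, action: str) -> list[str]:
    """Return the duty actions attached to a permitted action on a target."""
    duties = []
    for permission in policy.get("permission", []):
        if permission.get("target") == target and permission.get("action") == action:
            duties.extend(duty["action"] for duty in permission.get("duty", []))
    return duties


def needs_notice(policy: dict, target: str, action: str = "display") -> bool:
    """True when the portal should show a notice before performing the action."""
    return bool(duties_for(policy, target, action))


if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
    print(needs_notice(policy, "https://example.org/occurrence/12345"))
```

A portal could call `needs_notice` when a record page is opened and leave the decision to proceed with the visitor, as described above. Restricting access on the provider side would need real policy enforcement rather than this lookup.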

timrobertson100 commented 8 months ago

Thanks @sformel-usgs for raising this and @jbstatgen for the background. This is all new to us, but I'll try and comment on your question about technical challenges.

From a first read of the LC notices, I could imagine they may appear on any of the publishing organisation/institution, dataset, collection, and record (e.g. specimen) pages. For the organisation/institution/collection I think we'd need to add some fields in the registry, for the datasets we'd likely need to explore where it fits in the EML schema, and for the records, I would anticipate it may need a Darwin Core Archive extension (possibly using ODRL terms?) or perhaps new terms in Darwin Core.

Would it make sense to try and work through an example? We could even prototype something whereby we use machine tags (structured annotations) against the institution/dataset, and perhaps make use of dwc:dynamicProperties or the MeasurementOrFact extension for the specimen records. This wouldn't be a final solution, but would already be handled in the existing infrastructure (IPTs / GBIF.org ingestion) such that we'd be able to display the labels on the hosted portals.
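A rough Python sketch of what such a prototype could carry is shown below. Everything specific in it is an assumption for illustration: the machine tag namespace and name, the `localContextsNotice` key inside dwc:dynamicProperties, and the example.org URLs are invented, and nothing here reflects an agreed convention.

```python
import json

# Hypothetical machine tag (namespace/name/value) for a dataset or institution.
dataset_machine_tag = {
    "namespace": "localcontexts.example",
    "name": "notice",
    "value": "https://example.org/notices/open-to-collaborate",
}


def with_notice(record: dict, notice_type: str, notice_url: str) -> dict:
    """Return a copy of a Darwin Core record with a notice added to dynamicProperties.

    Existing dynamicProperties content is preserved; the key name
    'localContextsNotice' is a placeholder, not an established term.
    """
    props = json.loads(record.get("dynamicProperties") or "{}")
    props["localContextsNotice"] = {"type": notice_type, "url": notice_url}
    updated = dict(record)
    updated["dynamicProperties"] = json.dumps(props, sort_keys=True)
    return updated


def notice_of(record: dict):
    """Read the notice back, as a portal would when rendering a record page."""
    props = json.loads(record.get("dynamicProperties") or "{}")
    return props.get("localContextsNotice")


if __name__ == "__main__":
    record = {"occurrenceID": "urn:example:1", "dynamicProperties": '{"tissue": "leaf"}'}
    tagged = with_notice(record, "BC Notice", "https://example.org/notices/bc")
    print(tagged["dynamicProperties"])
    print(notice_of(tagged))
```

Because dwc:dynamicProperties is free-form JSON, this would pass through existing publishing and ingestion unchanged, at the cost of the portal having to know the key name in advance.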

Thanks

(CC @MattBlissett and @melianieraymond for info, as I know you've also been in related discussions)

sformel-usgs commented 8 months ago

Thanks to both of you for your thoughts. @timrobertson100 I think working through an example would be productive. I have some datasets in mind that could work for testing. I'll reach out to the POCs and get back to you.

I imagine that portals would find it useful to be able to implement notices at different levels of granularity. Something higher level than what you've described, that might be useful for the portals (and perhaps GBIF.org), could be a general notice of "Openness to Collaboration". Two examples of this are:

I imagine creating a template for an extra webpage isn't too complicated, but something that emerged from our conversation at TDWG is that it's important that there be a stable and clear POC, and a well-thought out plan for how to proceed if contacted by an indigenous community. In short, that this page not be implemented performatively.

Returning to the technical side of things, the IEEE is developing a standard, "Recommended Practice for the Provenance of Indigenous Peoples’ Data". I can't recall the timeline on this, but it's something that we should be aware of in thinking about this.

Likewise, this protocol from the Coeur d’Alene Tribe and University of Idaho on data sharing (Protocol and Best Practice for the Research on and Public Distribution of Information from Projects involving Indigenous Peoples) includes a section on "Best Practices for Metadata Creation in ISO 19115"; perhaps this could serve as a model for thinking about using/extending DwC terms.

I can't recall if I've heard of an EML-specific effort wrt LC labels and notices, but it wouldn't surprise me if one existed. I've reached out on their slack channel to see if there is a specific person or group we can chat with.

sharifX commented 8 months ago

Also check the initiative from DataCite: "In January 2022, DataCite and Local Contexts hosted a joint workshop to explore the connection between Local Contexts and DataCite. During this workshop, participants discussed how the Local Contexts Hub could be integrated into repository workflows and how to represent Notices and Labels within the DataCite Metadata Schema. Proposed guidelines for the Rights metadata field were workshopped by the 12 participants, who represented repositories in Canada, New Zealand, and the United States."

sformel-usgs commented 8 months ago

Thank you @sharifX for adding that. The good news is that we're compiling all of these efforts in a ticket that will help us (or me at least!) keep track of the interested parties. However, I'm still unsure where to best apply our energy.

I checked in on the NCEAS slack about EML and @mbjones from the Arctic Data Center said:

At the Arctic Data Center we have been documenting ethical research practices in the methods section with specific, required fields. See the blog post on metadata editing support for this, along with our page on questions on data ethics. We add data sensitivity tags into EML as annotations, and I’ve always thought Local Contexts tags could be added the same way.

He also reminded me that @mobb is the chair of the ESIP Sustainable Data Management cluster, a group that is working on recommendations for repositories wanting to implement CARE principles. They are continuing a discussion on working with metadata schemas that started at the ESIP meeting this past July.

I'm not sure where to go next. We could:

  1. Do some exploring of technical solutions and see if any are promising
  2. Bring this conversation to one of these other groups and see if we have a role there
  3. Be patient and continue to educate ourselves based on the output of these, and other, groups

Thoughts?

mbjones commented 8 months ago

@sformel-usgs Thanks for including me in this conversation. Regarding the serialization of Local Contexts tags in EML, I thought I'd provide a few more details on how EML 2.2.0 supports semantic annotations to controlled vocabularies. Here's an example from an Arctic Data Center dataset:

<annotation>
      <propertyURI label="theme">http://www.w3.org/ns/dcat#theme</propertyURI>
      <valueURI label="Human Geography">https://purl.dataone.org/odo/ADCAD_00078</valueURI>
</annotation>
<annotation>
      <propertyURI label="Data Sensitivity Category">http://purl.dataone.org/odo/SENSO_00000005</propertyURI>
      <valueURI label="De-identified data">http://purl.dataone.org/odo/SENSO_00000003</valueURI>
</annotation>

In this representation, the subject of the annotation is the dataset identifier being described (other areas let you also be explicit about the subject). Details on EML annotations are at https://eml.ecoinformatics.org/semantic-annotation-primer

For Local Contexts TK Labels, I have searched for a suitable ontology/vocabulary that represents all of the terms and can provide term URIs, but I have not yet found one. If you know of one, please do share. We register all of the vocabularies that we use in the BioPortal service so that term relationships can be explored more easily by folks with interest.
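As a sketch of how a Local Contexts label might be serialized in the same annotation form once term URIs exist, the Python below builds and reads back an EML-style annotation element with the standard library. Both URIs are placeholders on example.org, since no published vocabulary was found, and the property name `hasLocalContextsLabel` is invented for illustration.

```python
import xml.etree.ElementTree as ET


def make_annotation(prop_uri: str, prop_label: str, value_uri: str, value_label: str) -> ET.Element:
    """Build an <annotation> element with a labelled propertyURI and valueURI."""
    annotation = ET.Element("annotation")
    prop = ET.SubElement(annotation, "propertyURI", {"label": prop_label})
    prop.text = prop_uri
    value = ET.SubElement(annotation, "valueURI", {"label": value_label})
    value.text = value_uri
    return annotation


def read_annotation(annotation: ET.Element) -> dict:
    """Return (label, URI) pairs for the property and value of an annotation."""
    prop = annotation.find("propertyURI")
    value = annotation.find("valueURI")
    return {
        "property": (prop.get("label"), prop.text),
        "value": (value.get("label"), value.text),
    }


# Placeholder URIs: a real implementation would need resolvable term URIs.
label = make_annotation(
    "https://example.org/vocab/hasLocalContextsLabel",
    "Local Contexts Label",
    "https://example.org/vocab/tk-attribution",
    "TK Attribution",
)
xml_text = ET.tostring(label, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
    print(read_annotation(ET.fromstring(xml_text)))
```

The same `read_annotation` helper works on the data sensitivity annotation shown above, which suggests a portal could treat both kinds of tag uniformly.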

sharifX commented 8 months ago

The Local Contexts GitHub page has documentation about the API, which lists attributes. There is also mention of the Native Land API (territories, languages and treaties, for instance) and IANA language tags.

peggynewman commented 8 months ago

Hi all, we're keen to collaborate on this in the ALA with our Indigenous Engagement research colleagues in Australia. The Local Contexts Biocultural Notices and Labels are being launched at the end of November in NZ. GBIF NZ/Manaaki Whenua has some examples of both Collections Notices and Labels on data that they are keen to share as exemplars - I'll try to get a link.

Note that both notices and labels need to be applied at record level, possibly at collection level - but I'm not really seeing a case for an EML/registry implementation, perhaps it's Darwin Core terms or even dynamicProperties to start with.

AaronWilton commented 8 months ago

Hi all. I have been actively looking at some of this - trying to find existing standards etc. and thinking about how we might be able to pass this information around in a collection/occurrence context. I think there are at least three main use cases - including one at a dataset (EML) level, which would then probably need to be applied to individual occurrences.

I will try to share some of my very preliminary thinking in the coming week or so -- I expect to be presenting at the Local Contexts summit (https://localcontexts.org/events/lc-summit/) at the end of the month - I was interviewed for the documentary that will be premiered (hmmmm!) - and will potentially say something about this activity depending on progress in the intervening period.

I highly recommend the Native Land layers, but a note of caution: the polygons are described both clockwise and anti-clockwise for containing polygons, so we have to check and correct this whenever we bring updates into our systems.
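A check of this kind can be sketched in a few lines of Python using the shoelace formula for signed area. This is not the code used in any of the systems mentioned here; it assumes rings are lists of (x, y) pairs such as (longitude, latitude), treats coordinates as planar, and does not handle rings that cross the antimeridian or enclose a pole.

```python
def signed_area(ring):
    """Shoelace formula: positive for anti-clockwise rings, negative for clockwise.

    Works for open or closed rings, because a repeated closing point
    contributes zero to the sum.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_anti_clockwise(ring):
    return signed_area(ring) > 0


def normalise_ring(ring, anti_clockwise=True):
    """Return the ring in the requested winding order, reversing it if needed."""
    if is_anti_clockwise(ring) != anti_clockwise:
        return list(reversed(ring))
    return list(ring)


if __name__ == "__main__":
    clockwise_square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
    print(signed_area(clockwise_square))
    print(normalise_ring(clockwise_square))
```

GeoJSON (RFC 7946) expects exterior rings to be anti-clockwise and holes clockwise, so `normalise_ring(ring)` for outer rings and `normalise_ring(ring, anti_clockwise=False)` for holes would be one way to apply the correction on each update.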

If you want to see some BC Notices and Labels attached to specimens for real, see https://scd.landcareresearch.co.nz/Specimen/CHR%20365035 for an example. All objects in our collections now have BC Notices or Labels or both.

Will feed more into this discussion in coming days/week. cheers a

jbstatgen commented 8 months ago

It is really encouraging to see the change towards data governance happening. Being interested in the structure of a framework for legal, ethical and social data governance metadata, and in finding a way to fit it into wider metadata frameworks, e.g. Darwin Core, I checked some of the high-level specifications. These were the specifications for kernel information and FAIR Digital Objects developed by the RDA, the FDO-Forum and, specifically for biodiversity data, by DiSSCo. A kernel slot defined by one of these organizations (or others) would likely be a good "home" for a Darwin Core extension.

@timrobertson100 pointed out that a preliminary slot needs to be found to be able to move ahead with an implementation while all this gets sorted out and developed. Having data governance information as part of DwC's RecordLevel using dwc:dynamicProperties sounds more appropriate to me than using the general MeasurementOrFact class.

Exploring a slot for a data governance metadata module, the RDA Recommendation on PID Kernel Information (expanded paper on zenodo) seems to propose such a slot. In their proposed kernel information profile they define at position 5 a pointer to a policy object called digitalObjectPolicy.

The EOSC interoperability framework seems to have several high-level slots for "legal" information (see Table 2 in this discussion of FDOs). In both the EOSC and FDO frameworks, a clear slot for data governance still seems to be needed. Interestingly, the closest might be "policy/guidance for patent/trade secrets violation". Overall, this seems to call for a general slot capturing legal, ethical and social information on norms and best practices.

For sharing biodiversity data the openDS standard for creating FDOs is under development by DiSSCo. The recent RFC on "DiSSCo Kernel and Digital Specimen FDO Record attributes" provides the latest profile information. A slot for governance/policy information might be most appropriately added to the first 29 kernel slots storing "Core FDO attributes (common within all FDOs in DiSSCo)", since policy information is needed early on when interacting with an object wrapping biodiversity data. Alternatively, a new series might be created at 900-999 for governance data. @sharifX are there discussions or plans for including policy metadata?

Maybe you are wondering why this jump into the deep end of kernels (what is that?) and high-level metadata structure. I think it would be good if we all align early on, and not grow a thicket of more or less different approaches. At the tips of the metadata tree, standards like the Local Contexts labels and similar companion standards provide a good starting point and reason for thinking about the higher-level structure of governance data and for growing a tree with a solid, actionable trunk.

MattBlissett commented 3 months ago

@mbjones wrote: "For Local Contexts TK Labels, I have searched for a suitable ontology/vocabulary that represents all of the terms and can provide term URIs, but I have not yet found one. If you know of one, please do share. We register all of the vocabularies that we use in the BioPortal service so that term relationships can be explored more easily by folks with interest."

Just a month ago, UNESCO-ODIS (Ocean Data and Information System) have created a GitHub issue which might relate to this. However, I'm not familiar with that system so I'm not sure what the issue is supposed to produce.

https://github.com/iodepo/odis-arch/issues/398

kcopas commented 3 months ago

So as not to lose the thought/connection, I presented in the same session with Stephanie Carroll (lead author of the CARE Principles) at the AGU 2023 Fall Meeting. The presentation that she and Riley Taitingfong (@rileyt92) gave outlined their effort to develop an "indigenous metadata bundle". These data packages are intended to facilitate indigenous data sovereignty by providing information defined and maintained by ILK communities regarding traditional peoples’ connections to place. In principle, this approach would empower communities to collect, own and apply their own data with information about provenance, attribution and protocols for future use.

It looks like some of the same work was presented here last fall (at TDWG—d'oh!). If I caught the drift properly, they sounded as if they could be structured and attached much as we do extensions. I have written to Stephanie to inquire as to the current status of the work (and now @'d Riley here as well).