IQSS / dataverse

Open source research data repository software
http://dataverse.org

Suggestion: Supporting publication (and possible federation) through ActivityPub #5883

Closed michbarsinai closed 2 months ago

michbarsinai commented 5 years ago

This issue is where the community can discuss aspects of integrating ActivityPub functionality to Dataverse.

Background:

Dataverse   ActivityPub

rigelk commented 5 years ago

Being an "ActivityPub" server is quite vague and does not guarantee interoperability. We should define what we want to share over ActivityPub: what kind of activities, and what kind of objects (e.g. 'Create' an 'Article' object, or 'Update' a 'Document' object).

A quick and easy to understand tutorial explaining the different concepts of ActivityPub can be found here.


Assuming the goal of using ActivityPub is to share datasets and ease their dissemination, we should first create a 'Dataset' object derived from the base 'Object' and define properties that map to the current dataset metadata. I already have a draft for such an object definition. That said, I am not sure which object types Dataverse uses internally for datasets and dataverses, publications from journal/pre-print servers, raw datasets from lab machines, grant proposals, CFP announcements, etc. Maybe they require separate types, or maybe we can just gather them under a generic 'Corpus' type of some sort? In any case, curation could be done via loosely coupled 'tag' objects (a basic property already included in the 'Object' type) that behave like hashtags on Mastodon.
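To make the idea concrete, here is a minimal sketch of what a 'Create' activity carrying such a 'Dataset' object could look like. Everything specific in it is an assumption for illustration: 'Dataset' is not part of the core ActivityStreams vocabulary, the `https://example.org/ns/dataset#` namespace and the function names are hypothetical, and the 'Hashtag' tag type follows the extension Mastodon uses rather than the base spec. It is not the draft vocabulary mentioned above.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

# Hypothetical extension context: declares the non-core terms used below.
EXT_CONTEXT = {
    "ds": "https://example.org/ns/dataset#",  # assumed namespace
    "Dataset": "ds:Dataset",                  # assumed type, derived from Object
    "Hashtag": "as:Hashtag",                  # extension term as used by Mastodon
}


def make_dataset_object(object_id, name, summary, tags, actor_id):
    """Map a few dataset metadata fields onto base 'Object' properties."""
    return {
        "id": object_id,
        "type": "Dataset",
        "name": name,
        "summary": summary,
        "attributedTo": actor_id,
        # Loosely coupled curation: one tag object per keyword.
        "tag": [{"type": "Hashtag", "name": "#" + t} for t in tags],
    }


def make_create_activity(actor_id, obj):
    """Wrap an object in a publicly addressed 'Create' activity."""
    return {
        "@context": [AS_CONTEXT, EXT_CONTEXT],
        "id": obj["id"] + "/activity",  # hypothetical id scheme
        "type": "Create",
        "actor": actor_id,
        "to": [AS_PUBLIC],
        "object": obj,
    }


if __name__ == "__main__":
    actor = "https://dataverse.example.org/actor"
    dataset = make_dataset_object(
        "https://dataverse.example.org/dataset/1",
        "Example survey data",
        "Responses collected for an example survey.",
        ["sociology", "survey"],
        actor,
    )
    print(json.dumps(make_create_activity(actor, dataset), indent=2))
```

Whether separate types or a single generic 'Corpus' type is used, only the `type` value and the extension context would change in a sketch like this; the tagging part stays the same.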

If the goal of using ActivityPub with Dataverse is just to have comments from the rest of the fediverse, then no vocabulary adaptation is needed and we can use the existing 'Note' objects already used by Mastodon for Twitter-like messages.
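As a rough illustration of the comments-only case, a comment could be an ordinary 'Note' whose `inReplyTo` points at the dataset's URL. The sketch below assumes that addressing scheme; the function names and example URLs are hypothetical, and how Dataverse would actually expose a dataset as a reply target is an open question.

```python
import html

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_comment_note(note_id, actor_id, dataset_id, text):
    """Build a plain 'Note' that replies to a dataset URL."""
    return {
        "@context": AS_CONTEXT,
        "id": note_id,
        "type": "Note",
        "attributedTo": actor_id,
        "inReplyTo": dataset_id,
        # 'content' is HTML by default, so escape the plain-text comment.
        "content": "<p>" + html.escape(text) + "</p>",
        "to": [AS_PUBLIC],
    }


def comments_on(dataset_id, received_objects):
    """Select the received Notes that reply to the given dataset."""
    return [
        o for o in received_objects
        if o.get("type") == "Note" and o.get("inReplyTo") == dataset_id
    ]


if __name__ == "__main__":
    dataset_url = "https://dataverse.example.org/dataset/1"
    note = make_comment_note(
        "https://social.example/notes/42",
        "https://social.example/users/alice",
        dataset_url,
        "Great dataset, thanks for sharing!",
    )
    print(len(comments_on(dataset_url, [note])))
```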

We also need to identify entities in Dataverse that are responsible for the publication of Activities. Users? Organisations?

Sorry for being so vague, I'm not familiar with Dataverse.

michbarsinai commented 5 years ago

Thanks for raising these topics, and for the reference to the post about learning ActivityPub - I wish I had seen it before going through the protocols myself.

The question of what exactly is posted, and in what format (or formats?), is something we need to discuss as a community. I think this issue can be divided into two main parts:

  1. "Frictionless dissemination" of scientific datasets. In the long run, as more systems join us, this could lead to a scholarly social network based on the Fediverse (and the Dataverse project blazing the trail here).
  2. More efficient harvesting and federation between Dataverse instances. This is more of a Dataverse-internal technical discussion, but implementing it the right way will also implement part 1.

rigelk commented 5 years ago

@michbarsinai actually, Dataverse would not be starting a scholarly social network alone: other projects (like OLKi, which I'm part of) are actively working on the subject and are willing to establish a common vocabulary for disseminated data.

Speaking of OLKi, we exchange the same kind of data as Dataverse (datasets), although not limited to scientific data, so common ground should be easy to reach.

rigelk commented 5 years ago

I have assembled some thoughts in an early draft of such a vocabulary. It does not yet say which properties of inherited types should be used at a minimum: its purpose is just to check whether the foundational basis (the types used) is okay for everyone. Comments are more than welcome!

michbarsinai commented 5 years ago

That's great news! Is there a working group or other effort to organize this?

On 5 Jun 2019, at 14:33, Rigel Kent notifications@github.com wrote:

@michbarsinai https://github.com/michbarsinai actually, Dataverse would not be starting a scholarly social network alone: other projects (like OLKi https://framagit.org/synalp/olki/olki, which I'm part of) are actively working on the subject and are willing to establish a common vocabulary for disseminated data.

Speaking of OLKi, we exchange the same kind of data as Dataverse (datasets), although not limited to scientific data, so common ground should be easy to reach.

rigelk commented 5 years ago

No, but since this is mainly related to ActivityPub, we can join the group currently responsible for the protocol: SocialCG (formerly SocialWG). Note that they work primarily on ActivityPub extensions, not really on making official dialects.

pdurbin commented 5 years ago

@michbarsinai over at https://github.com/linkedresearch/linkedresearch.org/issues/17#issuecomment-500742027 I learned from @rigelk that the ActivityPub spec has a note about Linked Data Notifications, something @csarven asked me about the other day at https://gitter.im/linkedresearch/chat?at=5cfb8aed6f530d3b61592bd1 ("Please remind me, does Dataverse implement LDN? Any plans?").

Here's what the note says:

"Note: Relationship to Linked Data Notifications

While it is not required reading to understand this specification, it is worth noting that ActivityPub's targeting and delivery mechanism overlaps with the Linked Data Notifications specification, and the two specifications may interoperably combined. In particular, the inbox property is the same between ActivityPub and Linked Data Notifications, and the targeting and delivery systems described in this document are supported by Linked Data Notifications. In addition to JSON-LD compacted ActivityStreams documents, Linked Data Notifications also supports a number of RDF serializations which are not required for ActivityPub implementations. However, ActivityPub implementations which wish to be more broadly compatible with Linked Data Notifications implementations may wish to support other RDF representations."
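To illustrate the overlap the note describes, here is a small sketch of preparing a delivery to an `inbox`: the activity is serialized as JSON-LD and POSTed with the ActivityStreams media type. The function name and example URL are hypothetical, and the request is only built, not sent. Many fediverse servers additionally expect signed requests, which is left out here.

```python
import json
import urllib.request

# Media type used for ActivityPub delivery; it is an application/ld+json
# variant, which is also the JSON-LD serialization LDN receivers accept.
AS_MEDIA_TYPE = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def build_inbox_request(inbox_url, activity):
    """Prepare (but do not send) a POST of an activity to an inbox URL."""
    body = json.dumps(activity).encode("utf-8")
    return urllib.request.Request(
        inbox_url,
        data=body,
        method="POST",
        headers={"Content-Type": AS_MEDIA_TYPE},
    )


if __name__ == "__main__":
    activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": "https://dataverse.example.org/actor",
        "object": "https://dataverse.example.org/dataset/1",
    }
    req = build_inbox_request("https://receiver.example.org/inbox", activity)
    print(req.get_method(), req.full_url)
    # Sending would be: urllib.request.urlopen(req)
```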

I wrote a little bit about the Linked Open Research Cloud (LORC) in the 2017-09 Dataverse Community News at https://groups.google.com/d/msg/dataverse-community/e1wHF05slNg/nkRMQZdQBAAJ but it has sort of fallen off my radar, to be honest. To me it feels related to what we've been talking about with ActivityPub. If you go to https://linkedresearch.org/cloud you can see this (emphasis mine):

"We are building an infrastructure to semantically represent all aspects of scholarly communication - from data and research artefacts, to claims within academic articles - as well as to connect resources and the activity around them by means of notifications and visualisations."

pdurbin commented 5 years ago

Slide 20 at https://openresearchcloud.app.box.com/s/x6jpycc319k85u6m8qnr5o1w2yp6nbys via "Developing Data Federation Standards" at http://www.openresearchcloud.org/washington-dc-may-22-2019/washington-dc-agenda-and-presentations/ by @mercecrosas is about ActivityPub. Here's a screenshot:

[Image: Screen Shot 2019-07-15 at 8 39 45 AM]

pdurbin commented 2 years ago

"a workflow step to send Linked Data Notification (LDN) messages" in this new PR:

cmbz commented 2 months ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.