cabal-club / cabal-core

Core database and replication for cabal.
GNU Affero General Public License v3.0

Proposal: adopt ActivityStream vocabulary #43

Closed christianbundy closed 5 years ago

christianbundy commented 5 years ago

Message schemas get really hard to change as a protocol gets older, so if there's any inclination toward interoperability I'd strongly suggest adopting some shared semantics.

Current

{
  "type": "chat/text",
  "content": {
    "channel": "default",
    "text": "hello *world*"
  },
  "timestamp": 1576152732000
}

Proposed

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "to": "cabal:channel:default",
  "mediaType": "text/markdown",
  "content": "hello *world*",
  "published": "2019-12-12T12:12:12Z"
}

RangerMauve commented 5 years ago

Might also want to consider @pfrazee's unwalled garden stuff. 😜

okdistribute commented 5 years ago

@RangerMauve @pfrazee yeah i wonder how that will go, would be nice to have some way to think about interoperability with w3c and have an official answer for folks. I don't really know the benefits, and it seems like the format here in particular is a bit wonky (timestamp formatting and content)

@christianbundy what existing applications use activitystreams?

RangerMauve commented 5 years ago

@karissa Mastodon, PixelFed, PeerTube, some of the stuff Darius is working on getting RSS into activitypub.

I think there's also some discussion starting in SSB about adopting AS. https://viewer.scuttlebot.io/%25PWLnDqu8DnmtN04COlkM7937xiP4fqEMS0hDEUqSlhs%3D.sha256

christianbundy commented 5 years ago

Yeah, a decent chunk of Fediverse software platforms implement it, and I've been pushing for ActivityStreams adoption in Scuttlebutt for the past ~3 months (strengthened by a conversation with Darius at scuttle-camp).

Older (and longer) discussion: https://viewer.scuttlebot.io/%25qTEUjQajo0VCc6RkoCBdG%2FKlqo1LKCsHfaK%2BRQP0kdI%3D.sha256

cblgh commented 5 years ago

i'd love to see a fork of a cabal client that implements this and interops with activitypub somehow :)

personally, the AP schema above feels way too messy to implement straight up in existing cabal clients. but maybe we could enable support for message schema adapters (like, mapping a cabal message schema to what it would be in the AP vocabulary).
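A minimal sketch of what such an adapter could look like, assuming the `chat/text` shape and the proposed `Note` shape from the top of this issue. The function names `cabalToNote` and `noteToCabal` are hypothetical and not part of cabal-core or any ActivityPub library; treating every message as `text/markdown` simply follows the proposal.

```javascript
// Hypothetical adapter: maps a cabal "chat/text" message onto the
// ActivityStreams-style Note proposed at the top of this issue.
// Nothing here is part of cabal-core; the names are illustrative only.
function cabalToNote (msg) {
  if (msg.type !== 'chat/text') {
    throw new Error('unsupported cabal message type: ' + msg.type)
  }
  return {
    '@context': 'https://www.w3.org/ns/activitystreams',
    type: 'Note',
    to: 'cabal:channel:' + msg.content.channel,
    mediaType: 'text/markdown',
    content: msg.content.text,
    // cabal stores epoch milliseconds; drop the milliseconds to match
    // the timestamp format used in the proposal
    published: new Date(msg.timestamp).toISOString().replace(/\.\d{3}Z$/, 'Z')
  }
}

// The reverse direction, for reading a Note back into the current schema.
function noteToCabal (note) {
  return {
    type: 'chat/text',
    content: {
      channel: note.to.replace(/^cabal:channel:/, ''),
      text: note.content
    },
    timestamp: Date.parse(note.published)
  }
}

const current = {
  type: 'chat/text',
  content: { channel: 'default', text: 'hello *world*' },
  timestamp: 1576152732000
}
console.log(cabalToNote(current))
```

An adapter like this keeps the stored log untouched and only translates at the edges, at the cost of losing sub-second precision on the way out.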

cabal kind of prides itself on being message type agnostic - any client can come along and decide they want to eschew the existing schemas and conform to a new one (which is great, it means it's super easy for someone to go ahead and create a MUD, a loomio or trello clone on top of cabal)

i have no idea if that's a good idea or not, and if i see ssb implementing AP vocab in a nice way i'll probably be swayed haha

pfrazee commented 5 years ago

I'm not personally a fan of JSON-LD or RDF in general. I wrote a small piece for the unwalled garden announcement post that I removed but saved (for just such an occasion!):

Why not use RDF?

I expect people to ask about RDF, so let me cover it. I don't currently intend to use RDF. I think the technology is extremely similar to what we're doing, and it has some strong adherents which I don't want to discount. I just don't like how things like JSON-LD feel to use in JavaScript.

In my opinion, the mistake of RDF is that it assumes we need to arbitrarily merge schemas into a shared namespace and that's just not true. They're reinventing sub-objects and creating ugly syntaxes as a result.

Consider the following RDF-like approach to merging schemas:

{
  "@context": {
    "dat": "dat://datprotocol.com/manifest#",
    "ug": "dat://unwalled.garden/"
  },
  "dat:title": "Paul Frazee",
  "dat:description": "Beaker guy",
  "ug:follows": [
    "dat://alice.com",
    "dat://bob.com"
  ]
}

This is just as easily handled using a sub-object:

{
  "type": "datprotocol.com/manifest",
  "title": "Paul Frazee",
  "description": "Beaker guy",
  "ext": [{
    "type": "unwalled.garden/follows",
    "urls": [
      "dat://alice.com",
      "dat://bob.com"
    ]
  }]
}

Why bother with all the ugly syntax of namespacing? I honestly don't see the point and I find the RDF syntaxes hard to look at. Time may prove me wrong on this, but I'd rather risk trying something that's underpowered than accept a less friendly UX by default.

(The above examples are 100% examples and are not real schemas in use.)

The current cabal schema is much closer to how I think schemas should be written. My only suggestion is that the type should be a URL (with no scheme) which points to documentation on the schema.

RangerMauve commented 5 years ago

@pfrazee What do you think about adopting the same keys as activitystreams without the extra namespacing? Like, the general structure of objects that they've defined.

okdistribute commented 5 years ago

I see @pfrazee's point, namespacing has some strong roots in enterprise software, where they assume software engineers have a lot of time to learn the semantics and aren't doing this on the weekend

@cblgh I see your approach of trying to make cabal flexible, the mistake there is that everyone tries to make their schemas flexible or simple enough to build upon, but then we just have a bunch of competing schema adapters to work with and convert to :) this is not a new problem in software, hell, we can't even get date time agreement between languages or map data formats right :) I wouldn't underestimate the annoyance of schema adapters, as someone who has had to write quite a few of them for mapping data w/ mapeo!

I see the benefit of cabal being a small-core p2p social data structure that doesn't assume the application being built on top of it, and also see that these other social platforms are using activitystreams. It could boost the viability of transferring existing networks to a cabal-based platform. This would benefit cabal immensely by potentially bringing on tons of users from these federated platforms to real p2p world.

pfrazee commented 5 years ago

@RangerMauve I'd only suggest interop if you intend the networks & clients to interoperate directly. (As in, you want a Cabal client to be able to render ActivityStream notes and cabal chats in the same interface without any munging.) I don't see exactly how Cabal would do that. I also don't think it'd be worth the effort for UG to do it either. I'd rather just use custom schemas that can fit our designs more specifically.

okdistribute commented 5 years ago

@pfrazee i think cabal-core is intended to be used for things outside of chat, so it would be likely a new social-style application, not the irc clone. (correct me if i'm wrong @cblgh !)

pfrazee commented 5 years ago

@karissa it's really not that I dislike namespacing. I just dislike how they do it with the prefix syntaxes. It sucks to code against. All of UG's schemas are namespaced, just in a fashion that feels good for JS code.

EDIT: An example of being crummy to code with being:

user['dat:title'] = 'Paul'

and

user['@context'].ug = 'unwalled.garden/'

RangerMauve commented 5 years ago

@pfrazee That's a really good point! @christianbundy what's your main reasoning having Cabal adopt AS? Or SSB, even.

My motivation for spreading AS is to make it more standard to deal with social data from different places and to have more ease of interop. I really like the idea of creating ActivityPub servers that bridge P2P social things to the fediverse and the HTTP web.

RangerMauve commented 5 years ago

I think this post on SSB phrased it pretty well.

while i'm also keen for message schemas (as blobs referenced by hash), i don't think following the ActivityStreams vocabulary for their messages structures means we need to follow them exactly in every detail. as in, i don't think adopting ActivityStreams is all-or-nothing, i think there's a benefit to just composting our bespoke message structures into something more standard that is well-designed and well-documented.

given that the ActivityStreams vocabulary overlaps with Scuttlebutt in that our message semantics are the same, by following their lead we can free up more energy for us to focus on where we differ from ActivityStreams, it's about choosing our battles. i don't think the structure of how to represent standard message content (notes, articles, profiles, follows, blocks, ignores, likes, dislikes, events, flags, mentions, etc) is a battle worth fighting when our best reinvention is no better than ActivityStreams, i think we have more important battles to fight. :smiley_cat:

cblgh commented 5 years ago

@karissa no you're right, like i want to write a trello that works over the same cabal key while still being able to have the chat clients of today :3

i agree with mauve on understanding the purpose of the proposal.

also, i just think the more cruft you add to a schema the less people will want to bother making anything (c.f. i haven't made anything for AP yet because the vocab spec scared me away). there's something about the time you need to invest before you actually have some small success. there's also the argument of aesthetics, haha

a revamped proposal could maybe be

{
  "to": "default",
  "mediaType": "text/markdown",
  "content": "hello *world*",
  "published": "2019-12-12T12:12:12Z"
}

(i omitted type and @context because they don't really map that well for cabal afaiu? while the others are just renaming what we already have)

edit: a similar kind of proposal also killed much of rotonde's (first social network in beaker) development, so we're just gonna charge straight ahead for now and wait until someone else adopts AP / there is an extremely valuable purpose or reason behind making a change, which in turn would make the correct choice for cabal a lot more obvious 🐱

okdistribute commented 5 years ago

@cblgh is that what killed rotonde's development? i hadn't heard that

pfrazee commented 5 years ago

@cblgh agree on all that, though I think having a type that maps to a URL is still very useful. That's how you can enable people to make their own schemas without being afraid of ambiguity.

RE: aesthetics, I know it may seem ridiculous, but I think it's why people dislike RDF. Developers is users too.

cblgh commented 5 years ago

there were many things, but making it less accessible for people to develop was one (as well as arguing/discussing schemas instead of making things)

/me respectfully bows out of the conversation for now

christianbundy commented 5 years ago

> @christianbundy what's your main reasoning having Cabal adopt AS? Or SSB, even.

Just to clarify: I've got lots of qualms with ActivityStreams. I've got lots of qualms with JSON-LD too. And JSON! And JavaScript. And HTML can be a total pain (don't get me started on Markdown (or URIs)). Hell, the English language is as janky and crufty as they come. If you need the global optimum I can't recommend any of the above.

...but if you need interoperability and accessibility I'd argue the above are good enough, at least for now. There's really nothing wrong with coming up with your own semantics, but if you're building distributed community technology I think it's a mistake to diverge from standards unless you have a compelling reason to avoid them. Developer tooling revolves around standardization and shared expectations. Like @pfrazee said above:

> Developers is users too.

Again, I really want to stress how unstoked I am about ActivityStreams -- it's verbose and boring and damn there's a lot of documentation, but on the other hand:

> I think having a type that maps to a URL is still very useful

Worth noting that JSON-LD has this superpower. It's usually hidden away by abstractions, but when you set @context you're setting the implicit URL for each of the properties. The expanded version of type looks like this:

{
  "@type": [
    "https://www.w3.org/ns/activitystreams#Note"
  ]
}
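To make the "implicit URL" point concrete, here is a deliberately simplified sketch of how a compact `type` turns into the expanded form above. It is not the real JSON-LD expansion algorithm (which handles nested contexts, term definitions and much more); `VOCAB_BASE` and `expandType` are hypothetical names, and the only context it knows is the ActivityStreams one.

```javascript
// Simplified illustration only: real JSON-LD processors implement the full
// W3C expansion algorithm. Here we only resolve "type" against a single,
// hard-coded vocabulary base for the ActivityStreams context.
const VOCAB_BASE = new Map([
  ['https://www.w3.org/ns/activitystreams', 'https://www.w3.org/ns/activitystreams#']
])

function expandType (doc) {
  const base = VOCAB_BASE.get(doc['@context'])
  if (!base) throw new Error('unknown @context: ' + doc['@context'])
  // The compact term is appended to the vocabulary base to form a full URL.
  return { '@type': [base + doc.type] }
}

console.log(expandType({
  '@context': 'https://www.w3.org/ns/activitystreams',
  type: 'Note'
}))
```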

> edit: a similar kind of proposal also killed much of rotonde's (first social network in beaker) development, so we're just gonna charge straight ahead for now and wait until someone else adopts AP / there is an extremely valuable purpose or reason behind making a change, which in turn would make the correct choice for cabal a lot more obvious :cat:

Please! I'm super excited about what you're building and I really don't want to stomp all over your garden with this proposal. What you're doing right now is wonderful and I think you're on a great trajectory. :heart: My experience with building social software on an append-only log is that schema changes are difficult / impossible, so my goal was mostly to plant the seed while it's less hard than it will be later.

hackergrrl commented 5 years ago

I agree with a lot of the points about the heaviness & clunkiness of ActivityPub and things like RDF.

I pretty recently used to be in the camp of "just make it all freeform & the humans will figure it out & it'll be glorious", though after some headaches around lacking schemas in Mapeo and doing migration work, I've come around a bit about schemas maybe-somewhat-kinda being useful.

In Mapeo / kappa-osm, we still use a freeform string field for type, but we also maintain mapeo-schema as a module for validating data that claims to be of certain types. It's a point of centralization, but it's not any more centralized than cabal-core being a single centralized git repo that we cluster around. It's still forkable.
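As a rough illustration of that pattern (a freeform `type` string plus a shared module that validates data claiming to be of that type), here is a minimal sketch. It is not the mapeo-schema API; the `validators` table and `validate` function are hypothetical, and the `chat/text` shape is the one shown at the top of this issue.

```javascript
// Hypothetical validators keyed by the freeform "type" string.
// This is NOT the mapeo-schema API; it only illustrates the idea.
const validators = new Map([
  ['chat/text', (msg) =>
    typeof msg.timestamp === 'number' &&
    msg.content != null &&
    typeof msg.content.channel === 'string' &&
    typeof msg.content.text === 'string']
])

// Reports whether the type is known at all, and whether the data
// actually matches what that type claims to be.
function validate (msg) {
  const check = validators.get(msg.type)
  if (!check) return { known: false, valid: false }
  return { known: true, valid: check(msg) }
}

console.log(validate({
  type: 'chat/text',
  content: { channel: 'default', text: 'hello *world*' },
  timestamp: 1576152732000
}))
console.log(validate({ type: 'chat/text', content: {} }))
console.log(validate({ type: 'something/else' }))
```

Because the validators live in an ordinary module, forking the schema is just forking the module.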

In general I'm :-1: on type fields that are URLs. What if I'm offline? The app / database now has zero information to go on re: figuring out what to do with this data. It could parse the URL and maybe deduce its type from something in it, but then you're back to the same problem of interpreting schema from an arbitrary string. :boom:

okdistribute commented 5 years ago

Yeah I have to echo this from @noffle, deploying Mapeo into the wild (and even cabal) has made it pretty clear that having clearly defined schemas with upgrade paths for older clients is pretty crucial.

Using ActivityStream might be annoying at first but it might create nice avenues for the future to not have to write all cabal-specific upgrade paths for this (cabal can then benefit from the ecosystem)

hackergrrl commented 5 years ago

It might be a worthwhile ("fun") exercise to draft up AP equivalent schemas for cabal's current types, and imagine what future types (attachments, admin/mod management, joining/leaving channels, etc) might look like under the AP model as well.

pfrazee commented 5 years ago

@noffle Using a URL as the type does not require the URL to be "live." There's no expectation that the software will fetch some resource from the location (which really wouldn't be useful anyway because schema support has to be preprogrammed). So, the type URL is just an identifying string.

The idea behind using a URL is that 1) it gives an unambiguous global namespace thanks to DNS so that schema IDs don't conflict, and 2) when developers are trying to integrate a new schema they can easily find the documentation.

EDIT: You're free to use crypto-domains (hashes, pubkeys) for schemas if you don't like DNS. It's just harder for people to read a hash than it is to read a DNS shortname!

> after some headaches around lacking schemas in Mapeo and doing migration work, I've come around a bit about schemas maybe-somewhat-kinda being useful.

One of the main lessons I took from SSB was that, if you don't have schema specificity, then everybody becomes afraid to make changes to the schemas because they can't predict how those changes will affect other clients. You risk (eg) two clients adding their own tags field to a "post" type with very different meanings, and suddenly they're breaking each other. You actually end up restricting developers by not giving them unambiguous schemas.

If you want to maximize developer freedom, you just need one requirement: a type URL. Now anybody can publish their own schemas, there's no issue with ambiguity, and there are clear upgrade pathways in the future.
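A small sketch of how that could play out in a client, under the assumption that the type is treated purely as an identifying string and never fetched. The two type URLs and the `register` / `render` helpers are made up for illustration.

```javascript
// Sketch: the type string is only an identifier; nothing is fetched from it.
// Both type URLs below are invented for this example.
const handlers = new Map()

function register (type, fn) {
  if (handlers.has(type)) throw new Error('type already registered: ' + type)
  handlers.set(type, fn)
}

function render (msg) {
  const handler = handlers.get(msg.type)
  return handler ? handler(msg) : '[unsupported message type: ' + msg.type + ']'
}

// Two clients each define a "post" whose `tags` field means something
// different. Distinct type URLs keep them from breaking each other.
register('client-a.example/post', (m) => m.text + ' #' + m.tags.join(' #'))
register('client-b.example/post', (m) => m.text + ' (with ' + m.tags.join(', ') + ')')

console.log(render({ type: 'client-a.example/post', text: 'hi', tags: ['p2p', 'cabal'] }))
console.log(render({ type: 'client-b.example/post', text: 'hi', tags: ['@alice', '@bob'] }))
console.log(render({ type: 'client-c.example/poll', question: '?' }))
```

The lookup works offline because support for each schema is preprogrammed; the URL only matters to a developer looking for documentation.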

pfrazee commented 5 years ago

Regarding AS, while they do use RDF (via JSON-LD) I haven't seen them leverage RDF's features. They almost always have only one context. In the issue's opening example:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "to": "cabal:channel:default",
  "mediaType": "text/markdown",
  "content": "hello *world*",
  "published": "2019-12-12T12:12:12Z"
}

Once resolved, this is very similar to just using a type URL:

{
  "type": "https://www.w3.org/ns/activitystreams/Note",
  "to": "cabal:channel:default",
  "mediaType": "text/markdown",
  "content": "hello *world*",
  "published": "2019-12-12T12:12:12Z"
}

The tragic thing is that our example JSON-LD actually resolves to this:

{
  "@type": "https://www.w3.org/ns/activitystreams#Note",
  "https://www.w3.org/ns/activitystreams#content": "hello *world*",
  "https://www.w3.org/ns/activitystreams#mediaType": "text/markdown",
  "https://www.w3.org/ns/activitystreams#published": {
    "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@value": "2019-12-12T12:12:12Z"
  },
  "https://www.w3.org/ns/activitystreams#to": {
    "@id": "cabal:channel:default"
  }
}

Which... holy shit, what? Play around with their playground tool to see all the expanded forms and you can see why I really dislike JSON-LD.

hackergrrl commented 5 years ago

> The idea behind using a URL is that 1) it gives an unambiguous global namespace thanks to DNS so that schema IDs don't conflict, and 2) when developers are trying to integrate a new schema they can easily find the documentation.

That makes sense; especially 1! I think it might not be that hard to get both by using something similar (like a dat uri), without needing to adopt the AP ecosystem. I was just reading the spec a bit, and gosh does it feel unwieldy (to me). A lot of it doesn't really seem to map onto a non-server-client model.

cinnamon-bun commented 5 years ago

From a UX perspective: when someone sends a message they are trying to communicate with another human. It's our job as developers to faithfully transmit the message and preserve its meaning. If we don't, we get confusion or friction between users when they violate each others' social expectations ("you're spamming the channel!") and maybe some harm from privacy confusion.

It helps to have schemas that carefully define the technical and social meaning of a message. This includes privacy expectations, ordering of messages relative to each other, intended meaning and privacy of heart or star buttons, if the message is meant as synchronous (chat style) or asynchronous (email style), expected length of a message (tweet/chat vs email/post), etc etc.

Of course developers/users are free to display things in their own creative way but I hope they start from a place of understanding user intent. And it's a lot easier to write a new client when you don't have to deal with a million slight variations on a "chat" message.

So my feelings are:

  1. I like URLs as message types.
    • Anyone can add their own message type (preserving flexibility & decentralization)
    • Prevents accidental collision between different developers which results in technical and social confusion
    • Developers can easily find schema documentation, also helping with technical & social cohesion of the network
  2. I don't care which schemas are used as long as they're documented. (I assume it wouldn't be too hard to build a converter between any schema and ActivityPub.)
  3. Well defined schemas don't have to mean ossification. You can make my-cabal.org/note.v2 for version 2 of a message type. When future developers come along and find several versions of a message type deep in the append-only log, they'll be so happy that each one is documented :)
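Point 3 can be sketched as a chain of upgrade steps keyed by versioned type. The type names reuse the `my-cabal.org/note.v2` example; the field names (`body`, `content`, `mediaType`) and the `upgrade` helper are invented for illustration and do not describe any real cabal schema.

```javascript
// Sketch: every version of a message type has its own identifier, and
// older versions are upgraded one step at a time when read from the log.
// Field names are hypothetical.
const LATEST = 'my-cabal.org/note.v2'

const upgrades = new Map([
  // Assumed change: v1 kept plain text in `body`; v2 renames it to
  // `content` and adds an explicit mediaType.
  ['my-cabal.org/note.v1', (m) => ({
    type: 'my-cabal.org/note.v2',
    content: m.body,
    mediaType: 'text/plain'
  })]
])

function upgrade (msg) {
  let current = msg
  while (current.type !== LATEST) {
    const step = upgrades.get(current.type)
    if (!step) throw new Error('no upgrade path from ' + current.type)
    current = step(current)
  }
  return current
}

console.log(upgrade({ type: 'my-cabal.org/note.v1', body: 'hello' }))
```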

For example, here's notes on a proposed Image type for the old Tent network (similar to Mastodon). It includes technical details about EXIF timestamps to help developers do a consistent thing, and also privacy best practices like recommending that the user can clear location metadata from a photo they're posting.

Schemas aren't a betrayal of our distributed ethos - they help us communicate, and anyone can make their own. I'd even suggest naming a schema something silly like "Strawberry Mode" to make it clear that it's not the One True Cabal Schema. And clients can say "This client supports ActivityPub 1 through 7, Strawberry Mode, and Fish Whispers 2.0"

cinnamon-bun commented 5 years ago

TL;DR, my perspective

ghost commented 5 years ago

There's an assumption in this conversation that integration with existing formats is generally useful, but I am skeptical of that. I already think the json based messages are a bit verbose and inefficient but duplicating the same wordy URL (which is subject to linkrot) seems extra painful. It's way too early to know what data formats are going to make sense for cabal. I think if you want to design other applications that aren't chat, then make up your own protocol. ssb really suffers from data bloat by cramming every kind of application message into the same feed.

Another thing to think about is how programs with fundamentally different usage profiles (chat vs mastodon-style social networking) will clash and create an annoying experience for both. Mastodon users would see a deluge of chatter and cabal users would see walls of text that get lost in more casual conversation.

cinnamon-bun commented 5 years ago

I'm not pushing for ActivityPub, but wanted to share this article by Darius: A highly opinionated guide to learning about ActivityPub

aral commented 5 years ago

From my experience with ActivityStreams: it's way too overcomplicated. Also I have a problem with extending the semantics of messages at the protocol level instead of at the message level. The former gives power to developers (and thus favours centralisation), the latter to the people using the system (and thus favours decentralisation).

The focus, in my opinion, should be to have the core protocol as simple as possible (a dumb pipe) and to extend the semantics using messages themselves so anyone can create new types simply by sending a message that contains metadata and if clients add support for that type, then boom, you can, for example, have fancy rendering on a chess board for a chess game. Clients that don't support a certain message type either fall back to plain text content or ignore it or present it as an unknown message type or perhaps prompt to search for a plugin/add-on/extension to render it. (This is similar to how Twitter was going to implement annotations if Blaine Cook et al could have had their way before their business model/exit strategy kicked in.)

Eg.,

A message sent in a chess game:

@karissa I moved my pawn to e4.

-- data --
type: ind.ie.chess-move,
move: e4

In such a system, the community using the system can extend its semantics organically without requiring centralised changes to the core messaging platform. It would help democratise the platform and devolve some power from developers to the people using it.
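A minimal sketch of a client handling such a message, assuming a `-- data --` separator line followed by `key: value` lines as in the chess example. The wire format, `parseMessage`, and the renderer table are all assumptions made for illustration; cabal defines none of them.

```javascript
// Sketch of the "metadata inside the message" idea. The separator and
// "key: value" lines follow the chess example; the exact format is assumed.
function parseMessage (raw) {
  const [text, trailer] = raw.split(/^[ \t]*-- data --[ \t]*$/m)
  const data = {}
  if (trailer !== undefined) {
    for (const line of trailer.split('\n')) {
      // key, colon, value, optional trailing comma
      const match = /^\s*([\w-]+)\s*:\s*(.*?)\s*,?\s*$/.exec(line)
      if (match) data[match[1]] = match[2]
    }
  }
  return { text: text.trim(), data }
}

// Hypothetical renderers keyed by type. Clients that do not know a type
// fall back to the human-readable text.
const renderers = new Map([
  ['ind.ie.chess-move', (msg) => '[chess board: move ' + msg.data.move + ']']
])

function render (msg) {
  const renderer = renderers.get(msg.data.type)
  return renderer ? renderer(msg) : msg.text
}

const raw = [
  '@karissa I moved my pawn to e4.',
  '',
  '-- data --',
  'type: ind.ie.chess-move,',
  'move: e4'
].join('\n')

console.log(render(parseMessage(raw)))
console.log(render(parseMessage('just a plain chat message')))
```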

okdistribute commented 5 years ago

Thanks everyone for the comments. I think there is sufficient information here now for folks to decide if they want to implement ActivityPub semantics. It sounds like it probably won't make it into the core library, but if someone wanted to create an ActivityPub messaging system (i.e. a module/app) that is compatible with cabal, it could be experimented with. I propose we close this issue.