Application data schemas & how to manage decentralized development

pfrazee commented 6 years ago

Yesterday, @0x0ade and @neauiore brought up how to deal with competing schema features from multiple applications.

I and @neauoire wanted to implement special event posts in Rotonde. Those would end up cluttering Fritter, which made us came up with a way how to set a post's visibility. Here's our idea, awaiting feedback :)

They proposed a small spec for Fritter and Rotonde interop:

Add client field to posts, associating the post with a client identifier. F.e. "rotonde", "fritter".

For past messages not containing a "client" field, assume that the client is matching.

The client can store its version in the ID, but it's the client's responsibility to handle it. F.e. ClientA will handle "clientb-1.0.0" and "clientb-1.0.1" as two completely separate clients. Likewise, ClientB will handle "clienta:1337" and "clienta:1338" as two separate clients. ClientA will read the version number from clienta: clients and ClientB will read the version number from clientb- IDs, though

Add optional visibility field, allowing the strings "public" (default if none given), "whisper" or "client"

"public": The post will be rendered in your feed.

"whisper": The post will only be rendered in your feed if you're the author of the post, or if your archive URL is listed in the array stored under target.

Note: "private" would be a lie, as the post itself is publicly shared, just not publicly rendered.

"client": The client automatically generated this post, f.e. "followed Person". The post will only be rendered by a compatible client. The client will gather the information as for how (or if) to render the post from any non-standard properties (f.e. action: "follow"). Clients will handle future non-standard vs standard property conflicts on their own, on a case-by-case basis.

I tweeted a bit about the challenge of decentralized development. This has definitely been on my mind.

I want to use this issue to discuss broader approaches. We can use details of @0x0ade's proposal as an example; it's very helpful in that regard!

neauoire commented 6 years ago

This came about when we were thinking of adding new types of messages. There is a new feed entry type that would be something along the lines of X started following Y; we started to think about platform specific entries and how to indicate to the other platforms how to parse these uncommon entries.

pfrazee commented 6 years ago

This issue is exactly the problem we encountered when we first worked on SSB/Patchwork. There wasn't a sufficient framework for making changes that wouldn't accidentally introduce noise into the system. This manifests with a lot of different questions:

How do I add new features and stay compatible with other apps?
How do I add new features and not disrupt other apps?
How can I predict the effects of my features on the larger network?
Others I'm sure

My first thought is that we need to switch over to JSON-LD. For those not familiar, expand these details to learn a bit more:

What JSON-LD does is, it adds a global specificity to schemas. Each key in an object is a URL. For example: ```json { "@context": "http://schema.org/", "@type": "Person", "name": "Jane Doe", "jobTitle": "Professor", "telephone": "(425) 123-4567", "url": "http://www.janedoe.com" } ``` This "expands" to mean: ```json { "@type": "http://schema.org/Person", "http://schema.org/jobTitle": "Professor", "http://schema.org/name": "Jane Doe", "http://schema.org/telephone": "(425) 123-4567", "http://schema.org/url": "http://www.janedoe.com" } ``` So, not only is that helpful for removing ambiguity and adding documentation, but it also provides a mechanism for adding new attributes without creating ambiguity. For instance: ```json { "@context": ["http://schema.org", { "ical": "http://www.w3.org/2002/12/cal/ical#" }], "@type": "Event", "name": "Lady Gaga: Live!", "ical:summary": "Lady Gaga Concert", "ical:location": "New Orleans Arena, New Orleans, Louisiana, USA", "ical:dtstart": "2011-04-09T20:00Z" } ``` In that case, we're able to add "ical" attributes to a "schema.org" schema. This should stop new features from disrupting each other. Basically, if you have a usage that differs from the documented schema usage, you'll need to add your own attribute(s). There's some helpful documentation at https://json-ld.org/ and https://json-ld.org/spec/latest/json-ld-api-best-practices/

pfrazee commented 6 years ago

In my opinion, there are two downsides to using JSON-LD:

It requires some javascript to deal with the contexts (namespaces).
You need to publish a spec for any context you create.

For 1, I'd like to see a Javascript library which helps you normalize the schemas into an expected form. That shouldn't be too hard to get. I'm still evaluating what's available.

For 2, seeing as we have a toolset to help with Web publishing, one option is to use dat. You'll almost certainly want a shortname for your spec though, and it's more overhead for development than I'd prefer, because you become obligated to maintain the spec. I'm not sure that's a fun way to do development. I personally tend to prototype quite a bit before I'm ready to publish a spec, and I'd be annoyed if I had to spend mental energy to manage namespace URLs.

One thing @taravancil and I have been talking about is a non-universal, search-based URL scheme called the "thing" scheme. Expand details for more on the thing scheme:

Basically something like `thing://Whatever_I_Want_to_Write_Here`. The idea is that the browser would handle that URL by opening the configured search app (google even) with the query "Whatever I Want to Write Here." The thing url wouldn't refer to a specific resource; it'd refer to a search. For namespaces, the idea is that you'd use a thing url, and then publish your schema with that url. Those schemas would then be "conventionally" unique instead of "universally" unique. That is, it wouldn't be guaranteed to be unique, but the url would be unique enough that it's unlikely to collide by accident. The search app would then have to help us figure out the right hit based on network signals and trust. Example: ```json { "@context": "thing://paul_frazees_fritter/schema", "@type": "Taco", "category": "breakfast", "ingredients": ["eggs", "bacon"] } ``` Eventually, I'd publish a document describing the schemas under `thing://paul_frazees_fritter/schema` and so if somebody wanted to find the docs for my attributes, they could. It may be a really bad idea in the end, but I like the usability, and I think it's worth considering.

neauoire commented 6 years ago

I've had to work with JSON-LD in the past and it quickly becomes hard to maintain as the overhead in the code is pretty high. And the barrier of entry to make new clients becomes also suddenly very high. If it's a library, then that means everyone have to carry on this extra bit of code and the network pays for it.

I might not fully understand the ramifications of this but I was thinking, maybe we could just have a sort of "consortium for communication standards" and just host public conversations on the syntax of new patterns could be implemented? In line with how RSS was at first?

I really wish for this to remain as lean as possible. We're basically just building RSS readers for communication feeds. Where/how did it break with Scuttlebutt, I don't want to find ourselves doing the same mistakes.

pfrazee commented 6 years ago

@neauoire I generally agree with the sentiment that JSON-LD adds more overhead than I prefer. My original idea was to just duck-type JSON schemas. That might work. It's very convention focused and I think that's actually good for our community -- dev UX is more important to me than "doing it right," so long as we don't do things SO WRONG that things fall apart.

Like I said in my tweet, developers have a kind of obligation to the users not to f*** up their experiences. I think that's an interesting way to pose the situation.

A central place to discuss the schemas is kind of centralizing but 🤷‍♂️ it's informal enough that I'm not too bothered by that. There's no friction to the community doing something else.

JSON-LD is one of many thoughts I have. (This issue is basically now my alternative to the blogpost I had brewing.) I'll follow up with some other ideas.

pfrazee commented 6 years ago

Client identifiers are an interesting idea. I think they might be too opaque, maybe? Because there's no meaning to them, even with versions, unless you look them up.

It might make more sense to use feature identifiers, which is basically an informal way to do JSON-LD contexts. For instance, suppose we declare which features are being used in the post, and then declare which of them are required for the post to be usable:

{
  "uses": ["fritter-social-feed", "rotonde-whispers"],
  "requires": ["rotonde-whispers"], // if you dont support this, dont render the post
  "text": "Hi, bob!", // from 'fritter-social-feed'
  "visibility": "dat://bob.com" // from 'rotonde-whispers'
}

neauoire commented 6 years ago

developers have a kind of obligation to the users not to f*** up their experiences

Amen to that.

Well, let's start with a few concrete examples then. How are you planning on handling whisper type messages. That's a good scenario here where, our direct messages are visible when looked at from fritter. How would you like for us to express that this message should not be made public by default:

We both implement type=whisper target='dat://' as direct messages and not make them visible.
Refactor our architectures to handle JSON-LD schemes.
If fritter is not planning on having direct messages, then we can keep this platform centric with a visibility tag. I prefer the target="" approach.

pfrazee commented 6 years ago

@neauoire right, adding JSON-LD doesn't automatically solve the question of "how to handle incompatibility." For instance, if we use JSON-LD and you add a Rotonde namespace with support for target, you really need other clients not to render it if they don't support that feature.

We can solve that in this case by coming to agreement about the specs that Fritter and Rotonde use, but is there a more general solution to this variety of problem? That's what I'm trying to address on with the requires idea.

csarven commented 6 years ago

I think you're on the right track.. just some high-level suggestions:

The general rule of thumb is to re-use existing vocabs if/where applicable, and then define/publish new terms for the remainder. Needless to say, publishing/maintaining new stuff is expensive. A nice thing to do is to also make relations to similar terms out there. That could be an alias, a specialisation or a generalisation and so forth. See briefly: https://www.w3.org/TR/dwbp/#dataVocabularies

I think both Rotonde and Fritter should consider starting off with the ActivityStreams 2 vocabulary: https://www.w3.org/TR/activitystreams-vocabulary/ and see how much mileage that gives. Expand to include other vocabularies to solve your respective problems eg. relatively speaking, use a general purpose vocab like schema.org to fill in the gaps. You can always publish your own terms - anyone can say anything about anything on the Web applies. If there is information that both of you would like to express but there are no existing terms for it out there, go ahead and define it. If you want to take it up a notch, approach other applications out there and see if there are terms you both agree can be covered under a common/shared/public namespace; for instance via https://w3id.org/ ( https://github.com/perma-id/w3id.org ). There really is a lot of stuff out there you can borrow from. So, you may want to look things up at http://lov.okfn.org/dataset/lov/ to get a feel of what's out there and borrow ideas from (or chase down some reasoning behind the vocab designs).

I personally wouldn't approach the problem here with "requires" client. The payload's description with shared or resolvable vocabs in place is expressive enough to signal to the consumer whether they want to use it and how. If you still want to communicate that client info, you may want to consider it from the other direction, for example along the lines of, payload generated or rendered by client ( https://www.w3.org/TR/annotation-vocab/#renderedvia , https://www.w3.org/TR/prov-o/#wasGeneratedBy ... ).

There is a lot more to all this then I can type out here in one go, so I hope this is of some use.

kevinmarks commented 6 years ago

Fritter's use case is very close to what Activity Streams was designed around, so if you're committed to going with JSON-LD, adopting that makes sense. The field names have been thought through and based on converging multiple social networks over time - the name/summary/content split is a very useful pattern that makes things clearer, and twitter's lack of that has led them into unfortunately convoluted methods.

There are, as you say, deeper problems with having variable support for vocabulary and interpretation that namespaces don't really solve, especially if you want forward compatibility. While you can just render arbitrary JSON, it needs interpretation of the fields to be shown usefully - a Fritter post's threadRoot, threadParent and createdAt are all very opaque when presented directly.

Although JSON aligns well with data structures, it is lacking the underlying distinction that HTML preserves between textual data for presentation to users and explanatory structure and metadata for apps and parsers. Having a well defined default behaviour for unknown elements (show the text contents) and attributes (ignore them) makes these kind of extension and interop easier.

Another reinvention of this approach is showing up in the static site generators, which combine key/value front matter with markdown body text as a way to split the two.

In practice, any successful format is going to have heterogenous implementations and support, with overlapping interpretations of the data. The microformats approach has been to make peace with this, and use the metadata available in HTML to indicate which elements represent the useful content for other applications, and to converge the field and structure names on this basis.

This is not an exclusive approach - that is the point; you can add microformats to your generated HTML without affecting your internal data structures - h-entry would be the natural one for Fritter.

Then you can add other formats as desired to the content of the posts. You can use post type discovery as a heuristic to distinguish the kinds of post.

pfrazee commented 6 years ago

@csarven @kevinmarks thanks for the helpful links and thoughts. I like the idea that we can have a library of vocabs to pull from, and that you can pull at-will from multiple vocabs. Tara and I have had the Activity Streams work on our minds throughout.

I think "well-defined fallback behavior" may be a key requirement here. With microformats, the HTML tags provide fallbacks. With our JSON-backed world, this may require some kind of standard protocol (akin to the requires field I suggested) or it may require a case-by-case protocol, such as a standard grammar for all social-feed messages that includes attributes to use when nothing is renderable. (This all reminds me of the Robustness Principle: "Be conservative in what you send, be liberal in what you accept.")

In case you're curious @kevinmarks why we're not exploring non-JSON formats, another key requirement is convenient developer ergonomics. The code must be accessible, even to non-professional programmers, and we're assuming JS is the environment. Even with something like JSX, the HTML syntax doesn't map very cleanly to JS objects, and JS devs just want to work with objects! This isn't to harsh on microformats at all -- we just have different requirements for Beaker's apps.

And this is part of my concern with JSON-LD; I don't want to introduce anything that feels like "needless boilerplate" (which suggests it's disconnected from the dev's sense of getting things done) nor do I want to make objects less convenient to work with (foo.relationship is preferable to foo["foaf:relationship"]). So far I haven't seen a library that gets me excited. The best I've seen is https://github.com/simplerdf/simplerdf, and even that's a bit kludgey to me. This might just be something I have to think through, but if anybody has examples of JSON-LD being used in Javascript that they think is very clean, please share.

soyuka commented 6 years ago

In ApiPlatform we're mostly focused on using JSON-LD in combination with Hydra (backend and frontend). I know that these format have been really helpful in terms of SEO and interoperability between systems. Relying on standards and RFC helps to process and extract data in a generic matter.

However, JSON-LD stays JSON and there's nothing that tells you not to use foo.relationship instead of foo["foaf:relationship"] (ie: not using any namespace). In fact, I'd add that, in combination with a given Schema (can also not be from the Schema.org), one is free to do anything he wants. There are some key parts of the specs that should be considered (for example @context, @type and @id).

Client-side works related to these format:

Generated administration - base on generative hydra documentation
Code generation - also based on the generated documentation

csarven commented 6 years ago

As already encouraged, do remain on course with JSON-LD. Some comments re HTML:

If representing structured and exchanging data in non-JSON(-LD) syntaxes was up for consideration, there is W3C Recommendation RDFa which can be used with any markup, eg. in HTML, SVG.

Transformations between RDF syntaxes keeps the information lossless. Moreover, anyone can define and publish their own vocabulary and relate it with the others', all meanwhile reusing the same RDF model to automate the consuming and decision making process. With RDF, there won't be any collisions in term usage across data because they have globally unique identifiers (using "http") as opposed to some arbitrary string.

So, JSON-LD/ActivityStreams2 to RDFa/ActivityStreams2 is lossless in that, all of, and the same semantics can be preserved. In contrast to alternative approaches, there is no additional complexity introduced by having the system learn a new out of band vocabulary which happens to be lossy and only covers a portion of AS2. If multiple vocabularies are used in JSON-LD/RDFa, everything works as expected out of the box.

Going from RDFa to JSON-LD is exactly the same. Information with its semantics will be all intact.

csarven commented 6 years ago

@pfrazee re SimpleRDF, if foo is your graph, then foo.relationship will work if you give it a context where relationship is defined like:

Single value: "relationship": { "@id": "http://example.org/relationship", "@type": "@id" }
Multiple values: "relationship": { "@id": "http://example.org/relationship", "@type": "@id", "@array": true }

SimpleRDF wraps rdf-ext, so perhaps look under https://github.com/rdf-ext/ as well but that'd be lower-level stuff. You can pick and choose your parsers and serializers, but I think you'll mostly need to work with rdf-parser-jsonld and maybe rdf-serializer-jsonld.

And as @soyuka says, if you don't want to deal with that, you can just treat JSON-LD as plain JSON ... Needless to say, that bakes in the knowledge about what to expect and deal with in the data to your application. Nothing "wrong" with that. As long as your code is consistent in how it handles and generates data in JSON-LD, you're okay.

soyuka commented 6 years ago

Nothing "wrong" with that. As long as your code is consistent in how it handles and generates data in JSON-LD, you're okay.

Especially if you associate your own json schemas to describe the spec, validate or whatever.

kevinmarks commented 6 years ago

Your assumption that JSON is more developer accessible than HTML is a bit of a tricky one - as you have to add more conventions to overcome the visibility issues, you may find that the initial obviousness is not longer true - this reminds me of 'markdown is simpler than html' when I always have to look up the link syntax in markdown.

The 'needless boilerplate' was a concern for Activity Streams, so the spec does say you can use the properties without the context (implied by MIME type).

When a JSON-LD enabled Activity Streams 2.0 implementation encounters a JSON document identified using the " application/activity+json" MIME media type, and that document does not contain a @context property whose value includes a reference to the normative Activity Streams 2.0 JSON-LD @context definition, the implementation must assume that the normative @context definition still applies.

RDF is it's own model - there are people such as @csarven who find it helpful in their worldview, and other that find it more complex than they need. As far as schema.org goes, the way that combines object inheritance with RDF makes it much harder to understand.

As for 'just building RSS readers', RSS in practice has a huge variation of markup and choices - any tech that takes off odes tend to accrete more over time. Ultimately you are going to extract the bits you understand and skip the rest, but following some uniform naming of fields as far as possible is good, and the activity streams one has done a fair bit of analysis to do that.

cwebber commented 6 years ago

Hi @pfrazee, really happy to read this thread. As you probably know, ActivityPub (of which I'm co-editor) uses ActivityStreams, and ActivityPub is used by a growing number of social networking applications, including Mastodon. It would be great to have Dat/Beaker join us.

I've been pushing for the idea of a peer to peer ActivityPub application, and maybe Dat/Beaker is actually the right way to go there! (We intentionally made certain that ActivityPub does not require specifically the https:// uri scheme, so maybe dat:// will work very well?)

Re: RDF'ness of json-ld: the main reason for json-ld is to allow for the expressiveness of linked data without requiring people go "full RDF". It's totally possible to write an application using ActivityStreams using more naive json tooling, and we made sure that this was a requirement when we worked on ActivityStreams. In fact Mastodon is an example of an application that does support ActivityStreams as valid json-ld, but internally operates on it as more naive json. So this is more than possible.

I'd love to talk more about this with you if you'd like to. Maybe the Social Community Group would be a good place to discuss?

AlbertoElias commented 6 years ago

My only thought is that if we do go with JSON-LD, where I understand the developer friction, there should definitely be a Web API to handle it, as requiring a parser library for something that would become so common might even be what stops it from being widely used.

I like the schema flexibility and that it's supported, and I do believe that, if we do go down this route, we will converge in schemas, so it won't be a pain for newcomers, while also allowing for innovation to happen.

There's also AcitivtyPub built on ActivityStreams which Mastodon uses. Currently it's thought out to work nicely in the Client-Server model, but I think it adapts nicely to the serverless decentralized model.

0x0ade commented 6 years ago

As a newcomer, it's inspiring and fascinating to see the ongoing discussion. I just wanted to share a few things on my mind. Please ignore me if I'm being too naive / dumb, as I don't want to disrupt the ongoing discussion.

I can't see how or where I would use microformats in a P2P social feed webapp*.
dat:// websites are static and the data is stored separately - Rotonde and Fritter thus render the posts at runtime. Embedding data in the rendered HTML for SEO or similar purposes shouldn't be our concern for now.
Personally, it makes more sense to use a format that was meant for data storage and data transfer in the first place, and JSON is practically a JavaScript-native format with parsers for almost every other language.
- *: A highly unrealistic case where I would use microformats is if we were to store all posts pre-rendered in index.html and their respective post pages, making the user's feed readable in a truly static context, but this brings many other issues.
I'm currently going through the peer to peer ActivityPub documentation. It helps me in understanding ActivityStreams a lot, but...

ActivityPub was written intentionally to be layerable on any protocol that can support HTTP GET and POST verbs.

As far as I know, the dat:// protocol only supports GET (DatArchive.readFile (doc) and standard resource fetching) across the network and POST (DatArchive.writeFile (doc)) into your own archives. This would be analogous to POSTing to your own outbox.
```
- You can POST to someone's inbox to send them a message
+ You can POST to your own outbox and hope that the receiver is fetching your outbox
```
Due to that limitation, we don't even have an "inbox". Instead, we fully rely on the client to fetch any outboxes to build our feed and find any mentions. As far as I know, in Fritter, your "inbox" is composed of the people you're following. (I don't know if / how it changes for notifications.) In Rotonde, your "inbox" is composed of the people you're following and the people they're following (if you enabled "discovery").
```
- You can GET from your inbox to read your latest messages
+ You can GET from the outboxes you fetch (basic case: people you follow) to read your latest messages
```
I'd happily adopt a dat-compatible peer to peer version of ActivityPub in the future.
I'm still looking at which JSON-LD schemas / contexts to possibly support in Rotonde. ActivityStreams 2 seems like the most obvious choice, but it initially "shocked" me - I initially didn't even understand whether a Create is enough or if I need to Add it to my feed, and I'd prefer to f.e. delete a post from my WebDB instance instead of checking if it got Deleted. It adds a lot of bloat which I (as a naive newcomer) would prefer to avoid: I just want to read and write my entries, not what I'm doing with it. The dat versioning history is already there for that.

I'm sorry if anyone is now grabbing and shaking their head, but I just wanted to get this out.

pfrazee commented 6 years ago

I'm reading up on the ActivityPub vocab and JSON-LD dev ergonomics. I'm hopeful that I can make them work for us, but I'm not optimistic yet. @0x0ade your second and third bullets about the inbox/outbox divide and the superfluity of Activities like Create is something I just verbalized myself in the W3 social IRC. If there's an opportunity, it's probably to only use the Object-based schemas, and not the Activity-based schemas.

If I had to rank my requirements, they are:

Developer experience (clarity, ease of use, enjoyability)
User experience (good behaviors in all cases, good fallbacks, minimal "missing messages")
Compatibility with any existing ecosystem

Those are all requirements, so compat does matter to us, but it's less of a priority than DX and UX are.

As the ActivityPub folks have pointed out, fairly minimal conformance is all it takes to make compatibility possible. But I'd also mention that solving these questions with "Use an existing vocab" is actually not solving the question of decentralized development; it's dodging it. Vocab choice is a concern for the developer of a specific application, rather than a framework for thinking about how to design our schemas.

I'll keep considering JSON-LD and ActivityPub, but I'm primarily interested in the techniques we're going to use for managing fallback behaviors in the face of unexpected schema differences.

csarven commented 6 years ago

That's it! Start with Object, and if/when you want to express an Activity about an Object, you can extend. Activity will refer to the Object.

I agree that choosing the "right" vocabs go a long way but there is no one size fits all, and certainly not indefinitely. It not only holds true for you own application and the data is generates/consumes, but also what you are trying to achieve here across applications. Things change - data changes. Embrace that as one of the core attributes of your system, instead of getting blocked by the idea that your applications may not end up using the right vocabs, or be perfectly understood by another. There is plenty of space for interop, just as well as things getting missed.

Different applications are going to go at it differently, so you have two general approaches: 1) try to define something that will work for everyone, or 2) use the best you can for your own application and cross your fingers for interop. I think 1 doesn't really work even if we have everyone on the table. Moreover, there are applications that don't exist yet which will have different requirements than we can foresee. 2 is pragmatic, and if different applications generate data that's 80% meaningful to another application out of the box, then that's fairly successful. For concepts that's not immediately meaningful to an application, it is possible to direct the application to investigate further to see if and what other ways it could become meaningful - follow your nose type of exploration

In practice, applications cluster around vocabularies organically - due to trends, usefulness, evolvability, and so forth. That's something to keep in mind.

pfrazee commented 6 years ago

A few reads that are worth including in this discussion:

I think Robin makes an extremely good point here:

It should therefore be a core tenet of linked data that publishers should not have to think about interoperability through existing vocabularies (unless they are specifically taking part in an existing, relatively predictable data community). If the system is predicated on people thinking about reuse before they can even start publishing, then it will largely fail — especially in reaching the vast amounts of “small data” that exist in the wild.

This also very closely fits my thinking:

One of the greatest values in publishing reusable data is that you know neither who will want to use it nor how. Because of that, unless it is obvious that you're targeting a given community, the chances are that it is not worth thinking about how to fit your model into a shared one. The first order of business is to do a good job getting the data out there, and the best way of doing that is likely to simply expose something close to your own internal model (which isn't to say that you shouldn't learn from how things like yours are commonly modelled). The odds are very high that conversion will be needed no matter what, for at least some of your users (and not unlikely for most). A resilient linked data ecosystem needs to treat data conversion as a natural, core, and common part of everyday life. Munging happens. It just always does.

pfrazee commented 6 years ago

I spent the weekend preparing a solution to this issue that I'd like to propose.

My goal was to make something that won't generate too much frustration. That's a pretty high bar to clear in an opinionated space, so I hope this gets close, and I apologize if I've come short! I'm also trying to find something which the un-opinionated developer finds palatable, and that should explain the pragmatic approach of this spec.

My proposal is called JSON Lazy (JSON-LZ). Links: README.md, DESIGN.md.

I chose that name for two reasons:

As a playful nod to the JSON-LD community, with whom I want to remain compatible.
To evoke the core philosophy that compatibility should be solvable as an afterthought.

(We can rename if we're concerned it's too confusingly similar to JSON-LD.)

JSON-LD advocates should read the design doc to get a clear understanding of why I'm proposing an alternative to JLD. Please feel free to argue the points and propose alternatives. I took multiple passes at using JLD's schemas for this proposal, but didn't feel like I was getting intuitive results. My strategy instead is to remain compatible by avoiding conflicts, and that means that a JSON can use both JSON-LD and Lazy. I think we can find some simple strategies from Lazy's tooling to understand JSON-LD too, but complete support is a non-starter.

The discussion in this thread about whether to adopt ActivityStream's vocabulary has been valuable for helping to clarify this task, and I thank everybody for pitching in their thoughts. However, I think it's important to say that we should not solve this problem by adopting a single application-schema proposal. What we're attempting to solve is the process of application dev in a decentralized network, not the particular schema needs of social media applications.

Therefore, rather than asking, "Should we use ActivityStreams?" we should instead ask, "How can we make it easy for devs to add ActivityStream down the road?"

I hope to incorporate your feedback, so please let me know your thoughts and concerns.

cwebber commented 6 years ago

Arg... well, if you decide to create a new mechanism for extensible JSON, effectively that's your decision... however I'll say that I think that the JSON-LD community has worked pretty hard to sort out a lot of these things already. I think there's a big advantage to being able to share interoperability with groups like schema.org, ActivityStreams, and to be able to take advantage of tooling like linked data signatures. I'd strongly encourage you to reconsider in fragmenting this space and help us collaborate to bring unity here instead!

cwebber commented 6 years ago

Perhaps it's not a bad idea to consider joining the json-ld Community Group and raise your thoughts/concerns there? Great folks, and IME very open to discussing things.

pfrazee commented 6 years ago

@cwebber My proposal isn't a final decision, but it should anchor our discussion from here on out. Lazy is an example of what I see as decent dev ergonomics. I really hate disappointing the LD community, so if we can fix the ergonomics of JSON-LD, then I'd be certainly be happier.

That said, fragmentation in schemas must be expected in a decentralized network, so I'm not very compelled by your argument that we must adopt JSON-LD to avoid fragmentation. If JLD-based software fails to work when JLD metadata isn't present, then we should dismiss JLD as a solution.

Something to be aware of: While this thread has represented a lot of folks from the LD world (and that has been great!) I've had a parallel thread on the SSB network going, and I've talked in private with the active Beaker/Dat devs. These are the actual users of the system, and the response from them has been, "I don't like JSON-LD and I probably won't use it." I'm doing my best to square that circle.

Lazy supports the schemas from schema.org and ActivityStreams, so I'm not concerned about losing access to those bodies of work. Note: Schema.org has usage examples for Microdata, RDFa, and JSON-LD.

neauoire commented 6 years ago

Well, as expected I am in favour of the JSON-LZ format as it's the realization of my original suggestion. I am guessing that @0x0ade will equally be in favour to give this a try :)

kevinmarks commented 6 years ago

I think you have addressed an imaginary problem to avoid dealing with a real one.

Currently you have a handful of field names that don't need namespaces. You are going to have to cope with people adding arbitrary new fields to your json, because it is user editable in Beaker, and is on their filesystem to edit too.

So, debating the format of how to deal with a theoretical field name collision is architecture astronautics.

In practice, what people will likely do to embed a different format would be to add a new field name and put an object in it, rather than mix fields into your top level object. LZ adding a way to document that this has happened is somewhat plausible; relying on it being done accurately is less so . The nuances of different namespace proposals are not something I am going to comment on further.

However, your comment on ActivityStreams I do think is worth responding to, as this is a real problem -naming things in social networks to maximise interoperability and decrease confusion is important.

The discussion in this thread about whether to adopt ActivityStream's vocabulary has been valuable for helping to clarify this task, and I thank everybody for pitching in their thoughts. However, I think it's important to say that we should not solve this problem by adopting a single application-schema proposal. What we're attempting to solve is the process of application dev in a decentralized network, not the particular schema needs of social media applications.

Good. So adopt the schema for social media applications that has 10 years of real world experience in, instead of making another one ab initio.

Activity Streams grew out of the OpenSocial convergence of existing social network schemas. It was implemented by dozens of social networks, including 2 of Googles, and both MySpace and Facebook (both by Monica, in a virtuoso bit of coding and career trajectory). It's the standard format for Gnip's unified stream API, and Granary will convert silos into it. ActivityPub uses it too. You don't need to pick up any JSON-LD baggage to use the structures.

Therefore, rather than asking, "Should we use ActivityStreams?" we should instead ask, "How can we make it easy for devs to add ActivityStream down the road?"

If this isn't a problem you are that engaged by, why not adopt it and move on?

pfrazee commented 6 years ago

"Just use X schema" is quite simply the antipattern. Our role as the Beaker team is not to tell devs what schemas to use; it's to develop the tools that devs need to make their independent decisions work together.

Field name collision is always theoretical until it happens -- but that's not really what Lazy is about anyway. It's about providing optional tools, which can be used in addition to duck-typing, and which can assist with munging and help avoid common errors. There may be some better techniques and more interesting tools that we can use in/instead-of Lazy, so let's make the discussion about that. Andre Staltz and I had a productive DM discussion about it; I may reproduce some pieces from that in this issue.

As an aside -- Tara and I might still adopt ActivityStreams for the Fritter app, because that'll be our choice as the devs of that app. That's orthogonal to this broader discussion though, so let's please move past it.

kevinmarks commented 6 years ago

I was looking at this from the Fritter PoV, but I do see that this is a Beaker issue.

That said, reusing a relevant schema is not an antipattern. Making up a new one without reference to existing work is a bigger antipattern. It's a tempting one, which is one reason we wrote down the microformats process to stop ourselves from doing that.

(We also wrote namespaces considered harmful with evidence).

I am a big fan of duck typing and munging - I have the domain unmung.com after all.

cwebber commented 6 years ago

Field name collision is always theoretical until it happens -- but that's not really what Lazy is about anyway. It's about providing optional tools, which can be used in addition to duck-typing, and which can assist with munging and help avoid common errors.

This indeed sounds like it's going to go down much of the path of where JSON-LD has gone... and possibly a lot of history repeated in doing so.

But notably you don't have to require that users use JSON-LD tooling in order to work with JSON-LD. ActivityPub gets around this by requiring that servers parse incoming JSON with an implied context of ActivityStreams, and mandate that the JSON be transmitted in compacted form... so even if a server does the wrong thing, the robustness principle applies. You don't have to... and the vast majority of ActivityPub users don't... treat the JSON tooling as linked data. But it's there if you need/want it, and there's an extension mechanism built-in that has years of careful thinking and work to build something easy to use and interoperable.

I suspect this is the last I'll have to say on the topic... you don't have to listen to me, but I think if you go down the JSON-LZ route, it won't be as lazy as you think, because now you've got one more thing to get interop on where interop already exists :)

pfrazee commented 6 years ago

@kevinmarks Ok I think I understand your perspective better now. To be clear I certainly think that devs should reuse each others' schemas, but my point is that if that's the only solution we can offer to making compat happen, then it's not really improving things. The community will have a natural incentive to "just use X;" what I'm trying to solve is, how can we improve things once there's an active X and Y and everything in between. I'm going to spend some time from here making sure I've very clearly enumerated the issues we're trying to avoid, rather than develop Lazy further with any assumptions.

@cwebber Appreciate the input. It's great that JSON-LD tooling only has to kick in when extensions come about, but my goal is to make extensions commonplace, and the root of the problem is that LD's dev experience isn't great for JS devs once extension is needed. I hope it doesn't feel overly critical when I say, I haven't really heard a great answer to that concern!

The search for first principles continues.

pfrazee commented 6 years ago

I am finding http://microformats.org/wiki/namespaces-considered-harmful useful, thanks for linking that.

msporny commented 6 years ago

@pfrazee wrote:

the root of the problem is that LD's dev experience isn't great for JS devs once extension is needed

Then let's make it better.

Hi @pfrazee, I'm one of the creators and primary spec editor for JSON-LD 1.0.

I've been watching this discussion from the sidelines until I understood the issues in more depth. After reading the tweets, your JSON-LZ proposal, and this thread, I'm still having trouble understanding some of the arguments. I think some of this may be easy, the rest requires a deeper conversation about the last 10 years of lessons learned that have gone into JSON-LD.

I also wanted to make you aware that we run multiple companies that use JSON-LD at their core, some of them working on decentralized software and peer-to-peer networks that need decentralized extensibility as well. So, we're working in the same sort of problem domains, so it's always nice to connect with others that are trying to create a more decentralized world. :)

Would you be willing to get on the phone w/ the Editors of JSON-LD and the developers of jsonld.js? We're in the process of spinning up a new W3C Working Group for JSON-LD 1.1, so we could address some of your concerns in a new version of JSON-LD 1.1, or we could update/modify jsonld.js to make the developer ergonomics better. There are several "tricks of the trade" that I haven't seen mentioned here, so I'd like to make sure you're aware of them and are hooked into the folks that know this stuff at depth. In other words, I want to make sure the JSON-LD Community is supporting you as best we can. We'd use the JSON-LD CG telecon bridge, invite others, and make it a public call that anyone may participate in.

What's your availability for a call?

/cc @gkellogg @dlongley @cwebber @bigbluehat

BigBlueHat commented 6 years ago

@pfrazee you won't get interop by starting a new thing, and "developer ergonomics" are not enhanced by bringing more strikingly similar things in existence. Please reconsider JSON-LZ (especially it's name).

Beaker Browser has great promise in helping communities (education, publishing, science, etc) that already traffic heavily in RDF-based formats. Be careful not to lock yourself (and Beaker Browser) out of your own opportunities.

NIH is not your friend, but I'm happy to be. 😄

(ugh...cache invalidation... I didn't see @msporny's post until after I posted mine. 😛 Huge 👍 to everything he said. We'd love to have your help in JSON-LanD 😁)

taravancil commented 6 years ago

I really appreciate y’all jumping in here. It’s been helpful.

@bigbluehat I want to clarify that this discussion is not about Beaker adopting JSON-LD or not, but rather trying to anticipate the needs of devs building things on top of Beaker.

Devs building websites/apps for Beaker are starting to need some of what JSON-LD provides, but they’re unhappy with the experience of using JSON-LD, and have told us so (sometimes passionately). This discussion is about how to square that.

Ultimately this isn’t the Beaker team’s decision—we’re not considering first-class support for JSON-LD or anything similar, and devs will always be able to choose whatever works best for them, so we aren’t worried about locking Beaker out of opportunities by having this discussion. We’re simply responding to our observation that real-life applications have a need for some of what JSON-LD offers, yet devs don’t seem interested in adopting it.

Mon, Jan 29, 2018 at 08:25 BigBlueHat notifications@github.com wrote:

@pfrazee https://github.com/pfrazee you won't get interop by starting a new thing, and "developer ergonomics" are not enhanced by bringing more strikingly similar things in existence. Please reconsider JSON-LZ (especially it's name).

Beaker Browser has great promise in helping communities (education, publishing, science, etc) that already traffic heavily in RDF-based formats. Be careful not to lock yourself (and Beaker Browser) out of your own opportunities.

NIH is not your friend, but I'm happy to be. 😄

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/beakerbrowser/beaker/issues/820#issuecomment-361261066, or mute the thread https://github.com/notifications/unsubscribe-auth/AHO8QeBgaGGDkM07OT11OMZt16gNBVmmks5tPdTbgaJpZM4Rdt0x .

pfrazee commented 6 years ago

@msporny Tara and I are going to be in SF starting this Wednesday till Sunday. If you're in the area, we'd be happy to meet for coffee or drinks sometime. Otherwise, we can do a call during the daytime next week. You can email or DM me on Twitter if you like (pfrazee@gmail.com)

msporny commented 6 years ago

@pfrazee wrote:

Otherwise, we can do a call during the daytime next week.

We're not in the Bay Area, so let's do a call next week... I'll send you an email. @taravancil -- mind connecting w/ me via email so I can include you in the invite: msporny@digitalbazaar.com ?

neuroplastic commented 6 years ago

@bigbluehat is right: critical knowledge communities already use RDF-based formats. The Indieweb and JSON-LD teams are right: please no more fragmentation. But Beaker's back-and-forth with Indieweb and JSON-LD teams seems to be missing something:

We're all following the @timbl of 1993. Why don't we follow the @timBL of 2017?

SOLID addresses the Indie and decentralised web missions, plus re-establishes the open semantic web. Before Beaker brought back that old NCSA Mosaic tingle, @csarven's Dokieli did.

However, SOLID, like Indieweb, is still stuck at a developer-level UX. But Beaker's decentralised profiles -> pods can solve this.

I do not want to see Beaker being held back by the very things holding SOLID and Indieweb back. I think Beaker should be able to try new formats as needed, IF it results in being able to do SOLID without a server.

I want the openness of the early web, WITH the convergent evolution of RDF, ML, and node. I suspect the solution to the present debate is to move the LD part of JSON-LD into JS. HTML is moving from index.html to index.js, so let index.js handle the RDFa. (RFC @yoshuawuyts ?)

I want to see Beaker being pulled forward by SOLID, not pushed forward by Rotonde.

BigBlueHat commented 6 years ago

@neuroplastic excellent thoughts. It's probably past time we all got back to work on http://rdf.js.org/ and friends. Thanks for the push!

yoshuawuyts commented 6 years ago

@neuroplastic sorry, I don't think I'm following. What is it you're asking?

neuroplastic commented 6 years ago

@yoshuawuyts, @pfrazee and @taravancil are making a push right now against the UX barrier that has kept the original vision of the ReadWriteWeb (and the semantic web) from succeeding. In your, @dominictarr's, et alia's work, we see a modular approach to handling HTML from within node.js javascript. As e.g. @jondashkyle demonstrates, this can be a simpler UX than direct HTML wrangling.

@pfrazee is identifying JSON-LD as a blocker for usability of the p2p RWW. But JSON-LD is a critical piece of the puzzle to enable scientific and general data integration in an open web. This issue is an argument that can potentially make or break JSON-LD, and hence, SOLID. We can't let LD go, not when @TimBL is this close with SOLID. (If it helps, young hurried devs, think of the LD as JSON-Leibniz&Diderot).

The elements in JSON-LD @pfrazee is calling attention to - @context and type - are so epistemically freighted, no wonder there's been no usable solution, hence the tenson in this issue's thread. The web needs a breakthrough here, and it needs it now.

@msporny, @kevinmarks, @timbl, @csarven, @rubenverborgh: are you ready to agree there is no declarative solution to the usability problem of RDF formats?

If so, then the solution must be that an application is already a set of contexts and types. The same JSON must be ingestible by different apps able to apply different LD contexts and types. This is where i see @pfrazee leading, but, critically, IndieWeb and SOLID are stuck in the old app paradigm. For SOLID and the RWW to work, JSON-LD needs new imperative/functional tooling, and the radical simplection happening in choo, dat, and e.g. hyperscript is the place.

@yoshuawuyts, can we rebuild SOLID using choo, Dat, and Beaker?

BigBlueHat commented 6 years ago

@neuroplastic there's loads of great thoughts (and concerns) tucked in both your comments...but I'm not sure they're addressable in a single issue on the GitHubz. Maybe you could kick off a few more narrowly targeted and/or actionable requests to various mailing lists and projects?

Most of the people here are working toward what you seem to be hoping for, but it is (as ever) going to take time, effort, and (above all) collaboration.

It looks like you've got enough knowledge of the space (and the "actors" in it) to write-up some pretty focused proposals. Maybe toss up some GitHub repos, Gists, or wiki pages some place, and pass those around to various WGs. File clearly closable issues (where possble), and kick off some experiments. Any and all of these things could help move the world forward.

Cheers! 🎩

neuroplastic commented 6 years ago

@BigBlueHat, sorry to thread-crash. I do have FOSS contribs. along these lines in the works. But Dat, Beaker, and choo are moving orthogonally to established patterns in web standards, which is creating an opening for the kind of unexpected move with which @TimBL started all this ca. 1993. But this thread also brings up the fear that XKCD#927 ('How standards proliferate') will happen AGAIN, for the ten-thousandth time since Dan Connolly brought SGML to the W3C in 1995.

I spoke out of turn, but i wanted to throw a spotlight on how close @TimBL's vision is, IF Indieweb and SOLID will support the lateral step Dat, Beaker, and choo are taking, and IF they in turn will support the lower-case semweb. Don't let all that is SOLID melt into air.

@BigBlueHat, thanks for being a kind and helpful gatekeeper.

pfrazee commented 6 years ago

Just to update on this, we've got a phone-call scheduled with @msporny to discuss this more. We'll keep everyone updated!

dominictarr commented 6 years ago

think of the LD as JSON-Leibniz&Diderot

@neuroplastic I am now more confused

msporny commented 6 years ago

The call is at 3pm ET today, dial-in details are here. Anyone is welcome to join, the call is open to the public.

https://lists.w3.org/Archives/Public/public-linked-json/2018Feb/0003.html

neuroplastic commented 6 years ago

@dominictarr Here's the spec for JSON-Leibniz&Diderot:

https://en.wikipedia.org/wiki/Characteristica_universalis https://en.wikipedia.org/wiki/Denis_Diderot#Encyclop%C3%A9die

.. Nailing JSON-LD for beaker and p2p may be the difference between freedom from facebook, vs. freedom from facebook and freedom from an academic publishing and ranking system that needs to go just as badly.

webdesserts commented 6 years ago

For interested parties, here are the meeting notes for the above call.

pfrazee commented 6 years ago

I've found one notable difference between how I was thinking with Lazy and how LD works.

In LD, the goal is to expand every key into a full URL. So, eg, the compact form {"name": "Paul"} can expand to {"https://schema.org/name": "Paul"}.
In Lazy, the goal is to categorize every key under a vocab ID. So, eg, we know the name in {"name": "Paul"} belongs to the schema.org vocab.

So, Lazy focuses on vocabularies as a group, while LD focuses on individual attributes.

Why does this matter? Lazy has 2 goals: to help you detect schema incompatibilities, and to help you transform between schemas. By thinking in terms of vocabs, Lazy is able to do a sort of "support detection exchange," where the JSON object declares its vocab IDs and which ones are required, and the app declares the same (but for itself) and we walk away knowing about the support. (See Detecting schema support.)

If we're going to make the same concept work for LD, we'll need to find a way to use the common root in attribute IRIs as vocab IDs. For example, consider the following object:

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

We'd need to somehow determine that "http://xmlns.com/foaf/0.1/" is the vocab ID.

beakerbrowser / beaker

Application data schemas & how to manage decentralized development #820