braid-org / braid-spec

Working area for Braid extensions to HTTP
https://braid.org

Linked JSON: Why `link`? #78

Open balupton opened 3 years ago

balupton commented 3 years ago

My concern with link is that:

  1. It is not obvious to the uninitiated that it is anything special at all.
  2. When one sees link with application/json instead of application/linked-json, one has to wonder whether it is a bug, as in this situation: https://github.com/braid-work/braid-spec/blob/bf179d8c84ea48b5cb1ea4faf23c6ea9bfc28416/draft-toomim-httpbis-braid-http-03.txt#L289-L290
  3. Converting between JSON and Linked JSON requires a converter, in case any JSON document has a link field in it.

Reading the cited benefits of $ref, one could use $link to signal to the reader that it is meant to be something special, while still allowing escaping via $$link or \\$link. The \\$link form would be intuitive to the ECMAScript world, as in constructing regular expressions from strings: new RegExp(".\\..") (as opposed to /.\../) matches three characters, the middle one a literal dot.
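A minimal sketch of how the `$$link` variant of this proposal could work, assuming a hypothetical rule in which any key beginning with `$` is escaped by prepending one more `$`. The helper names `escapeKey` and `unescapeKey` are illustrative, not from any spec. It also checks the `RegExp` analogy from the string-literal side.

```javascript
// Hypothetical rule for the proposed `$link` syntax: a key that begins with
// "$" is escaped by prepending one more "$". A plain "link" key needs no
// escaping, because under this proposal it has no special meaning.
const escapeKey = (k) => (k.startsWith("$") ? "$" + k : k);

// Undo the escaping: strip one "$" from keys that begin with "$$".
// A key with a single leading "$" (such as "$link") is a special field.
const unescapeKey = (k) => (k.startsWith("$$") ? k.slice(1) : k);

console.log(escapeKey("link"), escapeKey("$link")); // link $$link
console.log(unescapeKey("$$link"));                 // $link

// The RegExp analogy: the string literal "^.\\..$" denotes the pattern ^.\..$
// (anchored here so a test covers the whole string), the same as /^.\..$/.
const fromString = new RegExp("^.\\..$");
const fromLiteral = /^.\..$/;
console.log(fromString.source === fromLiteral.source);       // true
console.log(fromString.test("a.b"), fromString.test("abc")); // true false
```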

canadaduane commented 3 years ago

I agree with this, and prefer $link over link. Here is another spec that uses an underscore prefix (_links) in precisely the opposite way from how linked-json currently uses _link (i.e., HAL gives _links the special meaning, rather than using _ to escape out of link): http://stateless.co/hal_specification.html

toomim commented 3 years ago

Thank you for these comments!

However, I actually do prefer the basic link myself. I was aware of HAL when I wrote this spec, and found the $ to be unnecessary. Since links are so common in JSON, I wanted a simpler, more readable, easier-to-type alternative.

It is true that we programmers often prefix special variables with "unusual characters" so that they stand out. However, in this spec, the only thing special is the field link itself. There are no other special fields, so we don't need any special characters. If we declared $ to be special, then that would imply that there are other special fields that start with $. But there are none. The only special field is link.

I also acknowledge Ben's concern that:

It is not obvious to the uninitiated that it is anything special at all.

It's true that uninitiated programmers won't know that there is anything special about links. However, the uninitiated programmer will not understand most fields in a JSON object they encounter. I don't see a link as being any different in this respect. Can you give an example where it's particularly dangerous for an uninitiated programmer to encounter a link vs. an author, for instance? In either case, if you edit a JSON field without knowing what it does, you could cause an error.

Second, I want to call attention to the longer-term costs of cluttering the syntax that we use in the basic web platform. It takes only a few seconds to learn that link is special, but on the other hand a web programmer is likely to spend many hours of their life staring at JSON strings with links in them. Application JSON tends to include many links, and it becomes visually noisy if each link has a $ in front of it. Since we're inventing the future here, I'd like to make it clean and beautiful so that the billions of programmers in the future working with it have a nicer environment to work within.

toomim commented 3 years ago

When one sees link with application/json instead of application/linked-json, they have to wonder if that was a bug? Such as this situation here:

Indeed, that example would probably be better as application/linked-json. But it's also true that linked-json is still valid json. Whether it's a bug depends on the application. Maybe we need a specific example here?

Converting between JSON and Linked JSON requires a converter, in case any JSON documents had a link field in them.

I wouldn't say that it requires a converter, as Linked JSON is still valid JSON. Perhaps you are thinking about a specific case where you want to inline the links into the JSON document? I'm not sure what the issue is here, in any case.

mitar commented 3 years ago

I appreciate your comments, @toomim, but I also think $link would be better. It is pretty common in JSON Schema and elsewhere to have keys like $ref and $id, and $link seems similar. It is true that reading about it is quick, but it is also important to know that you have to read about it at all. When you see author, you see a normal key-value pair, while with $link you think: oh, this looks special, let's see what it is about.

I would again ask why we are creating one more standard rather than simply using JSON-LD, but that is a separate discussion I will not reopen here. :-) To me, Linked JSON looks like just a special case of JSON-LD (with link playing the role that @id generally plays in JSON-LD). See the JSON-LD Framing spec.
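To illustrate the suggested correspondence, here is a rough sketch that rewrites Linked JSON links into JSON-LD node references of the form `{"@id": url}`. It assumes, for illustration only, that a link is a plain object whose `link` field is a string; `toJsonLd` is a hypothetical name. It supplies no `@context` and does not address the array-ordering differences raised later in this thread.

```javascript
// Assumption: a Linked JSON link is a non-array object whose `link` field is a string.
const isLink = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v) && typeof v.link === "string";

// Recursively copy a value, renaming the `link` key of each link object to "@id".
function toJsonLd(value) {
  if (Array.isArray(value)) return value.map(toJsonLd);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [k, v] of Object.entries(value)) {
      out[isLink(value) && k === "link" ? "@id" : k] = toJsonLd(v);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null
}

const post = { author: { link: "/user/john" }, title: "Hi mom!" };
console.log(JSON.stringify(toJsonLd(post)));
// {"author":{"@id":"/user/john"},"title":"Hi mom!"}
```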

canadaduane commented 3 years ago

@toomim Here's another consideration that I think is important: the most common use of JSON is as a serialization of data. So for example, say you have a "LinkedItem" class in some Javascript code (or Python, Ruby etc.), and you want to serialize it. The most natural thing to do is to lean on existing tools, such as JSON.stringify. The expectation that a software developer has of a serialized data object is that there is "nothing special" about its serialized form:

class LinkedItem {
  constructor(id, data, link) {
    this.id = id;
    this.data = data;
    this.link = link;
  }
}

const n1 = new LinkedItem(1, "hi", null);
const n2 = new LinkedItem(2, "ok", 1);

console.log("n1", JSON.stringify(n1));
/* {"id":1,"data":"hi","link":null} */

console.log("n2", JSON.stringify(n2));
/* {"id":2,"data":"ok","link":1} */

Since "link" is a common key, it seems like Linked JSON has a bad surprise in store for the developer: Do what you usually do when serializing data, but every once in a while, the semantics of your serialized data will not match your intention.

We can greatly reduce the likelihood of this "bad surprise" by adding a special-character prefix, since a data object's keys almost never start with $ unless some special meaning is intended.
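To make the concern concrete, here is a sketch that reuses the LinkedItem class above with hypothetical detectors, assuming (for illustration only) that a link-aware tool treats any object with a string `link` field as a link. When `link` holds a URL string as ordinary data, the serialized object has exactly that shape; a `$link` key would only appear if someone wrote it on purpose.

```javascript
class LinkedItem {
  constructor(id, data, link) {
    this.id = id;
    this.data = data;
    this.link = link;
  }
}

// Hypothetical detectors, for illustration only: an object counts as a link
// if its `link` (or, under the proposal, `$link`) field is a string.
const looksLikeLink = (o) => o !== null && typeof o === "object" && typeof o.link === "string";
const looksLikeDollarLink = (o) => o !== null && typeof o === "object" && typeof o.$link === "string";

// The developer stores a URL in `link`, intending it as ordinary data.
const item = new LinkedItem(3, "see also", "/items/1");
const parsed = JSON.parse(JSON.stringify(item));

console.log(looksLikeLink(parsed));       // true: a link-aware tool sees a link
console.log(looksLikeDollarLink(parsed)); // false: `$link` only appears on purpose
```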

toomim commented 3 years ago

@mitar and @canadaduane, I think you have in mind use-cases that this spec is not designed for.

This is not a serialization format for application data structures. This is a language for annotating hypermedia resources with links. In other words, this is a Hypermedia JSON with Transclusion. It is appropriate only for data that tries to model Transclusions. If you want to serialize data structures, use JSON. Linked JSON is not for serialization. If you want to program with linked hypermedia resources, then Linked JSON is useful.

Linked JSON is a language, like JSON or Javascript. When you use Linked JSON, you'll know you are using it, and will know what link means, just like you know what window.location or module.exports or Math.random() means in Javascript. But since all valid Linked JSON is also valid JSON, you can use JSON parsers and serializers to read and write it to disk.

The JSON-LD language is designed with different goals, and has design flaws for general-purpose programming. JSON-LD does not treat arrays as ordered, for instance: by default, [1, 2, 3] is interpreted as an unordered set. Programmers often care about the order of their arrays. And JSON-LD is very complex. Linked JSON takes only 4 paragraphs to define the entire spec. It's just JSON with links. It's not RDF.

Duane: Linked JSON actually causes no bug in your example above, because it doesn't affect serialization. You can serialize and parse with the regular JSON.stringify() and JSON.parse() functions.
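A quick check of that claim, using a trimmed version of the /post/3 resource from this thread as JSON text: the standard `JSON.parse` and `JSON.stringify` handle it with no Linked JSON awareness, and the text round-trips unchanged.

```javascript
// A Linked JSON document is just JSON text (trimmed /post/3 example).
const text = '{"author":{"link":"/user/john"},"title":"Hi mom!"}';

// Plain JSON tooling reads and writes it; `link` is an ordinary key to the parser.
const doc = JSON.parse(text);
console.log(doc.author.link);              // /user/john
console.log(JSON.stringify(doc) === text); // true
```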

This spec is only for the people who want to use it. We cannot force people to use a spec. That is not what the IETF does. The IETF is a space for people who want to interoperate to find consensus on how to do so. What use-cases are you wanting to make interoperable? I'll list some of mine below.

Linked JSON is useful for building development tools. For instance, it lets you implement a JSON viewer where you can click on a link to load it inline, because your tools will know which things are links. But an even cooler feature IMO is in Statebus: it can transclude links transparently as you access them in Javascript, loading them like variables behind the scenes over the network. For example, if you have this data structure:

# /posts
[
  {link: "/post/3"},
  ..
]

# /post/3
{
  author: {link: "/user/john"},
  title: "Hi mom!",
  body: "I am really excited to see you!"
}

# /user/john
{
  name: "Johnny Appleseed",
  job_title: "Inventor",
  pic: {link: "/images/seeds.jpg"}
}

Then Statebus lets you read and write data across networked links transparently, like this:

  state["/posts"][0].author.name = "The Rock"

This single line traversed 3 networked resources! This is soooo awesome to program with. Your network totally disappears! (Under the hood, Statebus uses ES6 Proxies and reactive functions. And be aware this feature is still marked "experimental" in Statebus.) This is what Linked JSON enables.
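Statebus's actual implementation is not shown in this thread, so the following is only a minimal sketch of the idea using ES6 Proxies. It assumes an in-memory `store` object stands in for the network and that a link is any non-array object with a string `link` field; `store`, `fetchResource`, `isLink`, `wrap`, and `state` are hypothetical names. Reads follow links transparently, and a write lands on the resource that owns the field.

```javascript
// Hypothetical in-memory "network": URL -> resource. A real system would
// fetch these over HTTP and subscribe to updates; this sketch does neither.
const store = {
  "/posts": [{ link: "/post/3" }],
  "/post/3": { author: { link: "/user/john" }, title: "Hi mom!", body: "I am really excited to see you!" },
  "/user/john": { name: "Johnny Appleseed", job_title: "Inventor", pic: { link: "/images/seeds.jpg" } },
};
const fetchResource = (url) => store[url];

// Assumption: a link is a non-array object whose `link` field is a string.
const isLink = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v) && typeof v.link === "string";

// Follow any links (no cycle detection), then wrap objects so nested reads keep following links.
function wrap(value) {
  while (isLink(value)) value = fetchResource(value.link);
  if (value === null || typeof value !== "object") return value;
  return new Proxy(value, {
    get: (target, prop, receiver) => wrap(Reflect.get(target, prop, receiver)),
    // No `set` trap: assignments fall through to the resolved resource object.
  });
}

// `state[url]` resolves a resource by URL.
const state = new Proxy({}, { get: (_, url) => wrap(fetchResource(url)) });

// Traverses /posts -> /post/3 -> /user/john, then writes to /user/john.
state["/posts"][0].author.name = "The Rock";
console.log(store["/user/john"].name); // The Rock
```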

Finally, I want to point out that $ doesn't really protect you like you might think it does. Perhaps I could restate the hypothesized problem and solution as someone accidentally naming a field in a data structure "link", and the concern would be that "something unexpected happens." (I'll leave aside for a moment that we don't have a precise statement of an actual problem that occurs.) The proposed solution is to escape "link" with $. That might sound nice at first, but it isn't complete, because we still need to escape $, since someone might need to encode $link. For that, the proposal is to introduce a new mode of escaping, with backslashes \. However, \ is already an escape character in strings, so when we write it down we have to escape it with another \, and now we have two \\ characters and a $ character, because we're trying to escape an escape character with an escape character.

This is unnecessary. This is three layers of escaping. What an ugly syntax.
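The layering can be checked directly in JavaScript. This sketch assumes the `\\$link` escape from the earlier proposal and counts the backslashes at each layer: the key itself, its JSON encoding, and that JSON written as a JS string literal.

```javascript
// Layer 1: the key we want to store literally is backslash, dollar sign, "link".
// In JavaScript source the backslash itself must be escaped: "\\$link".
const key = "\\$link";
console.log(key.length); // 6

// Layer 2: serializing to JSON text escapes the backslash again.
const json = JSON.stringify({ [key]: 1 });
console.log(json); // {"\\$link":1}

// Layer 3: writing that JSON text as a JavaScript string literal doubles it once more.
console.log(json === '{"\\\\$link":1}'); // true
```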

I've been writing Linked JSON code for 6 years or so (except we use the term key in Statebus), and in all that time, I haven't observed a programmer forgetting that key or link is a special word. The problem we actually run into is that sometimes (frequently?) we need to store a hash table in JSON, with user-generated input. If we don't escape those things programmatically, we open up to a Cross-Site Linking attack, e.g. a user can name themselves link or $link, and break your site.
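A sketch of the Cross-Site Linking problem, assuming a hypothetical table built from user input that maps usernames to homepage URLs, and assuming (for illustration) that a Linked JSON-aware client treats any object with a string `link` field as a link. One user choosing the name "link" is enough to turn the whole table into something a client would follow.

```javascript
// Hypothetical table built from user input: username -> homepage URL.
const homepages = {};
const addUser = (name, url) => { homepages[name] = url; };

addUser("alice", "https://alice.example/");
addUser("link", "https://evil.example/"); // a user chooses the name "link"

// Assumption: a client treats any non-array object with a string `link` field as a link.
const isLink = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v) && typeof v.link === "string";

const page = JSON.parse(JSON.stringify({ homepages }));
console.log(isLink(page.homepages)); // true: the whole table now reads as a link
console.log(page.homepages.link);    // https://evil.example/
```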

So if you use Linked JSON, we need to recommend using an abstraction that automatically escapes links to prevent Cross-Site Linking, just as HTML views escape HTML to prevent Cross-Site Scripting. Linked JSON can be escaped and unescaped with 13 lines of Javascript:

var escape_links = json =>
    recurse(json, k => (k === 'link' || k[0] === '_') ? `_${k}` : k)

var unescape_links = json =>
    recurse(json, k => k[0] === '_' ? k.slice(1) : k)

var recurse = (json, f) => {
    // Recurse on arrays
    if (Array.isArray(json))
        return json.map((x) => recurse(x, f))

    // Escape links in objects (typeof null is 'object' too, so skip null)
    if (json !== null && typeof json === 'object')
        return Object.fromEntries(Object.entries(json).map(
            ([k, v]) => [f(k), recurse(v, f)]
        ))

    // Else it's just an atom
    return json
}

If we used JSON-LD, this escaping/unescaping code would be much more complex.

In the long run I suggest implementing the spec to get experience:

An unofficial motto of the IETF is, "We believe in rough consensus and running code." Implementation experience provides critical feedback to the standardization process.
https://www.ietf.org/how/runningcode/

canadaduane commented 3 years ago

This is not a serialization format for application data structures. In other words, this is a Hypermedia JSON with Transclusion. It is appropriate only for data that tries to model Transclusions. If you want to serialize data structures, use JSON. Linked JSON is not for serialization. If you want to program with linked hypermedia resources, then Linked JSON is useful.

I think I can articulate the problem here, and show why it is not desirable to have two formats that are indistinguishable except for a header:

  1. You expect that Linked JSON will only exist inside an HTTP message that has a clear header denoting its distinction as Linked JSON.

Here are some examples of JSON (or is it Linked JSON?) in other code bases. These developers have some assumptions around what "link" means, but the only context they have provided to a casual observer is in the semantics of their native languages or the structure & keys of their JSON:

  1. places.json

    "P001":
    {
        "name": "CÁC HANG ĐỘNG PHẬT GIÁO ",
        "name-full": "CÁC HANG ĐỘNG PHẬT GIÁO AJANTA, ẤN ĐỘ",
        "upload-date": "2018-09-01",
        "location": "Ấn Độ",
        "location-related": ["Ấn Độ"],
        "img": ["img/p001.jpg"],
        "link": "html/p001.html",
        "link-map": "html/p001-map.html",
        "link-view": "html/p001-view.html"
    },
    ...
  2. file_tree.json

    [
    {
    "subpages": [
      {
        "subpages": [
          {
            "subpages": [], 
            "link": "zevach_hekdesh_acher.htm", 
            "name": "acher.htm"
          }, 
          {
            "subpages": [], 
            "link": "zevach_hekdesh_shofaros.htm", 
            "name": "shofaros.htm"
          }, 
          ...
  3. links.json

    {
    "accointing": {
    "link": "https://www.accointing.com/",
    "name": "Accointing"
    },
    "almonit": {
    "link": "http://almonit.eth.link/",
    "name": "Almonit"
    },
    ...
  4. quiz.json

    var games = [{
    "nome": "img1",
    "link": "public/imgs/jogos/amg.png",
    "resposta": "res1",
    "res1": "Among Us",
    "res2": "Sonic",
    "res3": "Fall Guys",
    "res4": "Pummel Party"
    }, {
    "nome": "img2",
    "link": "public/imgs/jogos/cofc.png",
    "resposta": "res2",
    "res1": "Clash Royale",
    "res2": "Clash Of Clans",
    "res3": "Age Of Empires",
    "res4": "HearthStone"
    }, {
    ...
  5. exchanges.json

    {
    "exchanges": [
    {
      "name": "Bittrex",
      "link": "https://bittrex.com/"
    },
    {
      "name": "Cex.io",
      "link": "https://cex.io"
    },
  6. pix.json

    [
    { "link": "app/img/capture2.png", "tags": [{"tag": "procedural generation"}, {"tag": "maze"}] }, 
    { "link": "app/img/capture3.png", "tags": [{"tag": "procedural generation"}, {"tag": "unreal engine"}]  }, 
    { "link": "app/img/capture4.png", "tags": [{"tag": "procedural generation"}, {"tag": "unreal engine"}]  }, 
    { "link": "app/img/daily_screen_grab.png", "tags": [{"tag": "substance designer"}, {"tag": "environment texture"}] },
        ...
toomim commented 3 years ago

I'm sorry Duane, but I still don't see what the problem is. I see a bunch of JSON snippets where the word "link" appears, but I don't see an articulated scenario where they cause a problem for a user or programmer. Are you imagining that these programmers accidentally serve their data with Content-Type: application/linked-json instead of Content-Type: application/json? Why would they get it wrong? And even if they did, what would be the problem with doing so? All it does is give a client a hint that those link fields are hyperlinks... and it looks like they are! So a client might even be able to do something useful in this hypothetical case of a misconfiguration.

Anyway, are you thinking about a misconfiguration? Or something else?
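As a sketch of the misconfiguration scenario, here is a hypothetical client that collects link URLs only when a response is labeled `application/linked-json`. It assumes a link is any non-array object with a string `link` field, compares the media type by exact match for simplicity (ignoring parameters such as charset), and uses data adapted from the exchanges.json snippet above. `collectLinks` and `linksIn` are illustrative names.

```javascript
// Assumption: a link is a non-array object whose `link` field is a string.
const isLink = (v) =>
  v !== null && typeof v === "object" && !Array.isArray(v) && typeof v.link === "string";

// Walk a parsed document and collect the URL of every link it contains.
function collectLinks(value, found = []) {
  if (isLink(value)) found.push(value.link);
  if (value !== null && typeof value === "object")
    for (const v of Object.values(value)) collectLinks(v, found);
  return found;
}

// Hypothetical client: only look for links when the server says Linked JSON.
function linksIn(body, contentType) {
  const doc = JSON.parse(body);
  return contentType === "application/linked-json" ? collectLinks(doc) : [];
}

const body = '{"exchanges":[{"name":"Bittrex","link":"https://bittrex.com/"},{"name":"Cex.io","link":"https://cex.io"}]}';
console.log(linksIn(body, "application/linked-json")); // [ 'https://bittrex.com/', 'https://cex.io' ]
console.log(linksIn(body, "application/json"));        // []
```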

canadaduane commented 3 years ago

I'm advocating for human learning. These are examples of JSON existing outside the context of an HTTP message. Human beings browse these repositories and learn from each other. Having multiple "indicators" of meaning (or at least tip-offs that prompt one to question one's assumptions when viewing code) is helpful to humans.

toomim commented 3 years ago

We should probably be wary of bikeshedding here.

I care about human learning too, but I don't see any clear evidence that $link is better than link for learning that a field is a link. Nor do I see any concrete bugs caused by link.

If we kept going, I'd argue that link is probably better than $link for learning and bug prevention because it's simpler to read, and doesn't imply that there is something special about dollar signs that the reader doesn't know about. For instance, if the user saw $name, they might think it's special in the same way as $link, but it's not. They might escape any user-inputted field that begins with a $, and then expect a Linked JSON parser to unescape it, but that won't happen, and this can lead to bugs where field names become corrupted.

But these arguments could go back and forth forever, and in the end we are just arguing over variable naming conventions.

canadaduane commented 3 years ago

I give up.

On Fri, Feb 26, 2021 at 12:19 AM Michael Toomim notifications@github.com wrote:


canadaduane commented 3 years ago

Mike & I spoke offline. We agree this is less important than other things right now. Our style is different, and may remain so. But if there's something new that comes up in future around this then it may be worth my revisiting it. In the meantime, we'll focus on other more pressing parts of the spec. I was also concerned that the parameters for "winning an argument" are unclear, and Mike will post some guidelines about how, in general, we can reach decisions via rough consensus (this will likely reflect IETF guidelines).

toomim commented 3 years ago

Thanks Duane. I've been taking some notes on the IETF process for consensus here: https://braid.org/consensus

Some of the linked documents describe what to do when people get stuck in arguments.