SWI-Prolog / packages-mqi

Machine Query Interface
Consider adding an option to use a different JSON Format #4

Open EricZinda opened 3 years ago

EricZinda commented 3 years ago

From the swi-prolog boards

Note that the system is using the builtin term_to_json/2 since that is what is used by the http package already.

| Prolog type | Prolog representation | JSON representation | Notes |
| -- | -- | -- | -- |
| integer | 1234 | 1234 | 1 |
| float | 1234.0 | 1234.0 | 1 |
| atom | '1234' | "1234" | 2 |
| variable | _1234 | {"_":"_1234"} |   |
| string | "1234" | {"\"":"1234"} |   |
| list | `1234` | [49,50,51,52] |   |
| compound | foo(12,34) | {"foo":[12,34]} | 3 |
| dict | foo{12:34} | {"{":["foo",{12:34}]} |   |
| improper list | [1,2,3\|4] | {"[":[[1,2,3],4]} |   |
| unrepresentable | '['([1,2,3],4) | {"'":["compound","[",[1,2,3],4]} | 4 |

[1] If it fits in a pointer-size signed integer (or in a double). Otherwise translate to an “unrepresentable” functor, because many JSON libraries (including, of course, Javascript itself) have limitations about the size of numbers they’ll read.
[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.
[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with T0 =.. L, T1 =.. ['\'',compound|L].
[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.

This keeps the representation compact (not actually a small consideration, if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats which are weird) a single canonical, invariant representation - which means I don’t even need a JSON library, if my results are predictable. Thus:

true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
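
As a rough illustration of how a client might read this compact format, here is a minimal Python sketch that classifies a decoded JSON value using the special keys from the table above. The helper name `classify` and the `SPECIAL_KEYS` table are hypothetical, not part of MQI or of any SWI-Prolog library, and the sketch ignores the large-integer case from note [1].

```python
import json

# Reserved single-character keys, taken from the table above.
SPECIAL_KEYS = {
    "_": "variable",
    '"': "string",
    "{": "dict",
    "[": "improper list",
    "'": "unrepresentable",
}


def classify(value):
    """Return the Prolog type name for a decoded JSON value (hypothetical helper)."""
    # bool is a subclass of int in Python, so rule it out first; the table
    # above does not say how JSON true/false/null would be used.
    if isinstance(value, bool) or value is None:
        return "unknown"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "atom"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict) and len(value) == 1:
        key = next(iter(value))
        # Any single key that is not reserved is read as a functor name.
        return SPECIAL_KEYS.get(key, "compound")
    return "unknown"


new_format = (
    '{"true": [[[{"threads": ["language_server1_conn2_comm", '
    '"language_server1_conn2_goal"]}]]]}'
)
term = json.loads(new_format)
functor, args = next(iter(term.items()))
print(classify(term), functor, len(args))  # compound true 1
```

The single-key test is what makes a compound recognizable here: an object is a compound only if its one key is not in the reserved set.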

EricZinda commented 2 years ago

Coming back to this and summarizing: The proposed JSON format is:

Special keys:

  ": string
  _: variable
  {: dict
  [: bad list
  ': unrepresentable

Some things I'd like to fix if possible:

JanWielemaker commented 2 years ago

Designing something like this is IMO the way to go. JSON seems great at exchanging data that satisfies a nice and well-defined object schema. Trying to represent another dynamic data representation isn't great, as one typically ends up with a {"type": "...", ...} which is indeed a little verbose. As is though, it gets hard to identify, for example, a compound. That is an object with a single key that is not a reserved key (I think). Also, some stuff gets ambiguous, which is why we get an unrepresentable category. What about this.

It only uses JSON objects where needed. There are two corner cases that we can handle specially: cyclic terms, which can be e.g. {"t":"ct", "v":Skeleton, "u": Unifications}, and terms with attributed variables. For the latter we can use copy term and represent this as {"t":"av", "v":Term, "c":ListOfConstraints}.

P.s. I consider json(...) old school.

This can (if I didn't make a mistake) represent any Prolog term and is almost non-ambiguous. It allows for representing normal application data quite nicely (numbers, strings and json{...} dicts). Only the stuff that doesn't fit JSON naturally gets an object. Some problems remain:

The true/false/null issue can be avoided by mapping Prolog strings to JSON strings, the reserved atoms to themselves, and other atoms to {"t":"a", "v":String}. This comes with its own problems because SWI-Prolog introduced strings when atoms were already commonly used for what strings were intended to do. ECLiPSe introduced them early and that community has less of a problem.
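
To make that mapping concrete, here is a small Python sketch of one way to read it: Prolog strings become bare JSON strings, the atoms true, false and null become the JSON constants, and every other atom becomes {"t":"a", "v":String}. The `Atom` class and the `encode`/`decode` helpers are hypothetical stand-ins, and a plain Python `str` is assumed to represent a Prolog string.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for a Prolog atom on the Python side."""
    name: str


# Reserved atoms and the JSON constants they are assumed to map to.
RESERVED = {"true": True, "false": False, "null": None}


def encode(term):
    """Encode an Atom or a str (standing in for a Prolog string)."""
    if isinstance(term, Atom):
        if term.name in RESERVED:
            return RESERVED[term.name]
        return {"t": "a", "v": term.name}
    if isinstance(term, str):
        return term
    raise TypeError(f"unsupported term: {term!r}")


def decode(value):
    """Inverse of encode for the cases this sketch covers."""
    # Identity checks avoid confusing the integer 1 with True.
    for name, constant in RESERVED.items():
        if value is constant:
            return Atom(name)
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and value.get("t") == "a":
        return Atom(value["v"])
    raise ValueError(f"unsupported value: {value!r}")


encoded = [encode(Atom("true")), encode(Atom("foo")), encode("foo")]
print(json.dumps(encoded))  # [true, {"t": "a", "v": "foo"}, "foo"]
```

Under this reading the atom foo and the string "foo" stay distinguishable, at the cost of wrapping every ordinary atom in an object.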

EricZinda commented 2 years ago

I haven't exhaustively reviewed your list of cases, but I agree with the ones you wrote down and that we should go with that general approach.

A couple of questions here @JanWielemaker:

  1. Are you imagining that this is implemented as a new predicate, let's call it term_to_json2/2, that takes a Prolog term and converts it to the json/1 format, and then the caller uses json_write or json_write_dict with that output? Or is this a new version of json_write that takes any Prolog term, something like json_write_term(+Stream, +Term, +Options)?
  2. I really like mapping atoms to JSON strings (and Prolog strings to a JSON object). Seems like the more common case.
  3. I wonder if there should be an option that represents both atoms and strings as JSON strings. Yes, this won't "round trip", but for the way MQI is designed right now (Prolog gets sent to MQI, JSON gets returned), I suspect that most people know where the atoms and strings are in their interface and making one be a JSON object just adds more work for the user...
  4. What did you mean that you consider json(...) old school? What is the new school?

Format Thoughts

[Side Note: Pengines needs support for outputting arbitrary JSON since it is designed to support integration into existing/arbitrary systems and needs that flexibility. I'd argue that MQI can make do with one good canonical representation, since it is designed to be an implementation detail of an existing application (in that sense it is like an Application Binary Interface (ABI)). I.e. we don't need to worry too much about the user being able to control the format (although pengines does!), just about having a single good one. That said, I could imagine, as we've discussed, having support in MQI for different wire formats over time, but for different reasons than pengines: things like performance. Maybe it amounts to the same thing but coming at it from different perspectives.]

It seems like we have some choices for the JSON object format, and I think any could work but some are more readable. I'll illustrate for the compound case of foo(arg1, arg2) and a string "a string":

  1. Updated Original Proposal: Agree we need a way to quickly identify a compound. I think we could say that all the "reserved keys" must start with "$" to make that easy. To represent a compound whose name happens to be a reserved key, the caller could surround it with quotes (i.e. use "'$s'"). We could use your updated list of keys as the "reserved keys":
     compound: {"foo": ["arg1", "arg2"]}
     compound with name == reserved key "$s": {"'$s'": ["arg1", "arg2"]}
     string: {"$s": "a string"}
     Python check first argument: if myjson["foo"][0] == "arg1":
     Python check for compound: if next(iter(myjson))[0] != "$":

  2. Your proposal above:
     compound: {"t": "t", "f": "foo", "a":["arg1", "arg2"]}
     string: {"t":"s", "v":"a string"}
     Python check first argument: if myjson["f"] == "foo" and myjson["a"][0] == "arg1":
     Python check for compound: if myjson["t"] == "t":

  3. (just an idea, maybe a little more readable?) Modified version of your proposal that uses position in a list instead of introducing keys:
     compound: {"t": ["foo", ["arg1", "arg2"]]}
     string: {"s":"a string"}
     Python check first argument: if myjson["t"][0] == "foo" and myjson["t"][1][0] == "arg1":

  4. (another idea) Modified version of your proposal that just uses a list (no keys at all) and position in a list instead of any keys:
     compound: ["t", "foo", ["arg1", "arg2"]]
     string: ["s", "a string"]
     Python check first argument: if myjson[0] == "t" and myjson[1] == "foo" and myjson[2][0] == "arg1":
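
To compare the four options side by side, here is a minimal Python sketch that encodes foo(arg1, arg2) under each one and pulls the functor name and arguments back out. The `decompose_*` helpers are hypothetical names made up for this comparison, and the option 1 check assumes every reserved key starts with "$" as suggested above.

```python
# foo(arg1, arg2) encoded under each of the four options listed above.
option1 = {"foo": ["arg1", "arg2"]}
option2 = {"t": "t", "f": "foo", "a": ["arg1", "arg2"]}
option3 = {"t": ["foo", ["arg1", "arg2"]]}
option4 = ["t", "foo", ["arg1", "arg2"]]


def decompose_1(j):
    """Option 1: a single-key object whose key does not start with '$'."""
    if not (isinstance(j, dict) and len(j) == 1):
        raise ValueError("not a compound")
    name = next(iter(j))
    if name.startswith("$"):
        raise ValueError("reserved key, not a compound")
    return name, j[name]


def decompose_2(j):
    """Option 2: fixed keys 't', 'f' and 'a'."""
    if j.get("t") != "t":
        raise ValueError("not a compound")
    return j["f"], j["a"]


def decompose_3(j):
    """Option 3: the type key holds [functor, args]."""
    name, args = j["t"]
    return name, args


def decompose_4(j):
    """Option 4: a bare list [type, functor, args]."""
    tag, name, args = j
    if tag != "t":
        raise ValueError("not a compound")
    return name, args


for decompose, encoded in [
    (decompose_1, option1),
    (decompose_2, option2),
    (decompose_3, option3),
    (decompose_4, option4),
]:
    print(decompose(encoded))  # ('foo', ['arg1', 'arg2']) each time
```

All four carry the same information. The difference is whether the functor name is a key (option 1), a value under a fixed key (option 2), or a list position (options 3 and 4).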

My vote is for #1:

true, false, null

EricZinda commented 2 years ago

Some notes after a discussion about the above topics:

Format

We agreed that the best approach would be a merger of option 1 and 2 above, preserving the best of both:

The full, round-trippable JSON format would (perhaps optionally) include, in each JSON dict, a type key named "$t" that indicates the type of the Prolog term. The names of and number of the rest of the keys would depend on the type. Including the "$t" argument allows for round tripping. For example:

[Note that whether atoms or strings get to be the default JSON string can be switched as an option]

  atom <-> string
  "a string": {"$t": "s", "v": "a string"}
  integer <-> integer
  float <-> float
  list <-> list
  foo(arg1, arg2): {"$t": "t", "foo": ["arg1", "arg2"]}

This approach allows the non-Prolog client to use a really nice interface to access terms if their structure is known ahead of time (which is often the case) like this, for example, in Python:

# term = {"$t": "t", "foo": ["arg1", "arg2"]}
print(term["foo"][0])

arg1

As mentioned above, this does mean that the term_to_json/json_to_term is defining a particular canonical JSON "schema" that must be conformed to. As discussed in previous posts above, this is OK since these predicates are intended to be used as an interface or an ABI, not as a general purpose generator of arbitrary JSON documents. There are other predicates that allow building arbitrary JSON in SWI Prolog.
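
When the structure of a term is not known ahead of time, the "$t" key still lets a client work out what it received. The following Python sketch assumes the merged format exactly as written above ("$t": "t" plus one functor key for compounds, "$t": "s" plus "v" for strings); the helper name `functor_and_args` is hypothetical.

```python
def functor_and_args(term):
    """Split {"$t": "t", "<functor>": [args...]} into (functor, args)."""
    if not isinstance(term, dict) or term.get("$t") != "t":
        raise ValueError("not a compound term")
    # Apart from "$t", a compound is assumed to carry exactly one key.
    names = [key for key in term if key != "$t"]
    if len(names) != 1:
        raise ValueError("expected exactly one functor key")
    return names[0], term[names[0]]


compound = {"$t": "t", "foo": ["arg1", "arg2"]}
string = {"$t": "s", "v": "a string"}

print(functor_and_args(compound))  # ('foo', ['arg1', 'arg2'])
print(string["$t"], string["v"])   # s a string
```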

Other Notes

Phase 1:

Phase 2:

JanWielemaker commented 2 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/wiki-discussion-swi-prolog-in-the-browser-using-wasm/5651/75

EricGT commented 2 years ago

The context in which this is written. Came here because of (ref)

I have been discussing a new JSON format with @ericzinda at Consider adding an option to use a different JSON Format · Issue #4 · SWI-Prolog/packages-mqi · GitHub

so my mind set is that this will/could become a/the new JSON package for SWI-Prolog and I often use JSON so have a vested interest.


As a suggestion it might help to think of this along the lines of syntax and semantics.

I see JSON as a syntax specification like INI files. (ref)

Then for a specific need you add semantics. A case in point is the INI files used with ODBC. (ref)

If you want to validate JSON semantics there is JsonSchema but I don't see that being widely used, but perhaps it should be.

When I read the above I don't get the feeling that a clear separation is present. It seems the word JSON is used when perhaps JSON instance should be used and this is discussing the schema of that instance. As this is also talking about a specific instance it needs a name to make it easier to identify.


In looking at this as a schema, it should include a version number for when things change and break.


Was expecting to see references (think JSON-LD)


For the single-letter type codes, a full name should also be allowed (think command-line arguments: - with a single letter and -- with a name).


These are just my thoughts, feel free to ignore and/or disregard.

EDIT

Gavin noted a better reference JSON Schema.

JanWielemaker commented 2 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/swi-prolog-in-the-browser-using-wasm/5650/1

rla commented 2 years ago

I worked on typing the SWI-Prolog wasm interface API in TypeScript and I found compound to be problematic:

foo(12,34) as {"foo":[12,34]}

This makes the corresponding object shape impossible to define statically. The interface currently adds a $t: 't' tag anyway, plus an arguments() method that makes the object a lot more usable. https://github.com/SWI-Prolog/swipl-devel/blob/7a546d6e9e3df6d15343894a71405d5ff1bd712d/src/wasm/prolog.js#L85

I would just use foo(12,34) as {"$t": "t", "a":[12,34]} with a predictable property "a" standing for the arguments. This shape is easily definable in static type systems. It also makes the corresponding JSON schema much easier to write.
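
The same point can be sketched with Python type hints, used here only as an analogue of the TypeScript typing described above. With fixed keys the shape can be declared once; with the functor as the key, the best available annotation is a generic dictionary. The names `FixedCompound`, `KeyedCompound` and `arguments` are hypothetical.

```python
from typing import Any, Dict, List, Literal, TypedDict

# Fixed keys: the shape is known statically. The functional TypedDict syntax
# is needed because "$t" is not a valid Python identifier.
FixedCompound = TypedDict("FixedCompound", {"$t": Literal["t"], "a": List[Any]})

# Functor-as-key: the key set varies per term, so only a generic mapping fits.
KeyedCompound = Dict[str, List[Any]]


def arguments(term: FixedCompound) -> List[Any]:
    """Return the argument list; a type checker can verify the "a" lookup."""
    return term["a"]


fixed: FixedCompound = {"$t": "t", "a": [12, 34]}
keyed: KeyedCompound = {"foo": [12, 34]}

print(arguments(fixed))  # [12, 34]
print(keyed["foo"])      # [12, 34], but "foo" is unknown to the type checker
```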

JanWielemaker commented 2 years ago

I tend to agree. That is why I wrapped the thing into a JavaScript class for the WASM version. The story may look different from Python?

EricGT commented 2 years ago

I worked on typing the SWI-Prolog wasm interface API in TypeScript

The thumbs up is for making progress on using types with an interface. Glad to see someone is doing work on this.