Consider adding an option to use a different JSON Format

EricZinda commented 3 years ago

Note that the system is using the builtin term_to_json/2 since that is what is used by the http package already.

The JSON mapping of Prolog terms is more cumbersome than it probably needs to be. (It also needs to be documented, since I’m only getting this by looking at the protocol dumps you’ve provided.) My suggestion would be something along these lines:

Prolog type | Prolog representation | JSON representation | Notes
-- | -- | -- | --
integer | 1234 | 1234 | 1
float | 1234.0 | 1234.0 | 1
atom | '1234' | "1234" | 2
variable | _1234 | {"_":"_1234"} |
string | "1234" | {"\"":"1234"} |
list | `` `1234` `` | [49,50,51,52] |
compound | foo(12,34) | {"foo":[12,34]} | 3
dict | foo{12:34} | {"{":["foo",{12:34}]} |
improper list | [1,2,3\|4] | {"[":[[1,2,3],4]} |
unrepresentable | '['([1,2,3],4) | {"'":["compound","[",[1,2,3],4]} | 4

[1] If it fits in a pointer-size signed integer (or, for floats, in a double). Otherwise translate it to an “unrepresentable” functor, because many JSON libraries (including, of course, JavaScript itself) limit the size of the numbers they will read.
[2] Between atoms and SWI strings, atoms get a lot more use, so they get to use the bare JSON string representation.
[3] Any compound whose functor can be represented as an unquoted atom will be represented this way. Otherwise wrap with T0 =.. L, T1 =.. ['\'',compound|L].
[4] This can be used for any unrepresentable type, or any value outside the representable domain of one of the other above types.
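The table can be read back mechanically on the client side. Below is a minimal Python sketch of a decoder for this proposed encoding. The classes (Var, PlString, Compound, PlDict, PartialList) are hypothetical stand-ins for Prolog values with no native JSON form, and only the "compound" kind of the "'" wrapper is handled. Note that JSON object keys must be strings, so the dict row's {12:34} would have to arrive on the wire as {"12":34}.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical client-side classes for values without a native JSON form.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class PlString:
    text: str


@dataclass
class Compound:
    name: str
    args: list


@dataclass
class PlDict:
    tag: Any
    pairs: dict


@dataclass
class PartialList:
    items: list
    tail: Any


def decode(value: Any) -> Any:
    """Decode one value of the proposed single-key encoding."""
    if isinstance(value, list):
        return [decode(v) for v in value]   # proper list (including code lists)
    if not isinstance(value, dict):
        return value                        # integer, float, or atom (bare string)
    if len(value) != 1:
        raise ValueError(f"expected a single-key object, got {value!r}")
    key, payload = next(iter(value.items()))
    if key == "_":
        return Var(payload)
    if key == '"':
        return PlString(payload)
    if key == "{":
        tag, pairs = payload
        return PlDict(decode(tag), {k: decode(v) for k, v in pairs.items()})
    if key == "[":
        items, tail = payload
        return PartialList(decode(items), decode(tail))
    if key == "'":
        # Note [3]: T1 =.. ['\'', compound | L] arrives as ["compound", Name, Arg1, ...]
        kind, name, *args = payload
        if kind != "compound":
            raise ValueError(f"unsupported unrepresentable kind {kind!r}")
        return Compound(name, [decode(a) for a in args])
    return Compound(key, [decode(a) for a in payload])  # ordinary compound
```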

This keeps the representation compact (not actually a small consideration if you’re transferring large data sets), makes it more readable (functor names before arguments), and gives every value (modulo dicts, whose keys can be reordered, and floats, which are weird) a single canonical, invariant representation, which means I don’t even need a JSON library if my results are predictable. Thus:

true([[threads(language_server1_conn2_comm, language_server1_conn2_goal)]])
% Original:
{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"], "functor":"threads"}]]], "functor":"true"}
% New:
{"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
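The difference between the two encodings is mechanical. A minimal Python sketch that rewrites the current dump into the proposed form, assuming (as in the dump above) that a compound is exactly an object with the keys "functor" and "args":

```python
import json


def compact(value):
    """Rewrite {"functor": F, "args": [...]} objects into {F: [...]}."""
    if isinstance(value, list):
        return [compact(v) for v in value]
    if isinstance(value, dict) and set(value) == {"functor", "args"}:
        return {value["functor"]: [compact(a) for a in value["args"]]}
    return value  # numbers and atoms (JSON strings) are unchanged


original = json.loads(
    '{"args": [[[{"args": ["language_server1_conn2_comm", "language_server1_conn2_goal"],'
    ' "functor":"threads"}]]], "functor":"true"}'
)
print(json.dumps(compact(original)))
# prints: {"true": [[[{"threads": ["language_server1_conn2_comm", "language_server1_conn2_goal"]}]]]}
```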

EricZinda commented 2 years ago

Coming back to this and summarizing: The proposed JSON format is:

Fundamental types are described above
Everything else is a dict with one key and a list for the value
- If the key is "special" (described below) it represents a type that requires special handling and then the arguments are handled as per that type
- Otherwise it represents a compound where the term name is the sole key and the value is the list of arguments

Special keys:
- `"`: string
- `_`: variable
- `{`: dict
- `[`: bad list
- `'`: unrepresentable

Some things I'd like to fix if possible:

the fact that the keys are symbols feels weird
handling strings will take some extra code. I'd rather have them be transparent values like atoms and have a way to see if they are actually strings if you care
Do we really need bad list and unrepresentable?
Should we allow embedded json/1 terms and pass them through as-is, so the caller can create specialized JSON when required?

JanWielemaker commented 2 years ago

Designing something like this is IMO the way to go. JSON is great for exchanging data that satisfies a nice, well-defined object schema. Trying to represent another dynamic data representation isn't great, as one typically ends up with a {"type": "...", ...}, which is indeed a little verbose. As is, though, it gets hard to identify, for example, a compound: that is an object with a single key that is not a reserved key (I think). Also, some things get ambiguous, which is why we get an unrepresentable category. What about this:

integer <-> integer
float <-> float
atom <-> string
list <-> list
dict with tag json and atom keys <-> object
Anything else is {"t":Type, ...}, with these types:
- string -> "s", "v": String
- big int -> "I", "v": DecimalString
- rational -> "r", "n": Integer, "d": Integer
- rational -> "R", "n": DecimalString, "d": DecimalString (if components are too big)
- singleton var -> "v"
- Normal var -> "V", "v": Name
- compound -> "t", "f":Name, "a":Args (a list)
- partial list -> "l", "l": list, "t": tail
- dict -> "d", "t":Tag, "kv": List of {"k":Key, "v":Value}
- blob -> "b", "v": String representation
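To check that the list is decodable, here is a Python sketch of a client-side decoder. Python's arbitrary-precision int and fractions.Fraction cover big ints and rationals directly. The classes are hypothetical stand-ins. The "l" and "d" entries are left out: they reuse "t" as a field name (tail, tag) while "t" is also the type key, and a JSON object cannot carry both, so those two types would need another field name that the proposal does not yet specify.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Optional


# Hypothetical client-side classes for the non-JSON types.
@dataclass(frozen=True)
class PlString:
    text: str


@dataclass(frozen=True)
class Var:
    name: Optional[str]  # None for a singleton variable


@dataclass
class Compound:
    name: str
    args: list


@dataclass(frozen=True)
class Blob:
    text: str


def decode(value: Any) -> Any:
    if isinstance(value, list):
        return [decode(v) for v in value]
    if not isinstance(value, dict):
        return value                        # integer, float, or atom (JSON string)
    kind = value.get("t")
    if kind is None:                        # json{...} dict <-> plain object
        return {k: decode(v) for k, v in value.items()}
    if kind == "s":
        return PlString(value["v"])
    if kind == "I":
        return int(value["v"])              # big int sent as a decimal string
    if kind in ("r", "R"):                  # "R" sends both parts as decimal strings
        return Fraction(int(value["n"]), int(value["d"]))
    if kind == "v":
        return Var(None)
    if kind == "V":
        return Var(value["v"])
    if kind == "t":
        return Compound(value["f"], [decode(a) for a in value["a"]])
    if kind == "b":
        return Blob(value["v"])
    raise ValueError(f"unsupported type tag {kind!r}")
```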

It only uses JSON objects where needed. There are two corner cases that we can handle specially: cyclic terms, which can be represented as e.g. {"t":"ct", "v":Skeleton, "u": Unifications}, and terms with attributed variables, for which we can use copy_term/3 and represent this as {"t":"av", "v":Term, "c":ListOfConstraints}.

P.S. I consider json(...) old school.

This can (if I didn't make a mistake) represent any Prolog term and is almost non-ambiguous. It allows for representing normal application data quite nicely (numbers, strings and json{...} dicts). Only the stuff that doesn't fit JSON naturally gets an object. Some problems remain:

Prolog -> JSON: How to send true/false/null?
JSON -> Prolog: objects holding "t" are special (ambiguous). I have seen JSON representations using "_t" or "$t" to reduce such ambiguities. Here, too, we have the question of how to represent true/false/null.

The true/false/null issue can be avoided by mapping Prolog strings to JSON strings, the reserved atoms to themselves and other atoms to {"t":"a", "v":String}. This comes with its own problems, because SWI-Prolog introduced strings when atoms were already commonly used for what strings were intended to do. ECLiPSe introduced them early and that community has less of a problem.
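The alternative mapping in the last paragraph can be sketched in Python to show its trade-off. Assumptions: a hypothetical Atom class marks Prolog atoms, while a plain Python str stands for a Prolog string; only atoms, strings, lists and numbers are covered.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical marker for a Prolog atom; plain str stands for a Prolog string."""
    name: str


# Reserved atoms that map onto the JSON literals.
RESERVED = {"true": True, "false": False, "null": None}


def encode(value):
    if isinstance(value, Atom):
        if value.name in RESERVED:
            return RESERVED[value.name]
        return {"t": "a", "v": value.name}  # every other atom needs an object
    if isinstance(value, list):
        return [encode(v) for v in value]
    return value  # Prolog string (str), integer, or float


print(json.dumps(encode([Atom("true"), Atom("foo"), "some text", 42])))
# prints: [true, {"t": "a", "v": "foo"}, "some text", 42]
```

The cost is visible immediately: an ordinary atom such as foo, far more common than strings in typical SWI-Prolog code, now needs an object.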

EricZinda commented 2 years ago

I haven't exhaustively reviewed your list of cases, but I agree with the ones you wrote down and that we should go with that general approach.

A couple of questions here @JanWielemaker:

Are you imagining that this is implemented as a new predicate, let's call it term_to_json2/2, that takes a Prolog term and converts it to the json/1 format, after which the caller uses json_write or json_write_dict on that output? Or is this a new version of json_write that takes any Prolog term, something like json_write_term(+Stream, +Term, +Options)?
I really like mapping atoms to JSON strings (and Prolog strings to a JSON object). That seems like the more common case.
I wonder if there should be an option that represents both atoms and strings as JSON strings. Yes, this won't "round trip", but for the way MQI is designed right now (Prolog gets sent to MQI, JSON gets returned), I suspect that most people know where the atoms and strings are in their interface, and making one of them a JSON object just adds more work for the user...
What did you mean that you consider json(...) old school? What is the new school?

Format Thoughts

[Side Note: Pengines needs support for outputting arbitrary JSON since it is designed to support integration into existing/arbitrary systems and needs that flexibility. I'd argue that MQI can make do with one good canonical representation, since it is designed to be an implementation detail of an existing application (in that sense it is like an Application Binary Interface (ABI)). I.e. we don't need to worry too much about the user being able to control the format (although pengines does!), just about having a single good one. That said, I could imagine, as we've discussed, having support in MQI for different wire formats over time, but for different reasons than pengines: things like performance. Maybe it amounts to the same thing but coming at it from different perspectives.]

It seems like we have some choices for the JSON object format, and I think any could work but some are more readable. I'll illustrate for the compound case of foo(arg1, arg2) and a string "a string":

1. Updated Original Proposal: Agree we need a way to quickly identify a compound. I think we could say that all the "reserved keys" must start with "$" to make that easy. To represent a compound whose name happens to be a reserved key, the caller could surround the name with quotes (i.e. use "'$s'"). We could use your updated list of keys as the "reserved keys":
   - compound: {"foo": ["arg1", "arg2"]}
   - compound with name == reserved key "$s": {"'$s'": ["arg1", "arg2"]}
   - string: {"$s": "a string"}
   - Python check first argument: if myjson["foo"][0] == "arg1":
   - Python check for compound: if next(iter(myjson))[0] != "$":
2. Your proposal above:
   - compound: {"t": "t", "f": "foo", "a":["arg1", "arg2"]}
   - string: {"t":"s", "v":"a string"}
   - Python check first argument: if myjson["f"] == "foo" and myjson["a"][0] == "arg1":
   - Python check for compound: if myjson["t"] == "t":
3. (Just an idea, maybe a little more readable?) A modified version of your proposal that uses position in a list instead of introducing keys:
   - compound: {"t": ["foo", ["arg1", "arg2"]]}
   - string: {"s":"a string"}
   - Python check first argument: if myjson["t"][0] == "foo" and myjson["t"][1][0] == "arg1":
4. (Another idea) A modified version of your proposal that uses just a list (no keys at all), relying on position in the list:
   - compound: ["t", "foo", ["arg1", "arg2"]]
   - string: ["s", "a string"]
   - Python check first argument: if myjson[0] == "t" and myjson[1] == "foo" and myjson[2][0] == "arg1":
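The inline checks for the four options can be collected into one runnable Python snippet. It uses next(iter(obj)) to get an object's first (here, only) key; the helper name is_compound_opt1 is illustrative only.

```python
# foo(arg1, arg2) in each of the four candidate encodings.
opt1 = {"foo": ["arg1", "arg2"]}
opt2 = {"t": "t", "f": "foo", "a": ["arg1", "arg2"]}
opt3 = {"t": ["foo", ["arg1", "arg2"]]}
opt4 = ["t", "foo", ["arg1", "arg2"]]


def is_compound_opt1(obj):
    """Option 1: a compound is a single-key object whose key does not start with "$"."""
    return isinstance(obj, dict) and len(obj) == 1 and not next(iter(obj)).startswith("$")


# Accessing the first argument in each encoding.
assert opt1["foo"][0] == "arg1"
assert opt2["f"] == "foo" and opt2["a"][0] == "arg1"
assert opt3["t"][0] == "foo" and opt3["t"][1][0] == "arg1"
assert opt4[0] == "t" and opt4[1] == "foo" and opt4[2][0] == "arg1"

# Option 1 type check: reserved "$" keys versus ordinary or quoted compound names.
assert is_compound_opt1(opt1)
assert not is_compound_opt1({"$s": "a string"})
assert is_compound_opt1({"'$s'": ["arg1", "arg2"]})
```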

My vote is for #1:

To me, the JSON in 1 is much easier to read, especially in the common case where you don't have to use JSON objects to represent anything other than the compound. It is also more compact. In Python (and JavaScript, and I would bet most other languages), this form is very natural and easy to read if you are doing the common case of traversing a known term structure.
I think 2 is more natural if you are examining an unknown term structure, but I find it harder to read the JSON and more verbose for known term structures. It is a bit more obvious to read the json when accessing elements of json objects since it uses keys instead of position in a list (i.e. myjson["v"] vs. myjson["$s"][0] for retrieving a string value)

true, false, null

Prolog to JSON: Since we are treating the JSON as an "ABI" format, can't we just always represent the atoms true, false and null in whatever JSON atom representation we end up with and not use the JSON true/false/null at all (unless the user uses the json/1 predicate to represent it)?
JSON to Prolog: We could assume that this is an "ABI" format and thus we should never see these (since they will always be atoms), but if we happen to, they could be wrapped in json(@(true)), etc?

EricZinda commented 2 years ago

Some notes after a discussion about the above topics:

Format

We agreed that the best approach would be a merger of option 1 and 2 above, preserving the best of both:

The full, round-trippable JSON format would (perhaps optionally) include, in each JSON dict, a type key named "$t" that indicates the type of the Prolog term. The names and number of the remaining keys would depend on the type. Including the "$t" key is what allows round-tripping. For example:

[Note that whether atoms or strings get to be the default JSON string can be switched as an option]

atom <-> string
"a string": {"$t": "s", "v": "a string"}
integer <-> integer
float <-> float
list <-> list
foo(arg1, arg2): {"$t": "t", "foo": ["arg1", "arg2"]}

This approach allows the non-Prolog client to use a really nice interface to access terms if their structure is known ahead of time (which is often the case) like this, for example, in Python:

term = {"$t": "t", "foo": ["arg1", "arg2"]}
print(term["foo"][0])  # prints: arg1

As mentioned above, this does mean that term_to_json/json_to_term define a particular canonical JSON "schema" that must be conformed to. As discussed in earlier posts, this is OK since these predicates are intended to be used as an interface or an ABI, not as a general-purpose generator of arbitrary JSON documents. There are other predicates that allow building arbitrary JSON in SWI-Prolog.

Other Notes

Forget json/1, use dicts as the "Prolog intermediate format for JSON"
Do this in two phases:

Phase 1:

Start by leveraging the existing json_write_dict/json_read_dict predicates, as is. Then just create a new pair of term_to_json/json_to_term predicates that convert between Prolog terms and the dict-based Prolog intermediate format
Have options that:
- Switch whether strings or atoms get to use the "nice" serialization option (i.e. just written as JSON strings)
- (consider) Include the type argument or not in the JSON representation. Not including it means no round-tripping, but for applications that are just consuming the JSON that might be OK and provide a nicer JSON format.
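The merged "$t" format and the two Phase 1 options can be sketched in Python, purely to pin down the intended JSON shapes. Assumptions: the Atom/PlString/Compound classes are hypothetical, and the tag "a" for atoms (needed when strings get the bare JSON string) is borrowed from the {"t":"a", ...} idea earlier in the thread; the thread does not fix that name. Only atoms, strings, compounds, lists and numbers are covered.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class PlString:
    text: str


@dataclass
class Compound:
    name: str
    args: list


def encode(term, strings_plain=False, include_type=True):
    """Encode a term in the "$t" format.

    strings_plain: if True, Prolog strings (not atoms) get the bare JSON string.
    include_type:  if False, drop "$t" (nicer to consume, but no round trip).
    """
    def tagged(kind, body):
        return {"$t": kind, **body} if include_type else body

    def enc(t):
        if isinstance(t, Atom):
            return tagged("a", {"v": t.name}) if strings_plain else t.name
        if isinstance(t, PlString):
            return t.text if strings_plain else tagged("s", {"v": t.text})
        if isinstance(t, Compound):
            return tagged("t", {t.name: [enc(a) for a in t.args]})
        if isinstance(t, list):
            return [enc(x) for x in t]
        return t  # int or float

    return enc(term)


def decode(value, strings_plain=False):
    """Inverse of encode(); needs the "$t" key on every object."""
    if isinstance(value, list):
        return [decode(v, strings_plain) for v in value]
    if isinstance(value, str):
        return PlString(value) if strings_plain else Atom(value)
    if isinstance(value, dict):
        kind = value["$t"]
        if kind == "s":
            return PlString(value["v"])
        if kind == "a":
            return Atom(value["v"])
        if kind == "t":
            (name,) = [k for k in value if k != "$t"]  # the single functor key
            return Compound(name, [decode(a, strings_plain) for a in value[name]])
        raise ValueError(f"unknown $t value {kind!r}")
    return value  # int or float


term = Compound("foo", [Atom("arg1"), PlString("a string"), 42])
print(encode(term)["foo"][0])  # arg1
```

With include_type=False the output is {"foo": ["arg1", {"v": "a string"}, 42]}: easier to consume, but a decoder can no longer tell what each object was.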

Phase 2:

Create a predicate that goes straight from an arbitrary Prolog term to a string (and back) without using the intermediate format, to save memory (by not having to build the entire Prolog intermediate representation first).
Create alternatives to json_write_dict/json_read_dict (in a new library, so they can be used just by switching libraries) that write and read the new format
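Phase 2 concerns Prolog-side predicates, but the memory-saving idea (write output while walking the term instead of first building a full intermediate structure) can be illustrated by analogy in Python. Simplifying assumptions: a hypothetical Compound class, Python str standing in for atoms, and the {"$t": "t", Name: Args} compound shape described above. The streamed text is identical to what json.dumps would produce from the equivalent intermediate dict.

```python
import io
import json
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    args: list


def write_term(term, out):
    """Stream `term` as "$t"-format JSON to `out` without building dicts first."""
    if isinstance(term, Compound):
        out.write('{"$t": "t", ')
        out.write(json.dumps(term.name))  # json.dumps handles escaping of the name
        out.write(": ")
        write_items(term.args, out)
        out.write("}")
    elif isinstance(term, list):
        write_items(term, out)
    else:
        out.write(json.dumps(term))  # atom (str), int, or float


def write_items(items, out):
    out.write("[")
    for i, item in enumerate(items):
        if i:
            out.write(", ")
        write_term(item, out)
    out.write("]")


buf = io.StringIO()
write_term(Compound("foo", ["arg1", [1, 2]]), buf)
print(buf.getvalue())  # {"$t": "t", "foo": ["arg1", [1, 2]]}
```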

JanWielemaker commented 2 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/wiki-discussion-swi-prolog-in-the-browser-using-wasm/5651/75

EricGT commented 2 years ago

The context in which this is written: I came here because of (ref)

I have been discussing a new JSON format with @ericzinda at Consider adding an option to use a different JSON Format · Issue #4 · SWI-Prolog/packages-mqi · GitHub

so my mindset is that this will/could become a/the new JSON package for SWI-Prolog, and since I often use JSON I have a vested interest.

As a suggestion it might help to think of this along the lines of syntax and semantics.

I see JSON as a syntax specification like INI files. (ref)

Then for a specific need you add semantics. A case in point is the INI files used with ODBC. (ref)

If you want to validate JSON semantics there is JsonSchema, but I don't see it being widely used, though perhaps it should be.

When I read the above I don't get the feeling that a clear separation is present. It seems the word JSON is used when perhaps JSON instance should be used and this is discussing the schema of that instance. As this is also talking about a specific instance it needs a name to make it easier to identify.

In looking at this as a schema, it should include a version number for when things change and break.

I was expecting to see references (think JSON-LD).

Where a single letter is used for a type, a full name should also be allowed (think command-line arguments: - with a single letter and -- with a name).

These are just my thoughts, feel free to ignore and/or disregard.

EDIT

Gavin noted a better reference JSON Schema.

JanWielemaker commented 2 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/swi-prolog-in-the-browser-using-wasm/5650/1

rla commented 2 years ago

I worked on typing the SWI-Prolog wasm interface API in TypeScript and I found compound to be problematic:

foo(12,34) as {"foo":[12,34]}

This makes the corresponding object shape impossible to define statically. The interface currently adds the tag $t: 't' anyway, plus an arguments() method, which makes the object a lot more usable. https://github.com/SWI-Prolog/swipl-devel/blob/7a546d6e9e3df6d15343894a71405d5ff1bd712d/src/wasm/prolog.js#L85

I would just use foo(12,34) as {"$t": "t", "a":[12,34]}, with a predictable property "a" standing for the arguments. This shape is easily definable in static type systems. It also makes the corresponding JSON schema much easier to write.
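To illustrate the point in Python's type system (a TypedDict standing in for a TypeScript interface): with a fixed "a" key the compound is one declarable type, whereas with {"foo": [...]} the key is the functor name itself, so the best one can declare is a general mapping from str to argument lists, which every other object also matches. The "f" key for the functor name is an assumption borrowed from the {"t":"t", "f":..., "a":...} entry earlier in the thread, since the suggested shape does not say where the name goes.

```python
from typing import List, Literal, TypedDict, Union

# "$t" is not a valid identifier, so the functional TypedDict syntax is needed.
# "f" (functor name) is an assumption, not part of the suggestion above.
Compound = TypedDict("Compound", {"$t": Literal["t"], "f": str, "a": List["Term"]})
Term = Union[int, float, str, List["Term"], Compound]


def first_arg(term: Compound) -> "Term":
    # A type checker knows exactly where the arguments live: term["a"].
    return term["a"][0]


example: Compound = {"$t": "t", "f": "foo", "a": [12, 34]}
print(first_arg(example))  # 12
```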

JanWielemaker commented 2 years ago

I tend to agree. That is why I wrapped the thing into a JavaScript class for the WASM version. The story may look different from Python?

EricGT commented 2 years ago

I worked on typing the SWI-Prolog wasm interface API in TypeScript

The thumbs up is for making progress on using types with an interface. Glad to see someone is doing work on this.
