curv3d / curv

a language for making art using mathematics

New: JSON API output #55

Closed: sebastien closed this issue 5 years ago

sebastien commented 5 years ago

We've been discussing adding a JSON API to help integrate curv with an external interactive editing environment.

We agreed that a simple JSON-based communication protocol over standard I/O would be ideal to get started. This new output format would be enabled with the -o json-api command line option.

In this protocol, each message is a JSON-encoded object terminated by \n. Each message has the following structure:

{
    "type": <"print"|"warning"|"error"|"shape"|"value">,
    "value": <...>
}

Note: Doug originally proposed to have {"print":...}, {"warning":...}, but the {type,value} format makes it slightly easier to dispatch messages (handlers can key directly off the type attribute) and leaves room for extension without conflicts.
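For illustration, a minimal client-side sketch (JavaScript; showError and showShape are hypothetical UI hooks, not part of curv) of dispatching on this envelope:

// Dispatch table keyed on the "type" field of each message.
const handlers = {
  print:   v => console.log(v),
  warning: v => console.warn(v),
  error:   v => showError(v),   // hypothetical UI hook
  shape:   v => showShape(v),   // hypothetical UI hook
  value:   v => console.log("value:", v),
};

// Each line of output is one JSON-encoded message.
function dispatch(line) {
  const msg = JSON.parse(line);
  const handler = handlers[msg.type];
  if (handler) handler(msg.value);
}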

Print & Warning messages

{
    "type": "print",
    "value": "<the string to be printed>"
}

Error message

{
    "type": "error",
    "value": {
         "path": <path to the curv source file>,
         "offset": <offset in bytes -- not UTF8 chars of the error>,
         "line": <line number, starting at 0>,
         "column": <column number, starting at 0>,
         "error": <string for the error name>,
         "description": <error description>,
         "context":[
             <an array of the last N lines up to the line that has an error>
          ]
    }
}

Value message

{
    "type": "value",
    "value": "<the JSON-encoded value>"
}

Shape message

{
    "type": "shape",
    "value": {
         "is2d":<true|false>,
         "is3d":<true|false>,
         "bounds": [x0,y0,z0, x1,y1,z1],
         "glsl" : {
             "platform"  : <platform requirements, to be defined>,
             "fragment": <string representation of the GLSL fragment shader>
         }
    }
}
doug-moen commented 5 years ago

This is now partially implemented in the master branch. For now, I am still using {"typename": value}, and not the {"type":"typename","value":value} scheme that Sebastien prefers.

The top level objects are:

{"print" : string}
{"warning" : exception}
{"value" : value}
{"shape" : shape}
{"error" : exception}

The json-api output consists of zero or more print or warning objects (representing debug output that is supposed to be printed on the console), followed by a final object which is one of value, shape or error.
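As a rough sketch of consuming this stream, here is a Node.js example that reads one JSON object per line and dispatches on the single top-level key (the invocation mirrors the curv -ojson-api -x usage shown later in this thread; the expression is just an example):

const { spawn } = require("child_process");
const readline = require("readline");

// Run curv and read its newline-separated JSON messages one at a time.
const curv = spawn("curv", ["-ojson-api", "-x", "2+2"]);
const rl = readline.createInterface({ input: curv.stdout });

rl.on("line", line => {
  const msg = JSON.parse(line);
  const tag = Object.keys(msg)[0];   // "print", "warning", "value", "shape" or "error"
  const payload = msg[tag];
  if (tag === "print" || tag === "warning") {
    console.log(payload);            // debug output, handled as it arrives
  } else {
    console.log("final message:", tag, payload);   // value, shape or error
  }
});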

An exception is { "message":string, "location":location_array }, where "location" is optional.

A location_array is an array of one or more location objects.

A location is {"byte_range":[int,int], "filename":string}. The byte_range is a zero-indexed, half-open range of byte indexes into the source file. The filename is optional. TODO: add start and end line and column numbers.
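Since Curv sources are currently ASCII-only, a client can use the byte_range directly as string indexes; a small JavaScript sketch:

// Extract the offending source text from a zero-indexed, half-open byte_range.
// With ASCII-only sources, byte indexes and JavaScript string indexes coincide.
function errorText(source, location) {
  const [begin, end] = location.byte_range;
  return source.substring(begin, end);
}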

TODO: Sebastien described "context": <an array of the last N lines up to the line that has an error> which would belong in the location object. The description is ambiguous.

A shape is {"shader":string, "is_2d":bool, "is_3d":bool, "bbox": [[xmin,ymin,zmin],[xmax,ymax,zmax]] }. TODO: Describe picker GUI controls and uniform variables. TODO: Currently, the shader string is a shadertoy compatible shader, which means there is some boilerplate that needs to be added to the beginning and end to make it valid GLSL. Maybe the full shader should be provided.

sebastien commented 5 years ago

I just tried it and I think you'll need to encode the \n as \\n in the shape.shader field. It's important that the JSON payloads be \n-separated, since that allows messages to be streamed to the client -- so a literal \n should never appear inside a JSON payload itself.

I think we can drop the error.context attribute; the location is enough to retrieve the context from the source. Also, I think a character range would actually be better than the byte_range -- I got my wires crossed! If you only have the byte_range that's OK as well, but it will take an extra encoding/decoding step to retrieve the substrings.

It would be nice to have a standalone GLSL shader, but the boilerplate code required to run it is not problematic, so it's fine like that for now.

Not sure if this is the right place to discuss it, but I think I'd like to have the parsed AST exported as JSON or S-Expr. It's going to be useful for extracting symbols from libraries and code in the short term, and in the long term I want to edit the AST directly online and stream tree patches to the compiler.

doug-moen commented 5 years ago

I will fix the newline problem.

By "character range", do you mean "a pair of Unicode code point counts", or do you mean "a pair of <line-number,column> indexes"? I can provide either one, or both. Note that Curv source files are currently restricted to ASCII, so every code point is a byte right now.

There are some big design issues around the AST requirements, which we need to discuss outside the context of JSON-API.

sebastien commented 5 years ago

By character range I mean the index of the glyph in a Unicode string, basically so that I can do (in JavaScript) source.substring(range[0], range[1]). But now that you mention it, I think it would be good to have the line number as well:

{
  start:{char:<glyph index>, line:<glyph line number>, column:<glyph column number>},
  end:{...}
}

This should provide all the info to highlight the error in the UI and would save implementing a code source search for the fragment.

I'll open a ticket for the AST, or do you prefer to discuss it by email or on the forum?

doug-moen commented 5 years ago

In Python, a character index is a Unicode Code Point index. In Javascript, a character index is a UTF-16 Code Unit index. The index values are different if the string contains emojis. Swift uses yet another definition, Extended Grapheme Cluster index, where the index values depend on tables which change with each new Unicode release, so client and server have to agree on a Unicode version. This is just one of about 1000 reasons why I don't support Unicode right now. But yes, I will use your design, except I will use 'byte' instead of 'char', since character indexes are different in every programming language. And byte indexes will work as character indexes since Curv strings are restricted to ASCII.
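To make the difference concrete, a quick JavaScript illustration (Python's len() of the same string would report 1, a code point count):

"🙂".length;        // 2 -- JavaScript counts UTF-16 code units
[..."🙂"].length;   // 1 -- spreading iterates Unicode code points
// With ASCII-only Curv sources, byte, code unit and code point indexes all agree.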

I don't want to spam the forum too much with this discussion, so let's use a github issue.

doug-moen commented 5 years ago

I redesigned the location object. Here is the output of curv -ojson-api -x '2+true', after being piped through jq to format it:

{
  "error": {
    "message": "2 + true: domain error",
    "location": [
      {
        "start": {
          "char": 0,
          "line_begin": 0,
          "line": 0,
          "column": 0
        },
        "end": {
          "char": 6,
          "line": 0,
          "column": 6
        }
      }
    ]
  }
}
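A minimal sketch of how a client might turn one of these location entries into an editor highlight, assuming the char fields can be used directly as JavaScript string indexes (which holds for ASCII sources):

function highlight(source, loc) {
  const { start, end } = loc;
  return {
    text: source.substring(start.char, end.char),      // the text to underline
    from: { line: start.line, column: start.column },
    to:   { line: end.line,   column: end.column },
  };
}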
sebastien commented 5 years ago

No problem with ASCII/byte only, but then I think we should change "char" to "byte" to avoid confusion (I had a similar issue when my parser was giving byte offsets into a Unicode string while the JS front-end was interpreting them as glyph/char offsets).

Also, do you plan to use {type,value}, or {<type>:value} in the long term? I'm still working on the GUI infrastructure, but I'm going to start wiring the JSON API interop fairly soon and I'd like to avoid re-doing some code. I think the first format is more resilient, but I don't mind the other either (I might just say "I told you so" later on ;).

doug-moen commented 5 years ago

I'd like to continue using {"type":value}. Here's my reasoning.

In the future, we'll need a bidirectional communication protocol for sending messages between the kernel and the GUI, which could be based on JSON-RPC or ZeroMQ. Messages will have an envelope, with metadata like a session id and request id, plus a payload, which is data described by a tagged union: there is a tag (message type or method) plus type-specific data.

The JSON-API output format is a transitional technology that doesn't need to be complicated. All we need is a stream of payload objects, we don't need envelope metadata.

For JSON-API, I'd like to use the simplest design that works. I think that {"tag": data} is an elegant way to represent tagged data in JSON. The same representation could be embedded in a more general message envelope structure. I like my proposed scheme because:

  1. It is simple, compact and easy to read.
  2. Object pathnames are nicer, eg: .error.message and .shape.shader.
  3. The tag-specific data is under the tag name in the data hierarchy, which makes logical sense.
  4. If you are using a functional programming language that can perform pattern matching on JSON data, such as jq, elixir, or curv, then the patterns used to select data from json-api output are shorter and more convenient.

If JSON-API survives into the long term, we can evolve it, adding record fields, adding new types and deprecating old ones, if the need arises.

doug-moen commented 5 years ago

I decided to use your suggestion of "char" rather than my original idea of "byte" because the fields "char", "line_begin" and "column" are all measured using the same units, and they are all character counts.

doug-moen commented 5 years ago

"I'm still working on the GUI infrastructure, but I'm going to start wiring the JSON API interop fairly soon"

That sounds great! Looking forward to seeing what your design looks like.

sebastien commented 5 years ago

Thanks for the explanation of the {<type>:<value>} format, I appreciate it, and I agree with the rationale. For context, I was thinking along the same lines (preparing to encapsulate the messages) and wanted to separate metadata (top level) from data (value level) so that we could extend the message format to integrate an envelope directly into the metadata without a namespace clash with the data.

Now just one thing I'd like to clarify, because I'm a bit confused: does char indicate a byte offset or a character offset (not a C char, but a JavaScript or Python char offset -- i.e. a Unicode glyph offset)? If it's the former, I really think we should use byte instead and document that column is also in bytes (although I think it should be in chars once you support UTF-8). Alternatively, this could be {offset,column,line,unit} (and maybe line_offset instead of line_begin) where unit=('bytes'|'char') -- but that's a bit more verbose.

In any case, the end goal is to avoid confusion and be mindful that in JavaScript (the primary consumer of the data) string indexing is based on Unicode glyphs, not on offsets into the byte representation.

doug-moen commented 5 years ago

The JSON-API is primarily intended to be used by high level languages like Python and Javascript, which do not support byte indexing into strings. The {char,line_begin,column} fields need to be character indexes that work in high level languages, not byte indexes.

So that's what I've done. If I were to use the word 'byte', then it would convey the false impression that these integers cannot be used as Javascript string indexes.

Now, it happens that Curv source files are restricted to ASCII, which means that byte indexes and character indexes are the same thing. The Unicode rant that I inserted into the previous message seems to have muddied the waters. I was reminding myself that extending Curv to support non-ASCII Unicode characters is tricky, because it could potentially break an interface like JSON-API that contains character offsets.

sebastien commented 5 years ago

Perfect, it's much easier for me to work with chars, and it will definitely speed up the frontend-backend interaction.

I noticed that the JSON output includes Infinity, which is not valid JSON (it works fine from Python, but not from Firefox or Chrome). Here's how to test: JSON.parse("Infinity")

doug-moen commented 5 years ago

Curv uses infinity a lot, but in json-api, infinities are supposed to be printed as 1e9999. If you have an example where infinity is printed as "inf" instead of as "1e9999", please post the full json-api output because I can't reproduce the bug.
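A quick way to check this behaviour in a browser or Node console:

JSON.parse("1e9999");    // Infinity -- a valid JSON number that overflows to Infinity
JSON.parse("Infinity");  // throws SyntaxError -- bare Infinity is not valid JSON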

doug-moen commented 5 years ago

I know you said "Infinity" but there is no code in curv that can print inf as "Infinity" as far as I know.

doug-moen commented 5 years ago

Oh, I see that Infinity is the name of a global variable in Javascript.

If I type 1e9999 into a Javascript REPL, then it prints Infinity in response.

Maybe if you read the JSON returned by curv, evaluate it to a Javascript value, then convert that to JSON text, then convert that text to a Javascript value a second time, then you will encounter this problem.

sebastien commented 5 years ago

Oh, wait, it's actually Python -- I decode the JSON before re-encoding it on the webservice, and the re-encoding wrongly expands the overflowed numbers to Infinity, so it's more of a Python-related bug.

I've noticed some issues with the GLSL output. For instance with this one, Firefox gives me:

*** Error compiling shader: WARNING: 0:14: '/' : Divide by zero during constant folding
WARNING: 0:119: '/' : Divide by zero during constant folding
ERROR: 0:41: 'r26' : Loop index cannot be initialized with non-constant expression
ERROR: 0:78: 'r56' : Loop index cannot be initialized with non-constant expression

Is there a way for the compiler to predict which shape will compile to a working WebGL fragment shader? It would be neat to have a warning or an error from the compiler directly that explains why the shader does not work in WebGL (you mentioned something about loops in the group discussion).

doug-moen commented 5 years ago

I've noticed some issues with the GLSL output.

This may be a difference between desktop OpenGL and WebGL. The WebGL version of GLSL is more restricted. Or it may be that you are using WebGL 1, and the problems will go away if you switch to WebGL 2.

Loop index cannot be initialized with non-constant expression

This is a WebGL 1 restriction which is supposed to be lifted in WebGL 2. First thing to check is that you are creating a WebGL 2 context, not a WebGL 1 context, when you initialize OpenGL.

Divide by zero during constant folding

There is no Infinity constant in GLSL, so I simulate it by computing 1.0/0.0. This code works on the desktop using an OpenGL 3.2 core context. I would have expected it to work in WebGL 2 as well, but we need to run an experiment to verify that.

Is there a way for the compiler to predict which shape will compile to a working WebGL fragment shader? It would be neat to have a warning or an error from the compiler directly that explains why the shader does not work in WebGL.

If we are going to support both WebGL 1 and WebGL 2, then work is required both in the GL Compiler and in your code. You would need to test the WebGL environment and determine whether WebGL 1 or WebGL 2 is supported, then pass a command line flag to curv indicating the level of GL support. The Curv GL Compiler would then enforce compile time restrictions based on that GL support level.
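As a rough sketch of the client side of that scheme (the curv flag name below is purely hypothetical, just to illustrate the idea):

// Detect the best available WebGL version, so the front end can tell curv
// which level of GLSL to target. The --gl-version flag is hypothetical.
const canvas = document.createElement("canvas");
const gl2 = canvas.getContext("webgl2");
const gl = gl2 || canvas.getContext("webgl");
const glVersion = gl2 ? "webgl2" : "webgl1";
// e.g. spawn("curv", ["-ojson-api", "--gl-version", glVersion, sourceFile]);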

If we are just supporting WebGL 2, then I suspect that the current GL Compiler output works in WebGL 2, and no extra work is required. I am currently targeting OpenGL 3.2 Core, released 2009. WebGL 2, released Jan 2017, is based on OpenGL ES 3.0, which in turn was released Aug 2012. OpenGL 4.3, released 2012, is the first version of desktop OpenGL that provides all of the features of OpenGL ES 3.0 (and is also a superset of ES 3.0). Based on these dates, I think OpenGL 3.2 GLSL code should work fine in WebGL 2. But OpenGL implementations tend to be buggy, so testing is always required.

doug-moen commented 5 years ago

https://stackoverflow.com/questions/51428435/how-to-determine-webgl-and-glsl-version

"WebGL1 supports GLSL ES 1.0. WebGL2 supports both GLSL ES 1.0 and GLSL ES 3.0, period. The first line in a GLSL ES 3.0 shader must be #version 300 es."

So the first line of the shader must be #version 300 es to avoid this problem.

doug-moen commented 5 years ago

And you must create the context using const gl = someCanvas.getContext("webgl2");

sebastien commented 5 years ago

Thanks for the info -- it seems that the fragment shaders are a bit different between the two versions; I get this type of error:

238: void mainImage( out vec4 fragColour, in vec2 fragCoord )
...
287: }
288:
289: void main() {mainImage(gl_FragColor, gl_FragCoord.xy);}

*** Error compiling shader:
ERROR: 0:289: 'gl_FragColor' : undeclared identifier
ERROR: 0:289: 'mainImage' : no matching overloaded function found

I'll let you know once I've learned about the differences between WebGL1 and WebGL2. I have to work on camera interaction first before tackling that one!

doug-moen commented 5 years ago

I didn't give you the boilerplate for main(), and I haven't updated curv -ojson-api to output the boilerplate. The epilog code that I insert at the end is:

            void main(void) {
                mainImage(oFragColour, gl_FragCoord.st);
            }

This is specific to WebGL 2; the WebGL 1 code would use gl_FragColor, as you have written.

My prolog code looks something like this:

#version 150
#define GLSLVIEWER 1
uniform vec2 iResolution;
out vec4 oFragColour;
uniform float iTime;

The main thing is that you need to define an out variable, which I call oFragColour, and you need to reference that same variable in main(). Or at least that's what works in OpenGL 3.2.
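For a WebGL 2 front end, the wrapping might look roughly like this JavaScript sketch (the uniform names follow the prolog above; whether curv's shader needs additional uniforms is not pinned down here):

// Wrap the shadertoy-style mainImage() source from the "shape" message
// into a complete GLSL ES 3.0 fragment shader for WebGL 2.
function wrapFragmentShader(mainImageSource) {
  return `#version 300 es
precision highp float;
uniform vec2 iResolution;
uniform float iTime;
out vec4 oFragColour;
${mainImageSource}
void main() { mainImage(oFragColour, gl_FragCoord.st); }
`;
}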

sebastien commented 5 years ago

Thanks a lot, I also had to change the vertex shader a bit (include the version and change attribute position to in position), but it worked!

[screenshot attached]

So the next step after that would be to get the parameters output in the json-api. I still have a good week and a half of work to fix the remaining issues before I move on to the parameterization UI.

dumblob commented 5 years ago

I tried Curv a few months ago, but it didn't work due to my old Intel graphics card. Today I came across https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md and thought it might help with such issues, as well as with the issues mentioned in this discussion.

doug-moen commented 5 years ago

@dumblob When you say "old Intel graphics card", what model of graphics card is it? If you still have the curv executable, what does 'curv --version' print? I'm wondering if your GPU really is too old to support Curv, or if there is some other problem that is fixable.

Thanks for the link to sokol-tools. I'm looking at several GPU middleware layers to fix my GPU problems. Right now I'm most excited about Google's Dawn library. https://dawn.googlesource.com/dawn

dumblob commented 5 years ago

When you say "old Intel graphics card", what model of graphics card is it?

I've compiled curv right now just to test it again and this is the output:

curv> cube
3D shape 2×2×2
curv> GLFW error 0x10007: GLX: Failed to create context: GLXBadFBConfig
ABORT: GLFW create window failed
255$ curv --version
Curv: 0.4-260-g4c2f3826
Compiler: gcc 9.1.0
Kernel: Linux 4.19.67-1-lts x86_64
GPU: Intel Open Source Technology Center, Mesa DRI Mobile Intel® GM45 Express Chipset 
OpenGL: 2.1 Mesa 19.1.5

(but I would be glad if it worked, of course :wink:)

doug-moen commented 5 years ago

@dumblob that's a legitimate error. It confirms that your Intel GM45, which is from 2009, is too old to run Curv.

doug-moen commented 5 years ago

The JSON API feature is stable, so I'm closing the issue.