bblfsh / javascript-driver

GNU General Public License v3.0
9 stars 13 forks source link

JSON parse failure #42

Closed campoy closed 5 years ago

campoy commented 6 years ago

I tried to parse a simple JSON object, expecting it to work as it does on the Chrome console, but got a parsing error.

{
  "message": "hello, world"
}

Gives the error syntax error: message is not defined.

I would have expected JSON to be supported by this driver. Is that an incorrect assumption on my side?

dennwc commented 5 years ago

Hmm, interesting. It works if you add a variable:

a = {
  "message": "hello, world"
}

The reason for this is that the first { is interpreted as a code block with the following statement inside:

"message": "hello, world"

I'm not sure what : will mean in this context for JS, but in general this is definitely an invalid JS code.

I propose to add a driver for JSON if we want this to work properly.

dennwc commented 5 years ago

A fun coincidence: message in an error text is referring to internal protocol message file (see #17), not the "message" value from the JSON.

Anyway, it gives a parsing error, so we might need a JSON driver.

creachadair commented 5 years ago

I think that original example should already be a valid Program » Statement » ExpressionStatement » Expression » … » PrimaryExpression » ObjectLiteral, even without an assignment. (The "…" in this case elides the simple path of the precedence hierarchy).

bzz commented 5 years ago

Parsing of kv.json with latest docker run --rm -d -p 9432:9432 bblfsh/javascript-driver:v2.5.0

{
    "key": "value"
}

using

package main

import "gopkg.in/bblfsh/client-go.v3"

func main() {
    client, err := bblfsh.NewClient("localhost:9432")
    if err != nil {
        panic(err)
    }

    res, _, err := client.NewParseRequest().ReadFile("kv.json").UAST()
    if err != nil {
        panic(err)
    }
}

results in Unexpected token, expected ";" (2:9)

JSON driver 🤔 hm, as JSON ~by design should be a valid subset of JS~ this is a fallacy 😢, it could be nice use case for language_aliases but seems like it's not the case in general.

Will try to dig deeper and see if we can make current native parser overcome this.

bzz commented 5 years ago

Found that https://babeljs.io/repl gives pretty good intuition on what JS driver supports now:

repl: Unexpected token, expected ; (1:4)
> 1 | {"k":"v"}

As https://github.com/bblfsh/javascript-driver/issues/46#issuecomment-442797823 notes a particular case of difference of JSON and JavaScript that seems to be possible to mitigate with https://babeljs.io/docs/en/babel-plugin-proposal-json-strings

That's great, but does not seems to solve the case at hand.

Most probably best we can do is to document that JSON is not supported by JavaScript driver and close this one.

WDYT?

dennwc commented 5 years ago

There is a second option though. We can change the driver a bit to try parsing the code as JS first, and if it fails, try to parse it as JSON.

bzz commented 5 years ago

Yes, we can do that + #46 to advertise this capability of the driver.

Honestly, I think it's worth doing that as idea of having a separate JSON driver scares me.

I belive we are a the point of discussion where it would be nice to have an input from @campoy, @mcuadros, @creachadair .

creachadair commented 5 years ago

I generally support the idea of allowing drivers to handle more than one language. It does, however, raise some API design questions, like what to do if more than one driver claims credit for a file, and what to do if a file actually contains multiple languages (e.g., JS embedded in HTML, or HTML embedded in PHP).

dennwc commented 5 years ago

Indeed, the problem of language strings embedded in the source file of another language is real. The same happens for the code that embeds static files. Or even simpler: SQL queries.

But this seems to be out of the list of responsibilities of Babelfish server.

I think we can solve it in a more generic way by adding some "language hint" for string literals in the UAST that each driver emits and the higher-level system (or the user) may pass this literal to Babelfish again, based on that hint.

This should isolate each language in its own driver, even for cases like PHP+HTML. Although the language detection will need to be very precise.

dennwc commented 5 years ago

@bzz Actually I think we will end up with a separate driver for JSON anyway. The reason is simple: positional information.

Most languages will just parse JSON and emit the data. But only few will know where each token or literal is located.

creachadair commented 5 years ago

Indeed, the problem of language strings embedded in the source file of another language is real. The same happens for the code that embeds static files. Or even simpler: SQL queries.

But this seems to be out of the list of responsibilities of Babelfish server.

That may be true today, but I don't think we should hold to that assumption. Graceful handling of nested languages is a huge feature for people trying to do code manipulation, particularly in front-end work where embedding like this is not only common but unavoidable.

That said: I agree it is not a thing we need to cope with right now. I do think we should put it on our roadmap, though. A good theory of nested quotation could potentially be an original contribution.

dennwc commented 5 years ago

@creachadair Sorry, I meant that we have Babelfish parsing one or multiple files in a specific language and some other system above it (still written by us) that analyses this information, handles project versioning, etc. It may be the same system, but I haven't meant to exclude it from our responsibility, only from the driver's.

creachadair commented 5 years ago

Oh I see! Sorry, I was confused.

bzz commented 5 years ago

@bzz Actually I think we will end up with a separate driver for JSON anyway. The reason is simple: positional information.

After learning more about bblfsh, I agree with Denys and am going to close this issue - JSON is not supported by JS driver and should have a separate driver eventually.