bblfsh / sdk

Babelfish driver SDK
GNU General Public License v3.0
23 stars 27 forks source link

XPath returns a node instead of an attribute #340

Open dennwc opened 5 years ago

dennwc commented 5 years ago

From the new Python client:

... There is a failing test that I've left commented out, testFiltertoken, PTAL, this xpath query:

it = ctx.filter("//*[@token='else']/@token")

AFAIK it should return the @token property value (else) but it returns the full object containing it.

See https://github.com/bblfsh/client-python/pull/128#issuecomment-446199039

Related either to XPath projection in SDK, or XPath library itself.

juanjux commented 5 years ago

The problem is probably the @ in the @token (and @type, @pos, etc...) since XPath uses the @ symbol to retrieve attribute values.

juanjux commented 5 years ago

More info: the xpath library seems to be Ok and it even have a test for this case so unless we're hitting a corner case with it with our specific usage parameters (I'll do more tests to discard it) the problem seems to be the one stated above.

Also, this website (https://www.freeformatter.com/xpath-tester.html) that the Go-xpath lib seems to use as independent testing of the validity of its library (so I guess it uses another lib internally) also fails when the properties start with a "@".

We could do this workaround:

Not very pretty, but it should work and would avoid changing all the drivers, fixtures and testing code to replace the @ by something else.

dennwc commented 5 years ago

_BBLFSH_ is too hard to use, so maybe we can pick something different?

The token specifically is mapped to the node's content. Since does might have both token and children, I also projected is as a separate child node (<token> or <Token>?).

@type is projected to a node element name, so we don't need to handle it that way.

@pos is projected to attributes as: <pos-name>-start-offset, etc.

And we don't have any other @ fields defined.

juanjux commented 5 years ago

Yeah, _BBLFSH it was just an example, but the user would still use @ and it would be translated internally (would still be ugly on the fixtures). The length is to avoid possible collisions with drivers' existing properties. But since we translate whatever field is the token in the native AST, we should be right using another "token" property for this specific case as you proposed.

creachadair commented 5 years ago

Is this an issue that arises from queries generated inside the library (rather than directly by users), and if so can we work around it by adding quotes explicitly? I'm pretty sure XPath allows "@" in values, as long as they're escaped.

juanjux commented 5 years ago

Nope, no scaping of symbols in property names for XPath 1.0 as far as I investigated. Double quotes doesn't work either. This would also affect users, actually I found about this while upgrading the Python client's unittests.

The solution proposed by @dennwc of creating internally a 'token' property with the @token should be simple to implement.

dennwc commented 5 years ago

P.S. I believe it was already implemented and merged ;) We just haven't updated the docs.