Open kuba-- opened 5 years ago
Yes, the library isn't ideal. We should fix it upstream.
But also there were few discussions that XPath might not be the best language going forward, so maybe we will just use a new language? Will need to decide at some point.
Ok. I'm wondering if graphql would be a good choice here.
Real GraphQL needs a srits schema and we don't have one by design.
But I've implemented a GraphQL endpoint with dynamic schema for Cayley, so in practice it can be done.
The main problem is that GraphQL still cannot change the shape of a response. If you are querying multiple nested nodes, but only want few value extracted from them, you will end up requesting a skeleton of that subtree with few values in there. We need to check if there are any alternatives that can allow querying exactly what the user needs.
Keep in mind that we do support, recommend and demo XPath today. Feel free to open a separate issue and start discussing proposals for the future, but this is a real issue today.
Since the errors on the test code always say the same, so it's not clear to me what of the examples are supossed to be right or wrong, but I've tested with an online XPath validator. I've also tested using the client-python v3 (which uses the SDK and thus the library).
@kuba-- @ajnavarro if you guys have any thoughts on results of @juanjux preliminary research - please feel free to share
For me it's all about optimization. If we can reject wrong query earlier instead of waiting till the end to say "no results" (because query is no valid) then it's better. In other words we'll take what LA team give us ;) The open question is - do we wanna change 3rd library to parse xpath, or stick to what we use so far? @ajnavarro - correct me if I'm wrong.
@kuba-- totally agree.
I think the problem here is that almost any string is a valid XPath query. Maybe we should add some extra validation for our use case, like the supported node names. Per example, if the user tries to query //uast:Idenitfier
instead of //uast:Identifier
, in my opinion, we should throw an error before executing the query.
We may introduce waterfall of validators. XPath library should validate just xpath. So, bblfsh will be sure that got valid xpath and can focus on validating own syntax and reject e.g. non-uast queries:
//ast:Paradigm
This will require us to have a strict schema exposed to XPath. And before even trying this we will need to refactor XPath library to allow such hooks to be installed.
No worries, I can help ;)
But frankly, this is just a wish. Actually we already have a kind of schema, but not written down, to let us generate validators. It's like having yacc file for our xpath.
I believe, any syntax tree cannot be universal if it doesn't have universal validators. It's like a programming language where in theory you can use any syntax but sometimes it will return something and sometimes not. It should't work like this. If you use fn
instead of func
it should tell you that this is wrong, even if both cases it's a valid literal.
In general, I think it's a good idea, but we are not ready for it right now.
For example "our" XPath is well defined for Semantic nodes (uast:*
), but we know nearly nothing about nodes unique to each programming language (cpp:*
). I have plans to derive this schema later, but for now, it's only possible to assert for uast:
-related queries.
So let's do it in "baby steps". Personally, I prefer many small features/tools which we can add one by one instead of one big 🎉 effect.
Totally agree! We can start by making an XPath library better. It's a bit messy right now.
Talking about "baby steps", how about changing the current xpath library (github.com/antchfx/xpath
) to the gopkg.in/xmlpath.v2
?
At least in following test it can find the last invalid xpath:
package main
import (
. "gopkg.in/xmlpath.v2"
// . "github.com/antchfx/xpath"
)
func main() {
MustCompile("//*[@role=\"Return\"]")
MustCompile("BAD")
MustCompile("this is a totally crap. For sure not valid @xpath") // <----
}
I checked the library briefly, it looks better but has one downside: it cannot work with virtual nodes, only with full XML tree.
Still, it may be an option to fork the library and make it work with some node interface instead of a specific type.
The issue is related to:
Generally SDK uses 3rd-party package (github.com/antchfx/xpath) to compile xpath queries. Unfortunately lot of incorrect xpaths passes, e.g.: