Closed agrawroh closed 1 year ago
@szeiger could you review this? since you guys own sjsonnet these days :)
Can we at least skip the intermediate JSON serialization and translate the SnakeYAML AST directly? If we add YAML parsing and use it in our codebase, I expect performance to be an issue. Ideally, we'd parse directly to an Sjsonnet AST (like in ValVisitor
for parsing JSON) but I don't know if SnakeYAML supports that. I tried to look it up but the project appears to be inaccessible in a private BitBucket repository, which is a bit worrisome for a new dependency. Do you know more about the status of SnakeYAML?
Can we at least skip the intermediate JSON serialization and translate the SnakeYAML AST directly? If we add YAML parsing and use it in our codebase, I expect performance to be an issue. Ideally, we'd parse directly to an Sjsonnet AST (like in
ValVisitor
for parsing JSON) but I don't know if SnakeYAML supports that. I tried to look it up but the project appears to be inaccessible in a private BitBucket repository, which is a bit worrisome for a new dependency. Do you know more about the status of SnakeYAML?
@szeiger I don't know much about SnakeYAML except that it appears in almost all the threads I could find for parsing YAML in scala. It's also used at some other places in our universe. I could also find this other library which converts SnakeYAML AST to Circle AST (JSON) [Ref] My use-case was to parse and validate the GOC file(s) which are YAMLs inside my service's JSONNET template. I am not sure if we'd be using the YAML parsing at a lot of other places but the performance could be a concern.
JSONNET seems to be using this CPP library called RapidYAML for implementing the parseYaml() function.
Any suggestions on how we can implement this?
@agrawroh I also found most recommendations pointing to SnakeYAML. I was surprised to see that it doesn't have a public repo or docs. Maybe this is a recent change?
Transforming the AST for at least decent performance is probably easy. It looks like SnakeYAML already uses normal Java maps for objects. The primitives are probably simple wrappers, similar to a JSON AST. This shouldn't take more than a few lines of code for a huge performance improvement.
The ability to easily skip the SnakeYAML AST entirely depends on the library's design (which is why I was looking for docs). uPickle makes this very easy (which allows us to implement a JSON to Sjsonnet AST parser with no intermediate JSON AST in 46 LOC). It may be harder to do and the expected pay-off is much smaller.
@szeiger BTW Did you try using this link to access SnakeYAML repository? https://bitbucket.org/snakeyaml/snakeyaml/src/master/
Thanks, that link works. Every page I found so far referred to https://bitbucket.org/asomov/snakeyaml
Stamping to get this unblocked. My concerns about the performance still stand. We should at least skip the intermediate JSON serialization and deserialization. If this ends up being used with large YAML files, we're likely to run into performance issues with the current design.
Will try to get this in to unblock Cloud Team. I will have a few free cycles next month. Would definitely try to work on the performance optimizations for this as well as compression which started taking a lot of time.
Description
This PR adds a new
parseYaml()
function to sjsonnet.JSONNET Reference Doc
Signed-off-by: Rohit Agrawal rohit.agrawal@databricks.com