ValarDragon opened 1 year ago
There's one requirement: codegen needs to be protojson-compliant. So I don't think just using EasyJSON is going to settle it; we would most likely need to fork it and adjust how it handles certain proto types (Any, Timestamp, Duration).
We also really want the encoding to be deterministic
Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json
Running into some serious bottlenecks on JSON deserialization for some modules' genesis files. (Both raw deserialization time and, more concerningly, all the heap allocations are causing problems.)
> Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json
As far as I can tell that library doesn't do anything for performance. It's just a thin wrapper for encoding/json
> Running into some serious bottlenecks on JSON deserialization for some modules genesis files. (Both raw deserialization time, but more concerningly all the heap allocations are causing problems)
I'm hopeful that streaming genesis will improve the memory issues although performance would likely be similar. What if we allowed streaming binary protobuf genesis as an option?
You're right, that library doesn't help :(
I think keeping the genesis JSON native, and just figuring out how to codegen the proto JSON serialization is worth doing.
I think it should be pretty low overhead to make a second tool to do JSON codegen with:
As long as we can add it as a second step, rather than editing the existing code. (cue the meme "# compiler passes = # of teams working on project")
I guess JSON is good for:
- Blockchain genesis
- Recovery from very bad chain bug / halt
Protobuf is fine for the case that's more performance sensitive, which is taking a chain export, editing some value, and starting a testnet. It's already really painful to do this with the JSON though; I think having to make new protobuf tools will make this unusable short term without serious investment in improving that.
I think just getting the JSON codegen'd will be a bigger win
I think JSON codegen will be a huge effort, and this does seem like an optimization for an edge use case (when there's a big chain and export is needed for testing). I think it would be easier to add an opt-in for binary genesis, specifically in the context of the direction we're going with cross-language modules (see #15410): from the module's perspective, I expect genesis will just be a stream of protobuf objects instead of raw JSON. If we can get this design right for ORM, collections, and Rust modules, it will allow binary or JSON genesis. It should be pretty straightforward to go back and forth between binary and JSON too for editing purposes.
From @aaronc's comment it sounds like this issue is not worth the effort to implement. If so, shall we close it?
idk, I think it is worth doing. I disagree on this one
I feel like it should be clean to:
And then we only apply this annotation to things that can decode cleanly, and/or are perf bottlenecks. The perf bottlenecks are really x/distribution and ibc data, in what Osmosis does at least.
Which code generator would do this? Honestly seems like a pretty big effort
> I guess JSON is good for:
> - Blockchain genesis
> - Recovery from very bad chain bug / halt
Why? For big genesis, we need tools anyway to modify genesis. Binary encoding is an order of magnitude faster for recovery / share. And as @aaronc mentioned, streaming is definitely the way to go to recover a node of a huge chain (I guess it will be a few times faster to do it with an export, rather than loading it all at once ... probably sometimes loading it all at once won't even be possible for huge chains). So optimizing JSON for genesis purposes is a short-term solution.
Should be doable with easyjson, and a light modification for timestamps in that repo
we could use https://github.com/TheThingsIndustries/protoc-gen-go-json as it is meant to work with gogoproto
Summary
We currently code-generate all of our proto files to have proto marshal methods. We should also add overridden logic for JSON marshalling. JSON marshalling is done in many situations in the codebase right now:
The first two are performance sensitive. Right now, the JSON marshalling has high RAM usage, because it is based on generic reflection-based APIs. This takes notable CPU time and heap allocation (the latter causing large GC loads at the relevant scale).
This can all be eliminated if we add code generation for the JSON. We can do so by adding easyjson's codegen https://github.com/mailru/easyjson (or something similar; I have not done a survey of different JSON code generation techniques). But this should be an easy addition that would have many wins across the stack.
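For context on why codegen avoids the reflection cost, generated marshallers are roughly append-only writes into a caller-owned buffer. A rough stdlib-only sketch of that shape (hypothetical `Coin` type; easyjson's actual generated code differs in detail, and note protojson renders int64 fields as strings):

```go
package main

import (
	"fmt"
	"strconv"
)

// Coin is a stand-in for a generated proto type (for illustration only).
type Coin struct {
	Denom  string
	Amount int64
}

// AppendJSON is the shape of what codegen emits: direct buffer writes with
// no reflection, so encoding is a single append-only pass with no
// per-field boxing allocations.
func (c Coin) AppendJSON(buf []byte) []byte {
	buf = append(buf, `{"denom":`...)
	buf = strconv.AppendQuote(buf, c.Denom)
	buf = append(buf, `,"amount":"`...) // int64 as string, per protojson
	buf = strconv.AppendInt(buf, c.Amount, 10)
	buf = append(buf, `"}`...)
	return buf
}

func main() {
	buf := make([]byte, 0, 64) // reusable across messages to cut GC load
	buf = Coin{Denom: "uosmo", Amount: 1000}.AppendJSON(buf)
	fmt.Println(string(buf))
	// → {"denom":"uosmo","amount":"1000"}
}
```

Reusing one buffer across many messages is what turns the per-message heap allocations (and the resulting GC pressure) into a single amortized allocation.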
Proposal
Add JSON marshalling codegen to our existing proto codegen process. Ensure it gets used and speeds things up with our JSON library usage.