cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0

Make proto-codegen also create JSON marshalling code generation #14146

Open ValarDragon opened 1 year ago

ValarDragon commented 1 year ago

Summary

We currently generate code for all of our proto files to produce proto marshal methods. We should also generate overriding logic for JSON marshalling. JSON marshalling is done in many situations in the codebase right now:

The first two are performance sensitive. Right now, the JSON marshalling has high RAM usage, because it is based on generic reflection-based APIs. This takes notable CPU time and heap allocation (the latter causing large GC loads at the relevant scale).

This can all be eliminated if we add code generation for the JSON. We can do so by adding easyjson's codegen https://github.com/mailru/easyjson (or something similar; I have not done a survey of different JSON code generation techniques). This should be an easy addition that would have many wins around the stack.
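For context on where the win comes from: tools like easyjson emit a hand-rolled `MarshalJSON` per type that writes straight into a buffer, instead of walking the type with reflection the way `encoding/json` does. A minimal sketch of the kind of method such a generator would emit (the `Coin` type and its fields here are illustrative stand-ins, not the actual generated SDK code):

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// Coin is a hypothetical stand-in for a proto-generated type.
type Coin struct {
	Denom  string
	Amount int64
}

// MarshalJSON is the style of method a codegen tool like easyjson
// would emit: direct buffer writes, no reflection, few allocations.
func (c Coin) MarshalJSON() ([]byte, error) {
	var b bytes.Buffer
	b.WriteString(`{"denom":`)
	b.WriteString(strconv.Quote(c.Denom))
	// protojson renders 64-bit integers as strings, so a compliant
	// generator would quote the amount.
	b.WriteString(`,"amount":"`)
	b.WriteString(strconv.FormatInt(c.Amount, 10))
	b.WriteString(`"}`)
	return b.Bytes(), nil
}

func main() {
	out, _ := Coin{Denom: "uatom", Amount: 1000}.MarshalJSON()
	fmt.Println(string(out))
}
```

Because the method implements `json.Marshaler`, existing `json.Marshal` call sites pick it up automatically without API changes.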

Proposal

Add JSON marshalling codegen, to our existing proto codegen process. Ensure it gets used / speeds things up with our JSON library usage.

testinginprod commented 1 year ago

There's one requirement: the codegen needs to be protojson compliant. So I don't think just using easyjson is going to settle it; we would most likely need to fork it and adjust it to handle certain proto types (Any, Timestamp, Duration) in the way protojson specifies.

aaronc commented 1 year ago

We also really want the encoding to be deterministic

ValarDragon commented 1 year ago

Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json

Running into some serious bottlenecks on JSON deserialization for some modules' genesis files. (Both raw deserialization time and, more concerningly, all the heap allocations are causing problems.)

aaronc commented 1 year ago

> Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json

As far as I can tell, that library doesn't do anything for performance. It's just a thin wrapper around encoding/json.

> Running into some serious bottlenecks on JSON deserialization for some modules' genesis files. (Both raw deserialization time and, more concerningly, all the heap allocations are causing problems.)

I'm hopeful that streaming genesis will improve the memory issues although performance would likely be similar. What if we allowed streaming binary protobuf genesis as an option?

ValarDragon commented 1 year ago

You're right, that library doesn't help :(

I think keeping the genesis JSON native, and just figuring out how to codegen the proto JSON serialization is worth doing.

I think it should be pretty low overhead to make a second tool to do JSON codegen with:

As long as we can add it as another second step, rather than editing the existing code. (cue the meme "# compiler passes = # of teams working on project")

ValarDragon commented 1 year ago

I guess JSON is good for:

Protobuf is fine for the case that's more performance sensitive, which is taking a chain export, editing some value, and starting a testnet. It's already really painful to do this with the JSON though; I think having to make new protobuf tools will make this unusable short term without serious investment in improving that.

I think just getting the JSON codegen'd will be a bigger win

aaronc commented 1 year ago

I think JSON codegen will be a huge effort, and this does seem like an optimization for an edge use case (when there's a big chain and export is needed for testing). I think it would be easier to add an opt-in for binary genesis, specifically in the context of the direction we're going with cross-language modules (see #15410): from the module's perspective, I expect genesis will just be a stream of protobuf objects instead of raw JSON. If we can get this design right for ORM, collections, and Rust modules, it will allow binary or JSON genesis. It should also be pretty straightforward to go back and forth between binary and JSON for editing purposes.

elias-orijtech commented 1 year ago

From @aaronc's comment it sounds like this issue is not worth the effort to implement. If so, shall we close it?

ValarDragon commented 1 year ago

idk, I think it is worth doing. I disagree on this one

I feel like it should be clean to:

And then we only apply this annotation to things that can decode cleanly, and/or are perf bottlenecks. The perf bottlenecks are really x/distribution and ibc data in what Osmosis does at least.

aaronc commented 1 year ago

Which code generator would do this? Honestly seems like a pretty big effort

robert-zaremba commented 1 year ago

I guess JSON is good for:

- Blockchain genesis
- Recovery from a very bad chain bug / halt

Why? For a big genesis, we need tools anyway to modify it. Binary encoding is an order of magnitude faster for recovery / sharing. And as @aaronc mentioned, streaming is definitely the way to go to recover a node of a huge chain (I guess it will be a few times faster to do it with an export, rather than loading it all at once; probably sometimes loading it all at once won't even be possible for huge chains). So optimizing JSON for genesis purposes is a short-term solution.

ValarDragon commented 1 year ago

Should be doable with easyjson, and a light modification for timestamps in that repo

tac0turtle commented 2 months ago

We could use https://github.com/TheThingsIndustries/protoc-gen-go-json, as it is meant to work with gogoproto.