Cross-language support - Githubissues

KrishnaPG commented 4 years ago

This looks like good work.

For cross-language support, it would be good to go deep into one language with runtime support, rather than going wide and providing compile-time linking options for many languages.

For example, once C/C++ implementation is done, please:

provide connectivity options to invoke the API through HTTP/2, WebSockets, RPC etc., so that all other languages, such as JavaScript, Go, Rust etc., can start using the C/C++ code at runtime.

That would save a lot of redundant implementation work and reach a larger target audience (e.g. Python, R and so on).

Also, for serialization, please use CBOR rather than depending on protobufs. A good design does not force a rebuild of the code whenever there is a change in the wire/serialization format. All modern languages support CBOR and it is extensible.

igormcoelho commented 4 years ago

Hello @KrishnaPG, thanks for the comments, I believe we are aligned in the same direction. @rodoufu do you know CBOR?

provide connectivity options to invoke the API through HTTP/2, WebSockets, RPC etc., so that all other languages, such as JavaScript, Go, Rust etc., can start using the C/C++ code at runtime.

Regarding this point, we are using gRPC, so it works over HTTP, although "not pure". Is that the point? I've been looking at ZeroMQ as well, especially to build P2P interconnections, but I'm not certain yet. The next step here is to provide better WorldModel support and active monitoring of processes (to begin heavy testing).

KrishnaPG commented 4 years ago

Thank you @igormcoelho

gRPC should be good enough to interact from other languages. But it was not apparent from the ReadMe how to use that gRPC API. It would be great if you could add an example or two to the ReadMe, or some docs, showing how to use libBFT's gRPC (preferably from JS or some other simple language), so that it becomes easy for us to start using and testing it.

For example, I would love to invoke the gRPC API from a browser (through JavaScript) and start using this library for sharing some state data across multiple NodeJS and browser instances. (If you can run from a browser, then you are already mobile-ready.) JS is the low-hanging fruit. If it can be used from JS, then using it from other languages, such as Go or Rust, is pretty straightforward.

For P2P, ZeroMQ is good, but it is restricted to native sockets, which means browser-based clients are out of the question. There were some earlier efforts to use ZMQ with WebSockets, but not much progress was made. The best option currently available is libp2p:

It supports WebSockets (so it is browser-ready, and of course it supports native sockets as well)
Has support for Kademlia routing, mDNS etc.
- Supporting Kademlia means DHT-based RPC, such as this, is easy. You get P2P RPC for free (with UDP hole-punching, which ZeroMQ cannot currently do, IIRC)
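To make the Kademlia point a little more concrete, here is a minimal sketch of the XOR distance metric that Kademlia-style DHTs use to decide which peers are "closest" to a key. It is not libp2p's API; the peer names, the key string, and the use of SHA-1 to get 160-bit IDs are assumptions made only for illustration.

```python
import hashlib
from typing import List


def node_id(name: str) -> int:
    """Derive a 160-bit integer ID from a name (illustrative choice of hash)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def closest_nodes(target: int, known: List[int], k: int = 3) -> List[int]:
    """Return the k known IDs with the smallest XOR distance to the target."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]


# Hypothetical peers and a hypothetical key for some shared state.
peers = {name: node_id(name) for name in ["alice", "bob", "carol", "dave", "erin"]}
key = node_id("app-state/v1")
nearest = closest_nodes(key, list(peers.values()), k=2)
```

A lookup or DHT-based RPC would then be routed to the peers in `nearest`, which is why no fixed network topology needs to be known in advance.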

Monitoring is the easy part. You may already be aware of the following, which are the best options currently available:

Netdata, for single-machine resource usage monitoring
Jaeger for distributed tracing across multiple processes / servers / data centers

The rather difficult part, however, is making the distributed state management easy for programmers.

When you say "WorldModel" I believe you are referring to the app-specific state data that is being shared by nodes. My request would be to make such "app-state" data API robust with cross-language compatibility so that applications from any language can seamlessly use the same data-model across diverse machines / networks (including browser-instances).

For example, how easy is it to start with a custom state data model and evolve it over time? I.e. no schema lock-in (this is where CBOR could greatly help)
How easy is it to join an existing network of BFT nodes and start participating in the distributed state sharing (read / write)? I.e. no network topology lock-in (where a DHT can greatly help)
How easy is it to audit the validity of the current state (or any previous state)?
How easy is it to use custom storage? Can one use AWS S3, local files, IPFS, or a combination of them to store the distributed state data? I.e. no storage lock-in
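As a rough illustration of the audit and storage questions above, the sketch below chains each state snapshot to the hash of the previous one and keeps the records behind a small storage interface. Everything here (`BlobStore`, `MemoryStore`, `append_state`, `audit`) is hypothetical and not part of libBFT; JSON is used only because it ships with Python's standard library.

```python
import hashlib
import json
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """Hypothetical storage interface; S3, local files or IPFS could sit behind it."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class MemoryStore:
    """In-memory stand-in for a real storage backend."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def append_state(store: BlobStore, log: List[str], state: dict) -> str:
    """Store a state snapshot linked to the previous one; return its digest."""
    prev = log[-1] if log else GENESIS
    record = json.dumps({"prev": prev, "state": state}, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(record).hexdigest()
    store.put(digest, record)
    log.append(digest)
    return digest


def audit(store: BlobStore, log: List[str]) -> bool:
    """Check that every record matches its digest and links to its predecessor."""
    prev = GENESIS
    for digest in log:
        record = store.get(digest)
        if hashlib.sha256(record).hexdigest() != digest:
            return False
        if json.loads(record)["prev"] != prev:
            return False
        prev = digest
    return True


store = MemoryStore()
log: List[str] = []
append_state(store, log, {"temp": 21})
append_state(store, log, {"temp": 22, "humidity": 40})  # the model gained a field
ok_before = audit(store, log)

# Simulate tampering with the first stored record.
store.put(log[0], b'{"prev": "forged", "state": {"temp": 99}}')
ok_after = audit(store, log)
```

The second snapshot adds a field without any schema change, and swapping `MemoryStore` for another backend only requires the same two methods.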

A couple of examples showing how to achieve these tasks from a simple language such as JS could attract many developers to this project quickly.

The Tendermint ABCI is a good model to study for inspiration. It is a consensus engine, and the ABCI spec is good. But its implementation still needs much more refinement. For example, multi-tenancy etc. Here are some of my thoughts https://github.com/tendermint/tendermint/issues/4058

Essentially, Tendermint has a clean separation of application state data (WorldModel?) and the blockchain consensus data. This makes it easy to program the state management. For example, LotionJS
However, somewhere down the line they dropped the ball, and it is very difficult to control the nodes programmatically. The implementation focused too much on interactive CLI-based usage, so it became hard to use the engine from anything other than one particular language/environment.

Since libBFT is still in development, there is great scope to make it easy for programmers to adopt and get started.

rodoufu commented 4 years ago

Hello @KrishnaPG, thanks for the comments, I believe we are aligned in the same direction. @rodoufu do you know CBOR?

I haven't used CBOR yet, but I started looking into it once I read this issue from @KrishnaPG. We are using Protobuf because it's the default for gRPC, it's well known, and it has good performance. @KrishnaPG what are the advantages of CBOR over Protobuf?

rodoufu commented 4 years ago

That's a good suggestion @KrishnaPG, I was going to cite libp2p as well. @igormcoelho, libp2p is the one I talked to you about. I saw a good presentation about it at Devcon V; @shargon and @belane also liked it.

For P2P, ZeroMQ is good, but it is restricted to native sockets, which means browser-based clients are out of the question. There were some earlier efforts to use ZMQ with WebSockets, but not much progress was made. The best option currently available is libp2p.

KrishnaPG commented 4 years ago

Thank you @rodoufu

what are the advantages of CBOR over Protobuf?

The main distinction between them is schema lock-in.

Protobuf is schema-driven:

If you do not know the schema of the other party, it becomes unusable.
If you know the schema of the other party but they changed it recently, it becomes unusable.
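To show what a receiver without the schema actually sees, here is a minimal sketch that scans raw protobuf wire bytes using only the standard library. The two sample fields are hand-encoded and mirror the well-known examples from the protobuf encoding documentation; `scan_fields` is a hypothetical helper that only handles wire types 0 (varint) and 2 (length-delimited).

```python
from typing import List, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(buf: bytes) -> List[Tuple[int, int, Union[int, bytes]]]:
    """List (field_number, wire_type, raw_value) without any .proto file."""
    pos = 0
    fields: List[Tuple[int, int, Union[int, bytes]]] = []
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, wire_type, value))
    return fields


# Field 1 = varint 150, field 2 = the bytes of "testing".
wire = bytes.fromhex("089601120774657374696e67")
fields = scan_fields(wire)
```

Only field numbers and wire types travel on the wire. Whether field 1 is a temperature, an enum or a count, and whether field 2 is a string or a nested message, is knowledge that lives in the `.proto` file.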

A decade back, when everyone was building applications inside an organization, documenting and sharing their interfaces / schemas, protobuf was a super-hit, because you could consume RPC created in one language from another. Those were the days when XML was a super-hero (and C/C++ guys usually hated XML), so protobuf became a natural alternative.

Fast-forward a few years: web and mobile became standard, and JavaScript emerged as one of the main languages (thanks to Node.js and full-stack development), which completely ended XML's reign with its super simple JSON. (XML namespaces were a nightmare.)

You no longer need to know what the other party is sending. You can serialize and deserialize objects with arbitrary keys without having to ask the other party what they are sending. REST APIs boomed.

CBOR is just a binary packing of JSON. One can think of Protobuf vs CBOR as C++ templates vs .NET reflection.
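As a sketch of what "binary packing of JSON" means in practice, the following encoder/decoder covers a small subset of CBOR (integers, text strings, arrays, maps, booleans and null) using only the standard library. It is not a full implementation; a real project would use an existing CBOR library. The sensor reading is made-up sample data.

```python
import struct
from typing import Any, Tuple


def _head(major: int, arg: int) -> bytes:
    """Encode the initial byte (major type + argument) of a CBOR item."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([major << 5 | info]) + struct.pack(fmt, arg)
    raise ValueError("integer too large for this sketch")


def encode(obj: Any) -> bytes:
    if obj is None or isinstance(obj, bool):
        return bytes([{False: 0xF4, True: 0xF5, None: 0xF6}[obj]])
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(encode(k) + encode(v) for k, v in obj.items())
    raise TypeError(f"type {type(obj).__name__} not handled in this sketch")


def _decode(buf: bytes, pos: int) -> Tuple[Any, int]:
    initial = buf[pos]
    pos += 1
    major, info = initial >> 5, initial & 0x1F
    if major == 7:  # simple values: only false, true and null are handled
        simple = {20: False, 21: True, 22: None}
        if info not in simple:
            raise ValueError("simple value not handled in this sketch")
        return simple[info], pos
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("indefinite lengths not handled in this sketch")
    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 3:
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:
        items = []
        for _ in range(arg):
            item, pos = _decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:
        result = {}
        for _ in range(arg):
            key, pos = _decode(buf, pos)
            value, pos = _decode(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"major type {major} not handled in this sketch")


def decode(buf: bytes) -> Any:
    value, pos = _decode(buf, 0)
    if pos != len(buf):
        raise ValueError("trailing bytes after CBOR item")
    return value


reading = {"sensor": "t-17", "celsius": -4, "tags": ["roof", "north"], "ok": True}
roundtrip = decode(encode(reading))
```

The receiver recovers the keys and value types from the bytes alone, which is the contrast with the protobuf case: no generated code and no shared schema file are needed to read the message.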

Protobuf creates a maintenance nightmare in the long run, even within a single organization, if systems are continuously evolving.
Especially for distributed frameworks such as libBFT, where components have to interact with third parties, discovering schemas on the fly is an implicit, unstated requirement.

If one looks at the extensibility of CBOR, for example the tags here, one can discover the provision for IPLD, IOT and other emerging technology tags, which are "semantic markups" of data that are very important for Blockchain frameworks such as this.

For example, consider a simple practical scenario:

an IoT sensor sending sensor readings to a third-party subscriber over a pub-sub channel for data analysis (e.g. weather prediction),
and the analysis/prediction results are shared with a dynamic set of down-stream nodes that are making decisions, such as market pricing based on the upcoming weather in the next few days
and for ensuring tamper-evident data sharing, Byzantine fault-tolerance is used between the analysis server and the downstream nodes

In the above scenario (sensor -> analytics server -> decision nodes), no one knows anything about the other servers. While this can be achieved with protobuf, given a lot of coordination, it is certainly not the best tool for the job.

Now, one may ask: without knowing the other party's schema, how can CBOR be beneficial either? At the end of the day, you have to know the fields in the object to make anything useful out of it. There are two aspects to this:

It is true that one has to know the fields in the data to make anything useful out of it. But this does not apply to middleware such as libBFT, where the exact state / data does not affect the functionality (e.g. what is stored inside the block is never our business).
Also, this is where the concepts of Linked Data and multiformats shine.
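The middleware point can be sketched as follows: the agreement layer only compares digests of opaque payload bytes and never parses them. The `2f + 1` out of `3f + 1` threshold is the usual PBFT-style quorum and is an assumption here, as are the function names; none of this is taken from libBFT's code.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def digest(payload: bytes) -> str:
    """Fingerprint of an opaque payload; the middleware never looks inside it."""
    return hashlib.sha256(payload).hexdigest()


def agreed_digest(votes: List[str], f: int) -> Optional[str]:
    """Return the digest backed by at least 2f + 1 votes, or None if there is none."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= 2 * f + 1 else None


# Made-up payloads: the bytes could be CBOR, JSON or anything else.
payload = b'{"forecast": "rain", "days": 3}'
tampered = b'{"forecast": "sun", "days": 3}'

# Four replicas (f = 1): three report the original payload, one reports a tampered copy.
votes = [digest(payload)] * 3 + [digest(tampered)]
accepted = agreed_digest(votes, f=1)

# With a 2/2 split there is no quorum.
split = agreed_digest([digest(payload)] * 2 + [digest(tampered)] * 2, f=1)
```

Because only digests are compared, the application is free to change the fields inside the payload without touching the consensus layer.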

KrishnaPG commented 4 years ago

BTW, on the downside, using CBOR requires you to have your own RPC network mechanism.

gRPC was one of the reasons for the success of Protobuf. You get both the RPC framework + serialization (protobuf) nicely working together.

CBOR is a good option only if you already have a separate RPC mechanism, such as ZMQ / DHT-RPC / WebSockets, or at least an HTTP server, such as H2O or Seastar.
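To illustrate what "bringing your own transport" involves at the very least, here is a minimal sketch of length-prefixed message framing over a socket, using only the standard library. The 4-byte big-endian length prefix and the helper names are assumptions for illustration; a real RPC layer would also need request IDs, timeouts and error handling.

```python
import socket
import struct


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one framed message and return its payload."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)


# A connected pair of sockets stands in for two peers.
client, server = socket.socketpair()
send_frame(client, bytes.fromhex("a1617801"))  # CBOR encoding of {"x": 1}
send_frame(client, b"")                        # empty payloads are valid frames
first = recv_frame(server)
second = recv_frame(server)
client.close()
server.close()
```

With gRPC this framing, plus multiplexing and service definitions, comes bundled; with CBOR it has to come from ZMQ, WebSockets, an HTTP server or similar.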

NeoResearch / libbft

Cross-language support #2