TBD54566975 / web5-rs

Apache License 2.0

Spike a larger coverage of kt bindings & integrate into existing web5-kt #127

Closed KendallWeihe closed 4 months ago

KendallWeihe commented 4 months ago

The TL;DR is: it's relatively low effort (keyword "relatively," because it's still a significant effort) to build Rust bindings into every language if we don't have strong opinions on the syntactical sugar on the other side, and it's high effort to build Rust bindings and also concern ourselves with the syntactical sugar on the other side.


A consequence of binding Rust to some other language is that DX expressiveness is highly constrained at the boundary. You can think of it this way: every language has its own unique set of expressive DX features, but the overlap of DX across all languages (in practical terms, what's made available by the UniFFI UDL file) is minimal. So we can have an expressive DX in our core Rust code, which is then constricted down to a verbose DX across the binding boundary, and then we have to re-inflate that expressive experience in the non-Rust language (kt in this case).
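
As a hypothetical illustration of that constriction (names are illustrative, not the actual web5-rs API): richer Rust constructs like generics and borrowed data don't cross the FFI boundary, so the exported surface tends toward owned, concrete types, and each target language then has to re-wrap it.

```rust
// Hypothetical sketch; not the actual web5-rs API.

// Idiomatic core: generic over anything byte-like, and borrows its input.
fn digest<T: AsRef<[u8]>>(data: T) -> u64 {
    // stand-in for a real hash: just sums the bytes
    data.as_ref().iter().map(|&b| b as u64).sum()
}

// Binding-friendly surface: generics and borrows don't cross the FFI
// boundary, so the exported function takes an owned, concrete type.
fn digest_flat(data: Vec<u8>) -> u64 {
    digest(&data)
}

fn main() {
    // the idiomatic API accepts &str, &[u8], Vec<u8>, ... directly,
    // while the boundary version forces an owned Vec<u8>
    assert_eq!(digest("abc"), digest_flat(b"abc".to_vec()));
    println!("ok");
}
```

The flattened function is what the UDL-style boundary would export; restoring the ergonomic form is then per-language work on the other side.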

We need to make a strategic call as to whether we're going one of two routes:

  1. We take what we build here and inject it into the existing https://github.com/TBD54566975/web5-kt/ project
  2. We leave the old https://github.com/TBD54566975/web5-kt/ project behind and build "web5-kt" independently here, which changes the DX surface area of the existing web5-kt

(the same holds true for other languages)

The obvious choice is (1), but achieving it likely requires significantly more work than (2). We can account for that, but we need a better analysis of the difference in effort between the two.

For this ticket, I can put a spike in the ground by building out the Kotlin bindings further, and then spend a few hours trying to use them in the existing https://github.com/TBD54566975/web5-kt/ (AKA rip out the old internals and use the bound code instead).

KendallWeihe commented 4 months ago

I think it's safe to assert that WASM is a secondary priority and shouldn't be considered in the strategy & planning. The focus here is with respect to UniFFI, first and foremost Kotlin & Swift, but subsequently Go and Dart.

nitro-neal commented 4 months ago

Agree with 1.

We can "piece in" the Rust bindings to the existing web5-kt: put in what we have and use the existing raw Kotlin for the rest.

nitro-neal commented 4 months ago

So after a great call with Kendall showing me what's what with the whole flow of Rust bindings, option 2 may be the more practical and "correct" way to do it.

Every SDK is a special snowflake, and they all have a ton of extras for each primitive. If we did option 1, the Rust bindings would maybe replace only 20% of the code, and I would say the value of the Rust bindings would not be worth it.

Option 2, creating full-fledged, brand-new (quick and easy, because all the hard logic is done in Rust) web5-kt2 and web5-swift2 SDKs, would probably be the 'true' way to use the Rust bindings.

I will experiment a bit more and see if option 1 can deliver more value, but yeah.

diehuxx commented 4 months ago

Even if we decide against creating full-fledged web5-kt2 and web5-swift2, investing in language bindings will be useful for implementing net-new language implementations. We currently have no Python, C#, or Ruby implementations, and it's reasonable that we may want web5 or tbdex implementations for those languages in the not-too-distant future. Once we have a robust Rust SDK and bindings infrastructure, adding new languages will be a much lower lift than implementing them from scratch.

decentralgabe commented 4 months ago

I believe we've swung too far in the direction of supporting many languages. What's most important is enabling more languages (as opposed to building and maintaining them ourselves). The question is then, what's the best way to enable multiple languages?

The simplest answer is to say - follow our spec and comply with our test suite and you can build support for web5 and tbDEX however you want.

The more complex answer is to provide a starting point for new language implementations with Rust. This is a thesis, as encouraging (let's say) Python developers to understand how to create Rust bindings in order to produce a Python library may prove more difficult than the simple answer above.

So, is it a realistic approach to use Rust to lower the barrier to entry for new language support? I'm not sure.

The second (and arguably more important) consideration is the maintainability of the SDKs we already have. If we had high confidence that our future feature set would closely resemble our current feature set, I would advocate for changing nothing (no Rust). That is not the case, and I imagine many changes to our SDKs to support new features (like more credential data models), protocols (like OID4VC), and so on. So introducing changes via Rust as a common core for our SDKs may de-risk new feature adoption.

This hinges on Rust solving that problem well, and as @KendallWeihe points out, it is not so straightforward:

it's high effort to build Rust bindings and also concern ourselves with the syntactical sugar on the other side

What we should definitely avoid is a situation where we go from 6 SDKs to 7 or 8 and our maintenance overhead increases: to make a change in 1 language you have to make a change in 2 and deal with ugly bindings. As I view it now we cannot get away from maintaining expertise within our organization for the languages of SDKs we produce, with or without Rust. There is no getting away from spreading ourselves thin in that regard. The question the maintainers of our SDKs need to answer for this work to be a success, is whether Rust makes their jobs simpler or not.

Zooming back out, here is how I would suggest ordering the concerns during this evaluation:

  1. Maintenance overhead (but probably more than this: can we produce consistent, conformant, secure, and efficient implementations of our specs?)
  2. End user/Developer experience
  3. Enablement of new languages
KendallWeihe commented 4 months ago

I built a visual to help illustrate the matter at hand:

[Screenshot 2024-04-26 at 8:45:20 AM]
mistermoe commented 4 months ago

IMO, the value of Rust bindings is that our core logic is written and maintained in one place, versus having to re-implement dense stuff that's easy to mess up in many languages. A good example of this is converting DID documents to and from DNS packets for did:dht.

I see the primary purpose of the codegen'd libs (through bindings) being correctness, not good DevEx. There are known limitations to the sort of DevEx that can be provided through codegen, and a significant amount of time can be spent attempting to codegen what would be considered an idiomatic API surface in a given target language. This can result in a series of bash scripts and, worst of all, changing the API surface of the Rust core, which I think is a step too far, because we're sacrificing writing idiomatic code in one language in order to provide it in another. Moreover, it's not hard to imagine getting to a place where you're playing whack-a-mole when attempting to produce idiomatic API surfaces for multiple languages via codegen.

I see good DevEx being the responsibility of an SDK written in the actual target language that simply consumes the codegen'd bindings, e.g.

[image]

This makes it such that:

  1. The Rust core can remain idiomatic and the single source of truth for our core logic. We have a bug with DID document <-> DNS packet conversion? That bug fix happens here and percolates upwards; it only has to be fixed in one place.
  2. The API surface of codegen'd libs is best-effort, without bending over backwards, relying on sed/awking our way to success, or sacrificing idiomatic Rust.
  3. We can still provide idiomatic API surfaces in target languages. Writing these is significantly easier because we're not re-implementing dense logic. Every method effectively becomes:
    • accept args
    • transform target args into what the binding needs
    • call the binding
    • transform the binding result into the desired target return structure
    • return
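
A minimal sketch of that per-method wrapper shape, written here in Rust purely for illustration (the real wrapper would live in Kotlin or Swift, and the `binding` module below is a hypothetical stand-in for the codegen'd lib, not real UniFFI output):

```rust
// Hypothetical stand-in for a codegen'd binding layer: verbose, owned,
// concrete types only (the real one would be generated by UniFFI).
mod binding {
    pub fn resolve_did(uri: String) -> Result<String, String> {
        if uri.starts_with("did:") {
            Ok(format!("{{\"id\":\"{uri}\"}}")) // stand-in for a real DID document
        } else {
            Err("invalid DID URI".to_string())
        }
    }
}

// The handwritten, idiomatic layer: accept args, transform, call the
// binding, transform the result back, return.
#[derive(Debug, PartialEq)]
struct DidDocument { id: String }

#[derive(Debug)]
struct ResolutionError(String);

fn resolve(uri: &str) -> Result<DidDocument, ResolutionError> {
    // transform target args into what the binding needs (an owned String)
    let raw = binding::resolve_did(uri.to_string()).map_err(ResolutionError)?;
    // transform the binding result into the desired target return structure
    // (a crude parse of the stand-in JSON; a real wrapper would deserialize)
    let id = raw.trim_start_matches("{\"id\":\"").trim_end_matches("\"}").to_string();
    Ok(DidDocument { id })
}

fn main() {
    assert_eq!(resolve("did:dht:abc").unwrap().id, "did:dht:abc");
    assert!(resolve("not-a-did").is_err());
    println!("ok");
}
```

The wrapper contains no dense logic of its own; it only adapts types and errors at the seam, which is why it stays cheap to write and maintain.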

Ultimately, it would be awesome if we could codegen bindings with 5 ⭐ idiomatic API surfaces in all target languages, but I don't think we're there yet.

mistermoe commented 4 months ago

examples of core, bindings, codegen'd lib:

jiyoontbd commented 4 months ago

@decentralgabe

As I view it now we cannot get away from maintaining expertise within our organization for the languages of SDKs we produce, with or without Rust. There is no getting away from spreading ourselves thin in that regard.

At first I was confused about why this is the case, but it makes more sense with the diagram Moe provided in his last comment!

So if I'm understanding it right, @mistermoe, you're saying that the work of each language SDK's DRIs changes from this flow:

  1. I implement features for web5-kt with language-specific syntax
  2. I update as the spec changes over time

to this:

  1. Kendall implements and updates web5-rs as the spec changes over time
  2. For e.g. kt, we codegen web5-kt-binding off of the core web5-rs change
  3. I consume web5-kt-binding in web5-kt and update the syntax / API surface as necessary

I think we can still say that the amount to which we spread ourselves thin lessens with the Rust work Kendall is doing, right?

shamilovtim commented 4 months ago

I think option 2 makes the most sense. Start with the ideal case (complete codegen) and only fall back to a custom wrapper if it's truly needed. But even if it is, that wrapper should flow from the codegen not previous idioms.

Regarding codegen vs. stylistic wrappers, I think this shouldn't be a one-size-fits-all approach. For example, the bindings produced for iOS in Objective-C that I've had to use in the past have almost always been straightforward or good enough. The same goes for the Java and Kotlin bindings I've had to use. Neither of them ever needed to be massaged for style despite being codegen'd. Adding sugar there and maintaining some sort of custom stylistic lib would have been unnecessary work on the part of the authors of those packages; it would have made my life 10% easier but traded that for making their project surface potentially unmaintainable.

Depending on how good the codegen is, I don't think it's necessary to maintain sugar in most languages. I think the current discussion is veering too much toward maintaining subjective styles for SDKs when it's not obvious how much of that is going to be necessary in the future. And it's a time pit, because style, syntax, and idiom really are completely subjective. I expect to follow idioms in my project; I don't really care whether the libraries I use are idiomatic or not. The WASM example makes complete sense and represents no maintenance burden.

RE: What wouldn't make sense: web5-rs is updated and then custom SDKs for web5-lang1, web5-lang2, web5-lang3 all pull the latest web5-rs and update N number of functions, tweaking their params, return types, and so on.

KendallWeihe commented 4 months ago

@mistermoe great visual; I also created an "onion" visual here. The intent is not to propose codegening the idiomatic code; it's recognized that that will be a matter of handcrafting.

i think we can still say that the amount in which we spread our selves thin lessens with the rust work kendall is doing, right?

Great question! In truth, idk... yet.

The matter at hand is to devise a practical strategy, with respect to leveraging rust bindings, which has a clear and defensible improvement relative to where we currently are. This is a complex problem space so answers are not simple.

As I see it we have three concerns with our current position:

  1. We have more code than engineers (too many SDKs, too few contributors)
  2. Concerns of spec-conformance (we are definitely not spec-conformant in a wide number of areas)
  3. Concerns of correctness

(2) and (3) are both exacerbated by (1). The original focus of this ticket was (1), because the assumption is we cannot hope to address (2) and (3) if we're continuously stretched thin.

@nitro-neal, @decentralgabe, and I have gone through the practice of writing a "JWK" in the Rust core, writing the surrounding UniFFI bindings, generating the bound Kotlin code, and then integrating it into the existing web5-kt. The finding is this: we replaced ~10% of the web5-kt source code. It feels a bit like, "all that for a drop of blood." I know that "measuring programming progress by lines of code is like measuring aircraft building progress by weight," so lines-of-code is not the metric to optimize towards. But I can say, as a person who has been in the depths of this for a month, that's roughly par for the course with my intuitive expectations for retrofitting the existing SDKs with bound code. That said, I would still like to push this ticket further so I can build more evidence, one way or the other. We have a bit of a chicken & egg problem in that we won't know until we do it, hence the "spike."

Fundamentally, what we're up against here is that we are doing this backwards. We started with N-number of SDKs (all with their unique take on things) and are now considering retrofitting, but had we started with bindings and then written idiomatic code, it would have been like following a recipe: everything would be consistent, and it would be simple. That's what I'm getting at in the OP:

  1. We take what we build here and inject it into the existing https://github.com/TBD54566975/web5-kt/ project
  2. We leave the old https://github.com/TBD54566975/web5-kt/ project behind and build "web5-kt" independently here, which changes the DX surface area of the existing web5-kt

This may still be worth it. As I said, this is a big problem space with tons of varying optionality. I'm not giving up on retrofitting, at least yet, but I'm trying to consider all vantage points.

frankhinek commented 4 months ago

After discussing this and related issues with Kendall earlier today I'd add a few thoughts to the comments above:

KendallWeihe commented 4 months ago

With respect to the matter of retrofitting the existing SDKs, we must first finalize what the web5-rs API design will be (cc @frankhinek). Once that is set in stone, we can judge, and plan for, where the bound code will fit into the existing SDKs and where it will not. The outcome will not be binary; it will not be the case that "retrofitting with bindings doesn't make sense anywhere," nor will it be the case that "retrofitting with bindings makes sense everywhere." The outcome will be somewhere in the middle (IMO probably closer to the former, but we'll see).

Then there is the matter of net-new functionality (@jiyoontbd, as you raise). Once again, I think it's a case-by-case basis with best judgement. For "dense stuff" (@mistermoe, great word for it) it'll be best to do so in Rust and bind it. That seems rational and reasonable. But it's not straightforward.

One such case comes to mind. Let's imagine we're implementing a net-new PEX feature which has an encapsulated dependency on a call to jwt.sign(). Should that call go to the jwt.sign() within the Rust core library or to the Jwt.sign() within web5-kt? Probably the former, right? We want this new PEX feature to be a simple wrapper around a single call to the Rust core lib, which encapsulates all of these lower-level/downstream/transitive dependencies.

Okay, but what if we haven't retrofitted web5-kt's existing Jwt.sign() function with the Rust core implementation? It's reasonable to conclude that we must first retrofit the existing Jwt.sign() function, after which we can implement the new PEX feature. That way we have a single, consistent implementation for signing JWTs within web5-kt; otherwise we're accepting two distinct implementations for signing JWTs within the SDK: the existing Jwt.sign() implementation, and the indirect call to the Rust core lib's jwt.sign() encapsulated within the new PEX feature.

So that would seem the best strategy, but it may quickly get out of hand, because... oh shoot, jwt.sign() depends on jws.sign(), so then we have to prioritize jws.sign(), and then... oh shoot, jws.sign() depends on the cryptographic functions for Ed25519, so now we have to prioritize Ed25519. So then we're at the point of: okay, perhaps we can't use bindings for net-new features until we first retrofit most of the existing feature set.
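
To make the encapsulation point concrete, here is a hypothetical sketch (module and function names are illustrative, not the real web5-rs API) of why the net-new feature would call the core's jwt rather than the target SDK's:

```rust
// Hypothetical sketch of the layering question; names are illustrative.
mod web5_core {
    pub mod jwt {
        pub fn sign(claims: &str, key: &[u8]) -> String {
            // stand-in for producing a real compact JWS
            format!("jwt({claims};keylen={})", key.len())
        }
    }
    pub mod pex {
        // The net-new feature keeps its jwt dependency *inside* the core,
        // so a target-language wrapper needs exactly one binding call.
        pub fn create_presentation(claims: &str, key: &[u8]) -> String {
            super::jwt::sign(claims, key)
        }
    }
}

fn main() {
    let key = [0u8; 32];
    let vp = web5_core::pex::create_presentation("{\"vp\":{}}", &key);
    // The presentation is signed by the core's jwt::sign. But if web5-kt's
    // own Jwt.sign hasn't been retrofitted, the SDK now ships two distinct
    // JWT-signing implementations side by side.
    assert_eq!(vp, web5_core::jwt::sign("{\"vp\":{}}", &key));
    println!("ok");
}
```

The sketch shows the appeal (one binding call hides the whole dependency chain) and the hazard (a second, unretrofitted signing path in the target SDK) in the same place.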

Which, maybe we're okay with investing the time to retrofit the entire feature set. I'll reiterate: first we need to ossify the DX design of web5-rs, after which we can make that judgement call.

KendallWeihe commented 4 months ago

There is a requirement here which hasn't been clearly stated: in order to retrofit, we cannot induce breaking changes (AKA a major version bump on the semver) for the existing SDKs.


Alright folks, good news: I think retrofitting web5-kt with Rust-bound Kotlin code is probably feasible. I have successfully bound the most primitive of concepts, the JWK set of features, as well as the highest-level concept: creating, signing, and verifying a VC. We'll have to contort ourselves in some weird ways since we're doing this backwards, and we won't fully capture the value, but yeah, I think this is probably feasible. Which means the same will probably hold true for web5-swift.

Everything I have right now is disorganized and difficult to follow. If you want to track the work, check out these PRs:

None of this work is intended to merge into main anywhere; it's just for the spike. I'm at the point now where I know it's feasible, so next I need to figure out how we actually go about doing the work.

Closing this ticket.