WebAssembly / WASI

WebAssembly System Interface

A wasi-cloud-core World proposal #520

Closed: Mossaka closed this issue 1 year ago

Mossaka commented 1 year ago

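This follows in the footsteps of #509 and several discussions in WASI subgroup meetings: we want to create a World for all the wasi-cloud proposals and name it wasi-cloud-core. This includes wasi-keyvalue, wasi-blob-store, wasi-sql, wasi-messaging, wasi-runtime-config, wasi-distributed-lock-service, and wasi-http.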

Currently, the proposals above vary in completeness. The specifications for wasi-keyvalue, wasi-messaging, wasi-http, wasi-sql, and wasi-blob-store are more fully formed, although some of them have not yet been updated to the WASI preview 2 syntax. By the end of Spring, we plan to add basic specifications for the remaining wasi-cloud proposals and make sure all of the specifications are aligned with the WASI preview 2 syntax. In addition, we want to validate all the WIT files in automated CI using wasm-tools, and document any breaks and changes to the specification in a changelog.

To elaborate on aligning the WIT syntax with that of WASI preview 2: we want to use pseudo-stream/future/resource types, and to continue aligning with future versions of WASI, as described in #515.

Each of these proposals has its own proposed WIT interfaces and worlds, but we are raising this issue to propose a wasi-cloud-core World with a structure similar to the following:


default world wasi-cloud-core {
  import wasi-keyvalue: wasi-keyvalue
  import wasi-blob-store: wasi-blob-store
  import wasi-sql: wasi-sql
  import wasi-messaging: wasi-messaging
  import wasi-runtime-config: wasi-runtime-config
  import wasi-distributed-locking: wasi-distributed-lock-service
  import default-upstream-HTTP: wasi-http.outgoing-handler

  export HTTP: wasi-http.incoming-handler
}

This is just a sketch of the proposed world, and much remains unclear about what it means to import the wasi-keyvalue world inside another world, since a World's imports and exports can only be WIT interfaces. We will also explore using WIT templates in each proposal to help us move from runtime implementations to static-time implementations.

A note on wasi-cloud-core overall scope

It is not meant to cover 100% of the features that distributed applications expect; instead, it focuses on the 80% of the problem space, on the assumption that most apps will fall within this scope. The API designs for these wasi-cloud proposals aggregate the features common across multiple providers. For example, wasi-keyvalue provides readwrite APIs, which are the lowest common denominator across all systems and are designed for a rapid development experience.

technosophos commented 1 year ago

All of these make sense to me except for gRPC, which feels out of place and niche.

While gRPC gets used in the Kubernetes world, I can't think of many other contexts where it gets broad use. I can't think of any tier-1 cloud providers that require gRPC for communication (and even Kubernetes uses it only in very low-level contexts that most developers never see) or that offer services requiring gRPC to use them. Additionally, there are many competing technologies that do what gRPC does, and there are good reasons for adopting those. Finally, gRPC is particularly fragile and prone to breaking changes and incompatible implementations.

The rest of the proposed bindings refer to tremendously well-adopted "table stakes" features, and provide general purpose implementations.

Is there a powerful reason to include gRPC in what is otherwise a generic set of "table stakes" features? Otherwise, it feels like requiring it may prove to be a hindrance for adoption, especially in ecosystems that have no use for it or actively avoid it.

Otherwise, I'm pretty excited about this proposal.

technosophos commented 1 year ago

Also, I love the name.

sbc100 commented 1 year ago

BTW, where does this name come from? Perhaps I am out of touch with the world of cloud and this is something most folks will immediately understand? (I tried googling "cloud" with "bursty" but it didn't seem helpful).

technosophos commented 1 year ago

My assumption was that it came from "bursty workloads" -- e.g. the kind of unit of work that handles computing in short-lived bursts rather than constantly. Examples: handling an HTTP request, responding on a message queue, listening for an event, and so on.

This would include patterns like:

Opposed to bursty workloads would be anything that is expected to run continuously, or to handle very large operations that consume hours rather than seconds.

Again, though, this is my reading of a name that @Mossaka came up with. So he may drop in here and tell me I'm entirely wrong.

squillace commented 1 year ago

it could be bursty workload, too, easily

Mossaka commented 1 year ago

Is there a powerful reason to include gRPC in what is otherwise a generic set of "table stakes" features? Otherwise, it feels like requiring it may prove to be a hindrance for adoption, especially in ecosystems that have no use for it or actively avoid it.

Just to clarify, wasi-grpc is a proposal that hasn't materialized yet, which is why I didn't put wasi-grpc in the bursty world proposal. AFAIK there are two primary models for service-to-service/client communication: REST (HTTP) and RPC (source: Designing Data-Intensive Applications, chapter 4). Over the years RPC has evolved, and among the many RPC frameworks, gRPC has emerged as one that uses the Protocol Buffers encoding scheme. It could be argued that what we really want to abstract is the RPC model rather than gRPC specifically. But what's fascinating about gRPC is that it supports streaming communication.

Back to your question: yes, I agree that gRPC has compatibility, complexity, and performance issues. Some older systems may not support the latest gRPC frameworks, and some languages are not supported by the gRPC codegen. But I still see value in the wasi-grpc proposal itself, although it may have a harder time integrating into the bigger world than the other wasi-cloud proposals. Does this sound reasonable to you?

Mossaka commented 1 year ago

Again, though, this is my reading of a name that @Mossaka came up with. So he may drop in here and tell me I'm entirely wrong.

The name was contributed by @squillace. It's really saying that this world is designed for the serverless / edge function model.

sbc100 commented 1 year ago

Again, though, this is my reading of a name that @Mossaka came up with. So he may drop in here and tell me I'm entirely wrong.

The name was contributed by @squillace. It's really saying that this world is designed for the serverless / edge function model.

But is "bursty" a common word that is used in the serverless / edge world (no pun intended)? I've never heard of it before, but maybe that's not surprising because it's not an area I work in often.

If it's not a common word, would it make sense to use a more commonly known word? Why not just cloud?

fibonacci1729 commented 1 year ago

Historically, there have been conversations about supporting an include <world> WIT syntax to enable the union of worlds, supplemented by a with syntax to reconcile name conflicts. This proposal sounds like a good motivation to formalize some of those past ideas!

squillace commented 1 year ago

Again, though, this is my reading of a name that @Mossaka came up with. So he may drop in here and tell me I'm entirely wrong.

The name was contributed by @squillace. It's really saying that this world is designed for the serverless / edge function model.

But is "bursty" a common word that is used in the serverless / edge world (no pun intended)? I've never heard of it before, but maybe that's not surprising because it's not an area I work in often.

If it's not a common word, would it make sense to use a more commonly known word? Why not just cloud?

no, @sbc100, bursty world is not a term of art, but rather a way to specify "fast-firing functions". The pattern would be "serverless" generally, but unlike that term it does not imply any process/module hangs around much at all. Hence, "bursty" rather than functions or serverless which don't really scope based on time of execution.

the cloud world is one I wanted to avoid, because the major cloud providers have 120+ services available, and that is definitely not "scope" :-)

That said, since naming is considered hard, I'm just trying to zero in on a catchy but metaphorically appropriate name for the scope of work... we could say "wasi-fast-functions", for example, but somehow it just isn't as suggestive...

sbc100 commented 1 year ago

I see, thanks for the explanation. I agree that "cloud" is not the right name for the thing you are describing.

It sounds like "bursty" might not be the right name either, though, since (to me at least) it doesn't imply statelessness or throwaway-ness or lightweight-ness (which seem to be what you're trying to get at?).

Can you explain more why you don't like "serverless"? To me it does imply those things, and I'm not sure what you mean by "does not imply any process/module hangs around much at all". I thought serverless kind of does imply that. Is the idea with this proposal that the process/module would, by definition, not stick around for more than one request?

Of course we don't need to do all the bike-shedding here and now... and as you say, naming is hard.

lukewagner commented 1 year ago

I like the terms "bursty" and "serverless" since they're both evocative of these ephemeral, quick-starting, auto-scaling-to-zero instances that I think a lot of us are imagining. That being said, there are a lot of different worlds that these qualities will apply to, including the existing wasi-http proposal. Moreover, with the wasi:http/proxy world (and I think the same logic applies here), there is no reason that a host must make instances short-lived or auto-scaled; the host can deliver as many or as few events to a given instance as it wants and keep them alive/warm for as long as it wants (or, likely, make this configurable per deployment). Thus, while I think "serverless" and "bursty" are qualities we want these worlds to have, I don't know if they are the defining properties of a single world.
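The point that instance lifetime is a host policy, not a property of the world, can be sketched with a toy Python host. Everything here (Host, Instance, deliver, max_events_per_instance) is hypothetical and is not the API of any real Wasm runtime; it only shows that the same handler can run on a fresh instance per event ("bursty") or on a warm, reused instance, depending on a single host-side setting.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Stand-in for an instantiated component (hypothetical)."""
    id: int
    handled: int = 0


@dataclass
class Host:
    """Toy host that decides how many events each instance receives."""
    # 1 = fresh instance per event ("bursty"); larger values keep an instance warm.
    max_events_per_instance: int
    instances_created: int = field(default=0, init=False)
    _current: Instance | None = field(default=None, init=False)

    def _instantiate(self) -> Instance:
        self.instances_created += 1
        return Instance(id=self.instances_created)

    def deliver(self, handler: Callable[[Instance, str], str], event: str) -> str:
        # Recycle: drop the current instance once it has used up its event budget.
        if self._current is None or self._current.handled >= self.max_events_per_instance:
            self._current = self._instantiate()
        self._current.handled += 1
        return handler(self._current, event)


def echo(instance: Instance, event: str) -> str:
    """The guest-side handler is identical under both policies."""
    return f"{instance.id}:{event}"


bursty = Host(max_events_per_instance=1)
warm = Host(max_events_per_instance=100)
print([bursty.deliver(echo, e) for e in ["a", "b", "c"]])  # ['1:a', '2:b', '3:c']
print([warm.deliver(echo, e) for e in ["a", "b", "c"]])    # ['1:a', '1:b', '1:c']
```

Nothing in echo changes between the two hosts, which is the sense in which "bursty" describes a deployment choice rather than the set of interfaces a world exposes.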

In general, I'd suggest that we try to think of world names that describe the collection of functionality that is being exposed. That's tricky with the large set of functionality that Joe listed above. Just to throw in my 2c: one term that is admittedly amorphous, but maybe in the right way (given that we expect this set to grow over time), could be "service". If service feels overly general, an adjective that maybe makes sense is "persistent" (as in wasi-persistent-service), since a theme across messaging, kv, blob and sql is that these are all forms of persistent storage. ¯\_(ツ)_/¯

squillace commented 1 year ago

No problem bikeshedding on a Friday. That's what Fridays are for. The main conceptual boundary I'm trying to draw is between serverless generally, which can include very long-running processes that remain alive for minutes or more, and fast-firing functions, which almost certainly do not live much longer than a minute.

The former encompass "durable" functions that can be suspended and reanimated and the latter are essentially the generalized case of "CDN functions" that have if not milliseconds then only seconds to live, possibly to a minute. There are domain differences between the two approaches, though the design pattern seems the same. In bursty workloads, one of the main points is recycling the resources essentially or literally per request, which means resource efficiency does not require threading or connection management and so on.

serverless, however, might include precisely these things, depending on how they are being used. OR.... so I have been thinking. But as I said, bikeshedding naming on a friday is a good thing, so all thoughts here are flexible.

squillace commented 1 year ago

Related to @lukewagner's comment that just sailed in: yes, I think persistence is a thing that might place some shapes in a different "domain" and hence world. OH! And to give even more context: with respect to bursty worlds, the list of capabilities here reflects what our customers are asking for in the precise world they describe. They say this is the 80% sweet spot; more isn't needed.

This doesn't mean we cannot add or remove any, but just to give context for the source of the choices.....

programmerjake commented 1 year ago

If you're going to include RPC, why not use Cap'n Proto RPC (https://capnproto.org/rpc.html)? AFAICT it's more powerful than gRPC... though tbh the fact that there are multiple choices for which RPC protocol to use indicates to me that none of them should be included by default.

sdeleuze commented 1 year ago

Thanks for sharing this proposal.

GRPC

I am also strongly in favor of removing GRPC from the scope.

GRPC looks out of place from my POV for multiple reasons. GRPC is a popular but specific and opinionated implementation of an RPC mechanism. I would expect such a WASI world to focus on providing abstractions that allow plugging in various implementations (like the other proposals listed here).

And isn't the abstraction for RPC systems the WASI Component Model itself? Sure, each RPC system could require some specific parts, but again, I'm not sure we have to put those specifics in such a general-purpose proposal. Since wasi-grpc has not yet materialized, I am not sure we can make an educated choice about including it.

Name and scope

While I think a majority of us stakeholders are planning to implement Serverless platforms with the proposals listed above, I am not sure this world is Serverless-specific. The proposals listed look suitable for implementing long-running workloads as well, and with the generalization of scale-to-zero capabilities, I am not sure we will be able to draw a clear line between those 2 use cases, and we don't necessarily have to.

Also, it may be better to use a name that can be understood without too much context. I find wasi-command or wasi-cli self-descriptive and easy to understand, and I'm concerned that a lot of people could be confused by wasi-bursty, even if I find the name pretty cool.

While I don't yet have a concrete proposal for a better name, wasi-cloud seems like the most straightforward name I can think of, given the pretty broad scope proposed. Could somebody share the rationale behind not using it, given that @Mossaka's description says "we want to create a World for all the wasi-cloud proposals"? Is there a concern that it sounds too much like a traditional cloud vendor, or that "Cloud" could become an outdated term at some point? Edit: I initially missed @squillace's feedback about cloud implying too broad a scope; point taken.

I tend to agree with what @lukewagner said about how a name should be chosen, but the current, pretty broad scope makes it hard to choose a meaningful name. A subset of those proposals could maybe be called wasi-persistence-world or wasi-data-world, but the currently proposed scope is much wider than that.

If we, as a group, are sure that the current scope (hopefully minus GRPC) makes sense and that consensus among the different vendors providing multiple implementations is possible, wasi-service may be a bit generic. wasi-remote-service could be an option, but I'm not sure yet.

sdeleuze commented 1 year ago

Something like wasi-cloud-core could also be a reasonable option, to take into account @squillace's concern about the too-broad scope implied by wasi-cloud.

This name is IMO self-descriptive, and clearly communicates the intent of providing a minimal set of core cloud services that providers should implement, likely providing more on top of that.

Here, core is used as a profile, leaving open the possibility of providing different profiles later, possibly even supersets. The key point is that different stakeholders like Microsoft, Fastly, Fermyon, Cosmonic, VMware and others agree on a minimal core that will make sense in most cases.

The scope would be the one originally proposed minus grpc.

In practice, most WASI Cloud platforms will be Serverless and "bursty" given the characteristics of Wasm, but the wasi-cloud-core name would be relevant for any kind of runtime/billing model made possible by the proposed API surface.

Mossaka commented 1 year ago

I like the idea of wasi-cloud-core and can confirm that grpc will not be in-scope.

Mossaka commented 1 year ago

I have updated the proposed world name to wasi-cloud-core and removed gRPC from the scope.

Mossaka commented 1 year ago

Using the syntax proposed in Proposal: Union of Worlds, the new wasi-cloud-core World would look more like the following:

world wasi-cloud-core {
  include wasi-keyvalue
  include wasi-blob-store
  include wasi-sql
  include wasi-messaging
  include wasi-runtime-config
  include wasi-distributed-lock-service

  import default-upstream-HTTP: wasi-http.outgoing-handler

  export HTTP: wasi-http.incoming-handler
}

squillace commented 1 year ago

booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

but I defer to the wisdom of crowds

sdeleuze commented 1 year ago

Looks great, easy to understand and likely to reach wide adoption from my POV.

steren commented 1 year ago
  1. The name cloud: "cloud" indeed carries a lot of baggage, and the "core set of features" to expect from a "cloud provider" will vary depending on who you're asking and what kind of workloads you are trying to move to this cloud. A name that wouldn't carry that much scope is preferred. Looking at the set of interfaces suggested, it looks like this world is centered around http serving? What about wasi-serving?

  2. The proposal seems to be missing some logging interface. I'd expect to see something like include wasi-logging. Is this implicit?

  3. How can we avoid scope creep? If the goal is to define something that is "core", its set of interfaces should probably be kept to a minimum and heavily justified. Should SQL really be considered "core"? Could the core be kept super minimal, with bigger worlds created on top of it? Maybe the core only has the following:

    default world wasi-serving-core {
      include wasi-logging
      import default-upstream-HTTP: wasi-http.outgoing-handler

      export HTTP: wasi-http.incoming-handler
    }

squillace commented 1 year ago

Hi @steren, darned good questions all. I'll take one swipe, let's see what others say:

  1. cloud does carry baggage, you're right. The intent was to have the smallest set of "service shapes" available for the widest array of serverless functions at first (the "core" part). I had wanted to call it "bursty world" but was voted down as the shape doesn't convey speed of the surface. So "serving" would be fine, but is there a tighter way to express the intent here? OR.... is the shape the shape, and the intent is something that appears in implementations?
  2. Yeah, the way we're thinking of this is that wasi-logging is part of the overall wasi space and is DEFINITELY assumed -- but isn't really part of this surface. The way we've discussed it is that any implementation would compose wasi-cloud-core and wasi-logging (however that ends up being surfaced) and thence you would always have it (for most cases anyone would think of).
  3. With you here. Most of those chewing on this concluded that kv and sql see roughly equal usage in this space, hence they're both here. But because we can compose, they could easily be separate. However, doing so would hurt usability, since you'd go from using wasi-cloud-core to using quite a few things individually.

That said as a first response, these are all questions we're noodling our way through using this space as a working space.

Mossaka commented 1 year ago

I agree with what @squillace said above. I want to elaborate more on the second and third points.

I'd expect to see something like include wasi-logging. Is this implicit?

I think it is implicit. The way include works is that it adds the imports/exports from the included world to this world, hence the name "union of worlds". wasi-logging is so fundamental that many of the smaller worlds already assume it as a dependency; e.g., wasi-http has a dep on wasi-logging.

Would you prefer that we explicitly add wasi-logging to the wasi-cloud-core world? That would be fine, because the include de-duplication resolver will figure out that wasi-logging has already been transitively added by other worlds included in this world.
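To make the include behavior described above concrete, here is a toy Python model of "union of worlds" with de-duplication. It is only a sketch under stated assumptions: World, resolve, and the interface ids (written in this thread's wasi-http.outgoing-handler style) are hypothetical, and the actual resolver in the WIT tooling is not modeled here. Identical items arriving through several includes collapse to one copy, while the same name bound to two different interfaces is treated as a conflict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class World:
    """Toy world: item name -> interface id, plus the worlds it includes."""
    name: str
    imports: dict[str, str] = field(default_factory=dict)
    exports: dict[str, str] = field(default_factory=dict)
    includes: list[World] = field(default_factory=list)


def _merge(target: dict[str, str], source: dict[str, str], world: str) -> None:
    for item, iface in source.items():
        existing = target.get(item)
        if existing is None:
            target[item] = iface  # first time we see this item: add it
        elif existing != iface:
            # Same item name bound to a different interface: a genuine conflict,
            # the kind of case a `with`-style rename would need to resolve.
            raise ValueError(f"{world}: conflicting definitions for {item!r}")
        # existing == iface: the same interface arrived twice, so keep one copy


def resolve(world: World) -> World:
    """Flatten all includes (recursively) into one world with de-duplicated items."""
    imports: dict[str, str] = {}
    exports: dict[str, str] = {}
    for included in world.includes:
        flat = resolve(included)
        _merge(imports, flat.imports, world.name)
        _merge(exports, flat.exports, world.name)
    _merge(imports, world.imports, world.name)
    _merge(exports, world.exports, world.name)
    return World(world.name, imports, exports)


# Hypothetical worlds: both included worlds pull in logging, and so does core.
logging_world = World("wasi-logging", imports={"logging": "wasi-logging.logging"})
keyvalue = World("wasi-keyvalue", imports={"readwrite": "wasi-keyvalue.readwrite"},
                 includes=[logging_world])
messaging = World("wasi-messaging", imports={"producer": "wasi-messaging.producer"},
                  includes=[logging_world])
core = World("wasi-cloud-core", exports={"HTTP": "wasi-http.incoming-handler"},
             includes=[keyvalue, messaging, logging_world])

print(sorted(resolve(core).imports))  # ['logging', 'producer', 'readwrite']
```

Listing logging_world explicitly in core changes nothing in the result, which mirrors the argument above that an explicit include of wasi-logging would be harmless.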

How can we avoid scope creep?

I like the idea that any interface included in wasi-cloud-core needs to be heavily justified, but I want to point out there is a world that fits exactly what you described: the HTTP proxy world. In my view, wasi-cloud-core is broader than a proxy world. It gives a set of "common" capabilities to developers to build distributed applications, and these include the ability to interact with keyvalue stores, upload and download files from a blob store, exchange messages through pub/sub, retrieve runtime configuration from vaults, etc.

steren commented 1 year ago

wasi-http has a dep on wasi-logging.

Great, that was my question. I now understand that import default-upstream-HTTP: wasi-http.outgoing-handler imports wasi-logging. I don't think it has to be explicitly listed as an import. Maybe having a tool that would list all inherited interfaces of a world would be useful.
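The tool suggested above could, at its core, be a transitive walk over a dependency graph. Here is a minimal Python sketch, assuming a hand-written graph: only the wasi-http to wasi-logging edge comes from this thread, and the other entries, DEPS, and the inherited function are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical graph: each world or interface maps to what it pulls in.
DEPS: dict[str, list[str]] = {
    "wasi-cloud-core": [
        "wasi-keyvalue",
        "wasi-http.outgoing-handler",
        "wasi-http.incoming-handler",
    ],
    "wasi-keyvalue": [],
    "wasi-http.outgoing-handler": ["wasi-logging"],
    "wasi-http.incoming-handler": ["wasi-logging"],
    "wasi-logging": [],
}


def inherited(root: str, deps: Mapping[str, list[str]]) -> list[str]:
    """List every interface reachable from `root`, once each, in depth-first order."""
    seen = {root}
    order: list[str] = []

    def visit(node: str) -> None:
        for dep in deps.get(node, []):
            if dep not in seen:  # shared deps (and cycles) are visited only once
                seen.add(dep)
                order.append(dep)
                visit(dep)

    visit(root)
    return order


print(inherited("wasi-cloud-core", DEPS))
# ['wasi-keyvalue', 'wasi-http.outgoing-handler', 'wasi-logging', 'wasi-http.incoming-handler']
```

wasi-logging shows up exactly once even though both HTTP handlers depend on it, which is the "inherited, not explicitly listed" view discussed above.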

I want to point out there is a world that fits exactly what you described: the HTTP proxy world

Thanks, this indeed matches what I would consider "core" to request serving.

wasi-cloud-core is broader than a proxy world. It gives a set of "common" capabilities to developers to build distributed applications

That makes sense. Still, I suggest keeping a "core" to the bare minimum. A principle could be something along these lines: "80% of distributed apps are expecting this interface". I would thus suggest to move wasi-runtime-config and wasi-distributed-lock-service out of core, maybe to a broader wasi-cloud-extended world (unless you have strong evidence that a large portion of developers will expect these; in my experience, most apps are fine without these capabilities).

revmischa commented 1 year ago

Perhaps something like:

wasi-runtime: wasi-logging
wasi-runtime-http: wasi-runtime, wasi-http
wasi-runtime-storage: wasi-runtime, wasi-keyvalue, wasi-blob-store, wasi-sql
wasi-runtime-messaging: wasi-runtime, wasi-messaging

wasi-cloud-core: wasi-runtime

I imagine that I would want to run the same services on my laptop as I run in the cloud, so the idea of a "runtime" encapsulates both of those worlds which could be defined as the same thing for now but may diverge.

(Also, "runtime" is similar to the terminology AWS Lambda uses, and Lambda is where I would love to have a WASI runtime some day.)

Mossaka commented 1 year ago

A principle could be something along these lines: "80% of distributed apps are expecting this interface".

I really like the idea of designing interfaces to cover the 80% of features that working distributed apps need. In fact, the Pareto principle was one of the guiding principles when we started the SpiderLightning project, which is a prototype host implementation of many wasi-cloud-core proposals, including keyvalue, messaging, etc.

I would thus suggest to move wasi-runtime-config and wasi-distributed-lock-service out of core

I first want to explain the use cases for these two capabilities and then see if we agree that they fit the 80% feature set of distributed apps:

I'd like to hear stakeholders' views on whether or not we want to have these two capabilities baked into the wasi-cloud-core World.

Mossaka commented 1 year ago

I imagine that I would want to run the same services on my laptop as I run in the cloud, so the idea of a "runtime" encapsulates both of those worlds which could be defined as the same thing for now but may diverge.

Agreed. In SpiderLightning, we have used the OS filesystem to implement the wasi-keyvalue and wasi-messaging capabilities, so the same application that uses these two capabilities can run locally against the filesystem, or in a production environment against cloud providers.
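The swap-the-backend idea described above can be sketched in Python: application code is written against one small key-value interface, and either a local filesystem backend or another backend is plugged in underneath. This is a hypothetical illustration, not SpiderLightning's code or the wasi-keyvalue WIT; KeyValue, FilesystemKeyValue, InMemoryKeyValue (standing in for a cloud-provider backend), and handle_request are all made-up names.

```python
from __future__ import annotations

import os
import tempfile
from abc import ABC, abstractmethod


class KeyValue(ABC):
    """Hypothetical readwrite-style interface the application codes against."""

    @abstractmethod
    def get(self, key: str) -> bytes | None: ...

    @abstractmethod
    def set(self, key: str, value: bytes) -> None: ...


class InMemoryKeyValue(KeyValue):
    """Stand-in for a provider-backed store (a real one would call a cloud API)."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class FilesystemKeyValue(KeyValue):
    """Local-development backend: one file per key under `root`."""

    def __init__(self, root: str) -> None:
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hex-encode the key so names like "visits/ada" become safe file names.
        return os.path.join(self._root, key.encode().hex())

    def get(self, key: str) -> bytes | None:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set(self, key: str, value: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(value)


def handle_request(store: KeyValue, user: str) -> int:
    """Application logic: count visits per user, unaware of the backend."""
    raw = store.get(f"visits/{user}")
    count = int(raw) + 1 if raw is not None else 1
    store.set(f"visits/{user}", str(count).encode())
    return count


with tempfile.TemporaryDirectory() as tmp:
    for store in (FilesystemKeyValue(tmp), InMemoryKeyValue()):
        handle_request(store, "ada")
        print(type(store).__name__, handle_request(store, "ada"))  # count is 2 for both
```

handle_request never changes between local and production runs; only the object passed in does, which is the portability property described above.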

I am a bit afraid to use "wasi-runtime", as "runtime" commonly refers to Wasm runtimes like Wasmtime and WAMR.

Mossaka commented 1 year ago

This proposal has been accepted as a stage-1 WASI proposal. A repo has been created under the WebAssembly org, and here is a link. I will be working on formalizing the spec using the WIT IDL and adding more description to it.

I will be closing this issue and encourage everyone to discuss wasi-cloud-core in that repo.

Thank you all for your support and suggestions!