envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy WASM extensions in the present and its future (Proxy-Wasm) #35420

Open marc-barry opened 3 months ago

marc-barry commented 3 months ago

Title: Envoy WASM extensions in the present and its future (Proxy-Wasm)

Description:

Envoy currently supports WASM extensions via the WASM filter. I am aware of the following warning:

The Wasm filter is experimental and is currently under active development. Capabilities will be expanded over time and the configuration structures are likely to change.

The documentation for the feature is fairly terse, so I largely relied on articles like https://tetrate.io/blog/wasm-modules-and-envoy-extensibility-explained-part-1/. There was also a Google Document proposal that I read a while back, but I'm now unable to find it. We have hit a number of issues with the documentation, the extension development process, and a clear understanding of which ABI spec is adhered to and what the plans for it are.

There are many more references across the Internet with reference to WASM extensions for things that use Envoy under the hood or have decided to also adopt it as its ABI. But as I mentioned above the spec isn't really even defined in the public space and all work on it appears abandoned or stalled.

What I'm trying to determine is the following:

Relevant Links:

phlax commented 3 months ago

@marc-barry i would be happy to review any improvements to the docs - sounds like we have some gaps in current docs

cc @mpwarres wrt substantive questions

marc-barry commented 3 months ago

@phlax thanks for answering. I also don't mind helping improve this in the community and contributing. Perhaps we can start with documenting the ABI interface details that Envoy uses so that developers can reference this when developing extensions. I couldn't find any examples for Envoy and perhaps I could contribute a simple example that we could include.

https://www.envoyproxy.io/docs/envoy/v1.31.0/api-v3/extensions/wasm/v3/wasm.proto#envoy-v3-api-msg-extensions-wasm-v3-pluginconfig is documented quite well but the gap is, for example, from say Go code to a WASM plugin that can be loaded. If you navigate down to allowed_capabilities you'll see a reference to The capability names are given in the [Proxy-Wasm ABI](https://github.com/proxy-wasm/spec/tree/master/abi-versions/vNEXT). but if you follow that link it no longer exists. I think we could focus on cleaning this up which would make it more clear. We should probably figure out how to improve the https://github.com/proxy-wasm contribution situation as this is then something that the Envoy docs could just reference.

phlax commented 3 months ago

yeah, makes sense

we have this example https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/wasm-cc, i tried previously to add something similar for rust but didnt get too far

@lizan is not as active as before but might have ideas about who to speak to - i know there is quite a bit of commitment on the google side to develop/maintain the wasm filter

mathetake commented 3 months ago

@mpwarres @martijneken are the current owners of the Proxy-Wasm and Envoy codebases, so I am truly hoping they share what direction this goes and how the evolution looks like - as a former maintainer there, I want to apologize for the mess; I failed to bring it to a healthy state. I completely share the frustration that I sense you have. I hope the folks mentioned ^^ clarify their stance and how this will be resolved, especially Google's point of view on this matter. cc @alyssawilk

mathetake commented 3 months ago

I think one idea is to eliminate the Proxy-Wasm organization dependency completely, and make the Envoy codebase self-contained. After that, host the spec and complete user-level documentation here in envoyproxy.io. Plus, host the SDKs here as well (we are more than happy to donate the Go SDK). Given that there was no inter-proxy collaboration per se since the release, there's absolutely zero benefit to Envoy in having the spec separated from Envoy at this point. It is pure technical debt; that is, from my view, the consensus from conversations with other community members in the last few weeks. That's my take. It's just a random idea, but having one central place to look is, as a user myself, less confusing and better IMHO.

mathetake commented 3 months ago

^^ if this sounds good to other maintainers, I am more than happy to help and maintain again - I really feel the obligation to fix this once and for all

kyessenov commented 3 months ago

I am happy to review any doc improvements since I'm familiar with the extensions as well. A lot of the base code is used in production, for real products, so there's definitely a valid use case, but it is difficult to figure out where the stability ends and the rough edges start without looking at the code. The larger problem is that it is simply hard to start writing Wasm no matter which language one chooses (not everyone will learn Rust just to change headers), so there's a barrier to entry that the improved docs may not fully address.

I think the current Wasm implementation is no longer experimental and reached some stability (for core parts), but it also failed to reach 1.0. It doesn't really matter what the version number says, since in practice, there's just one ABI used by various Proxy-Wasm efforts.

mathetake commented 3 months ago

yeah I completely agree with @kyessenov, and in order to improve the situation (not saying I am sure what stability ultimately means here), I think as @marc-barry hinted (and I believe everyone is aware), the unnecessary dependency on the Proxy-Wasm org makes the situation worse or leaves it at a standstill. What I am suggesting is to document and host everything in envoyproxy.io so there is just one single source of truth. This is not only about the documentation, but also about how the ecosystem around it works: who's responsible for what, how issues are handled, where to report issues, what the support policy is, etc. Currently all of that is a mess. But I'm open to suggestions, and curious how others think the coupling with Proxy-Wasm and leaving everything there as-is benefits Envoy, given the history

thenewwazoo commented 3 months ago

I'm an onlooker, but thought I might weigh in. My employer is interested in leveraging existing logic written in Rust (or in porting Java code to Rust) in order to embed it in multiple places, one of which is Envoy filters. We benchmarked C++, Lua, and Rust WASM (using the v8 runtime) and found the overhead of WASM to be a show-stopper. I have been experimenting for the last ~week with building NullVM filters to compile into v1.30 but ran into this problem after some significant struggles bringing everything up-to-date, and have stopped trying.

I'm not opposed per se to proxy-wasm, but I'm doubtful as to its value given that adoption has been very poor over the last ~4 years since it was introduced. The overhead of a "real" WASM runtime is too high for us, and NullVM has (as far as I can tell) never quite been finished (and its overhead is still to be measured). As such, I'll be exploring the recent work on loadable modules. IMO that's where development effort should be directed.

kyessenov commented 3 months ago

@thenewwazoo The criticism is shared but it's an ecosystem problem with Wasm, and not something Envoy can fix. IMO, a stable ABI is all that matters, and Wasm in Envoy has kept the stability promise despite many internal changes inside Envoy. This is no worse than other extension points, e.g. Lua, and it's more flexible since you can also use dynload "nullVM" and not be blocked by runtime performance.

martijneken commented 3 months ago

(adding @PiotrSikora for anything I'm missing)

share what direction this goes and how the evolution looks like

Yes, thank you for the opportunity. I will try to provide some context and direction:

First, I acknowledge there's been (at least) a 1 year gap in ProxyWasm maintenance. The team at Google that was doing most of the work more or less disintegrated, and I've been working to rebuild it. Google is bringing new products to market based on [Proxy]Wasm, so rest assured we will be investing in Wasm extensibility going forward, hopefully in partnership with other stakeholders. Here are the major areas we plan to invest in, approximately in order:

  1. Stabilization. This means writing documentation (in progress), fixing CI (in progress), and updating dependencies (to at least match Envoy). We also are working on some tools and code samples, although these do have a Google Cloud Platform bent. I'm open to 'upstreaming' some of these into ProxyWasm itself, if it would be helpful.

  2. Productionization of Envoy "inline wasm". We've identified a number of shortcomings in the (alpha) Envoy-local wasm extensions, such as: lack of reliable wasm delivery, lack of error handling, lack of automatic scaling, lack of isolation (process and/or sandbox). We are working on a design to address these, which we hope to iterate on with the Envoy community.

  3. ABI evolution. We acknowledge ProxyWasm is a one-off wasm ABI, born years ago while wasm was still in its infancy. Per @kyessenov's point above, we are looking to engage with ABI standardization efforts. Specifically, we are interested in evolving to support wasi-http and supplementary ABIs such as wasi-keyvalue.

absolutely zero benefit to Envoy in having the spec separated from Envoy at this point

@mathetake I'm not sure I agree. Per links in the original report, there are integrations with nginx at least. Is that not maintained/used? I will agree that ProxyWasm is essentially an ext_proc based ABI, so maybe you have a point. But I would be wary of breaking existing non-Envoy users.

I do also share the concern that there's version/dependency drift between Envoy and ProxyWasm. ProxyWasm seems to lag Envoy all the time. Anyone have ideas on how to minimize this?

how can I determine the ABI that Envoy is using?

Good question. I see the Envoy dep but that points to a commit and not a release version. Browsing that commit, I would guess 0.2.1. Can someone confirm?
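A related check can be made from the plugin side. As far as the published Proxy-Wasm ABI describes it, a module declares the ABI version it targets by exporting a marker function named `proxy_abi_version_X_Y_Z` (for example `proxy_abi_version_0_2_1`). The sketch below is an illustration only, not part of Envoy or any SDK: it walks the export section of a Wasm binary and reports any such markers. The function names `export_names` and `declared_abi_versions` are hypothetical, and the result says what the plugin was built against, not what a given Envoy build supports.

```python
import re

# Marker export convention assumed from the Proxy-Wasm ABI, e.g. proxy_abi_version_0_2_1.
ABI_EXPORT = re.compile(r"^proxy_abi_version_(\d+)_(\d+)_(\d+)$")


def read_leb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def export_names(module):
    """Return the names listed in the export section (id 7) of a Wasm binary."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos, names = 8, []  # skip the 4-byte magic and the 4-byte version
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_leb128(module, pos + 1)
        end = pos + size
        if section_id == 7:
            count, pos = read_leb128(module, pos)
            for _ in range(count):
                length, pos = read_leb128(module, pos)
                names.append(module[pos:pos + length].decode("utf-8"))
                pos += length + 1                  # name bytes plus the 1-byte export kind
                _, pos = read_leb128(module, pos)  # skip the exported index
        pos = end  # jump to the next section regardless of its type
    return names


def declared_abi_versions(module):
    """Return ABI versions (e.g. ['0.2.1']) declared via marker exports."""
    matches = (ABI_EXPORT.match(name) for name in export_names(module))
    return [".".join(m.groups()) for m in matches if m]
```

To try it, read a compiled plugin with `open(path, "rb").read()` and pass the bytes to `declared_abi_versions`. An empty list would mean no marker export was found.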

found the overhead of WASM to be a show-stopper

@thenewwazoo Can you elaborate? In our benchmarks wasm performs quite well. We are careful about our choice of wasm engine, and we do precompile plugins ahead of time.

https://github.com/tetratelabs/proxy-wasm-go-sdk

I have serious concerns about the memory management module in TinyGo, operating under wasm. I filed https://github.com/tetratelabs/proxy-wasm-go-sdk/issues/450 just this week, and it seems there's no supported solution. I would only recommend using C++ or Rust at this time.

mathetake commented 3 months ago

I will agree that ProxyWasm is essentially an ext_proc based ABI, so maybe you have a point. But I would be wary of breaking existing non-Envoy users.

@martijneken curious, how in the universe would just removing the dependency on one library break the other existing users of that library? could you tell me how that works? I am not suggesting making any changes to Proxy-Wasm in the org at all; if you read it that way, that's not my intention.

mathetake commented 3 months ago

@martijneken what I am saying is just to have a single place for the documentation and implementations, which I believe should be the official Envoy documentation. Look at this mess https://github.com/proxy-wasm/spec/pull/42: the original intention was to document things properly, and what happened was that it was ignored by one single person who has been constantly saying "i am doing in a few weeks" and disappearing. That's totally and completely disgraceful to users, don't you think? That's the whole blocker of why SO MANY people before have complained about this documentation mess. Just being detached from the throne of one single person there doesn't harm anyone here, but just benefits the Envoy community. What's the damage in doing so?

kyessenov commented 3 months ago

I think we really need both: an upstream ProxyWasm reference documentation that spells out the least common denominator functionality for all implementations - this requires fixing the governance of ProxyWasm, as many have pointed out. I would recommend drawing a stricter boundary for "core" and "experimental" ABI definitions here. A lifecycle and a capabilities model also belong here.

We also need to document the Envoy implementation of ProxyWasm in the "inline wasm filter". There are various extensions and backdoors to access Envoy internals that cannot be captured in ProxyWasm spec, but they are crucial if one wants to use this particular Wasm implementation.

Re: performance - I'd be surprised that a simple task would perform poorly in Wasm unless a poorly chosen language/runtime/library is used. There is a real problem that it's difficult to author "good" Wasm, and there's a misplaced expectation that any code would perform well in Wasm. I'm not sure how to address this - I can't see Rust being a replacement for Lua, for example, for all the intended audience.

mathetake commented 3 months ago

this requires fixing the governance of ProxyWasm, as many have pointed out. I would recommend drawing a stricter boundary for "core" and "experimental" ABI definitions here. A lifecycle and a capabilities model also belong here.

yeah, if this really is possible - I meant fixing the governance there. Not sure if there's such a thing in the first place, or if there's anyone interested in doing so after this catastrophic mess

mathetake commented 3 months ago

sorry for saying a lot folks, but I wanted to say the governance of Proxy-Wasm is 100% of the problem here, and all what I think is the best is just nuke the dependency on that and have a complete spec/doc/sdk coupled with Envoy and don't care about other proxies since IMO there's no benefit to anyone in Envoy community (please tell me if anyone in the universe successfully migrated a Wasm binary in production from Envoy to Openresty, that's great if that really happens).

But agree with @kyessenov if the governance is fixable - that should work as well.

Everything I have expressed here comes from my guilt and regret about having been involved a few years back. I am really looking forward to a better Wasm situation!

thenewwazoo commented 3 months ago

@kyessenov said:

IMO, a stable ABI is all that matters, and Wasm in Envoy has kept the stability promise despite many internal changes inside Envoy.

That's... mostly true, I think. Between 1.25 and 1.30, there were a number of (afaict undocumented) functions added to the WASM VM API that required changes to @mathetake's Rust NullVM playground code. I spent last week trying to bring it up-to-date until I got stuck.

@martijneken said:

Can you elaborate? In our benchmarks wasm performs quite well. We are careful about our choice of wasm engine, and we do precompile plugins ahead of time.

We benchmarked based on the v8 runtime and found per-filter memory overhead on the order of 180 MB and CPU overhead on the order of (iirc) 10%. It was an admittedly naive attempt I wasn't involved in, but the results indicated unsuitability for our use cases.

I don't mean to make this into an either-or choice if it's not one, but I'm coming at this from the perspective of a frustrated would-be user who's excited by an alternative.

PiotrSikora commented 3 months ago

Look at this mess proxy-wasm/spec#42, [...] That's the whole blocker of why SO MANY people before have complained about this documentation mess.

Yes, I've dropped the ball on the ABI specification (please review that PR when you get a chance, since it should be good to merge once it's approved).

But how exactly is that preventing you and/or others from writing documentation in Envoy and/or SDKs? You could even document or correct the ABI specification, but you've chosen to nuke the whole repository instead of fixing it.

I wanted to say the governance of Proxy-Wasm is 100% of the problem here

Could you elaborate on what you mean here?

The same people that maintain Wasm in Envoy also maintain Proxy-Wasm C++ Host and Proxy-Wasm C++ SDK, so I fail to see how moving the C++ Host and other projects into Envoy codebase would change anything.

what I think is the best is just nuke the dependency on that and have a complete spec/doc/sdk coupled with Envoy

Why does it matter whether it's Envoy or Proxy-Wasm org? What's preventing you from contributing to one but not the other?

Also, Wasm in Envoy was originally developed inside Envoy's codebase, but as far as I recall, Envoy maintainers refused to accept/review such a big change, so it was split into various projects inside the Proxy-Wasm org.

don't care about other proxies since IMO there's no benefit to anyone in Envoy community (please tell me if anyone in the universe successfully migrated a Wasm binary in production from Envoy to Openresty, that's great if that really happens).

IMHO, that's quite rude to people who implemented and use Proxy-Wasm in other proxies.

martijneken commented 3 months ago

We benchmarked based on the v8 runtime and found per-filter memory overhead on the order of 180 MB and CPU overhead on the order of (iirc) 10%. It was an admittedly naive attempt I wasn't involved in, but the results indicated unsuitability for our use cases.

Gotcha. The memory use sounds a bit higher than our (non-Envoy) use case, where a subprocess with one v8 isolate takes <100 MiB, and some of that is the wasm memory. I'm not well versed in the Envoy filter (yet -- per above we do plan to focus on it soon), but I wonder how much of the memory is on the 'host' vs 'wasm' side. Have you looked at it with pprof?

If the 10% CPU refers to the plugin execution only, that sounds fair -- wasm can add some CPU overhead compared to native code. I think this highlights the need for robust NullVM implementations so that those who care less about isolation don't have to compromise on performance.

I have been experimenting for the last ~week with building NullVM filters to compile into v1.30 but ran into this problem after some significant struggles bringing everything up-to-date, and have stopped trying.

I hadn't heard about it until today, but I think Rust SDK + NullVM makes a lot of sense! We'd be happy to work on this. Is there an FR in proxy-wasm-cpp-host tracking it, in addition to https://github.com/envoyproxy/envoy/issues/12155? It would need a local NullVM implementation (like this?) and tests to make sure we don't lose support / compatibility.

keithmattix commented 3 months ago

Just dropping my 2 cents in here as an interested vendor (Microsoft) who plans to invest in Envoy's WASM support in the next couple of months. I've talked with several of the folks still involved in Envoy/proxy-wasm, and I'm hopeful that, over time, we'll be able to stabilize and modernize Envoy's WASM support (whether that's proxy-wasm, some form of WASI support, or both). Perhaps it would be useful to focus on getting consensus/agreement on the following points:

  1. Bolstering the governance of the proxy-wasm org - proxy-wasm/spec#42 looks pretty active (I plan on adding my own comments in the next couple of days); however, there is indeed only a single person currently shown as a member of the organization. More formal governance should aid in helping those like myself who are looking to contribute find the right people to talk to. The C++ host repo has 3 CODEOWNERS; maybe start there? A more defined contributor ladder would help set expectations as well.
  2. Defining a 6 or 12 month roadmap - Sadly, I have to join the chorus of folks who would love to leverage Envoy's WASM support, but cannot due to poor performance. Whether or not these and other barriers to adoption will be addressed in the next ABI version is unclear. Are the perf issues known and just require someone to do the work? Is further investigation needed? All of these are unclear. I would ask the existing maintainers of the proxy-wasm project to produce a roadmap and clearly delineate where help is needed so that interested parties can contribute if desired.
  3. Fostering the proxy-wasm community - This point is a bit of an expansion on point 1. As was pointed out above, multiple proxies (Kong, OpenResty, NGINX, Envoy) depend on proxy-wasm. This thread (now) has at least 3 vendors (Google, Tetrate, Microsoft) who have opinions on the direction of proxy-wasm. Given this, it's interesting to me that I haven't been able to find a Slack channel, community meetings, etc. to get questions answered, design docs approved, etc. It's entirely possible that I just haven't done a great job at looking, but IMO, the success of any OSS project (including proxy-wasm) is incumbent upon streamlined channels of communication. My suggestion to proxy-wasm maintainers would be to create/facilitate these avenues, potentially leaning upon existing projects/foundations like the CNCF or the Bytecode Alliance.

I welcome feedback on any/all of the above points. There's obviously a ton of history here, and I'm hopeful that focusing on concrete action items will aid in a healthy resolution for everyone involved.

johnlanni commented 3 months ago

tetratelabs/proxy-wasm-go-sdk#450

@martijneken Based on our extensive experience applying TinyGo with WebAssembly in large-scale scenarios, the combination of TinyGo and bdwgc is indeed feasible; @anuraaga might be overly pessimistic. Although memory leaks in bdwgc are indeed possible in 32-bit environments, the likelihood remains low. Our practical experiences – which include developing over 30 plugins with intricate logic that are utilized across diverse user environments – have encountered virtually no issues.

We also have quite a few users who have developed their own wasm plugins based on tinygo+bdwgc, including some whose traffic reaches the ten-billion page-view (PV) level, and they have not encountered any issues.

There has been a single exception, though: scenarios involving substantial handling of random binary data, as discussed here. For gateway scenarios, similar problems might only arise when dealing with compressed or encrypted data.

jcchavezs commented 3 months ago

My 2p: as @mathetake pointed out, the main issue with the proxy-wasm spec was (and still is) governance more than technical (there are technical challenges too, but solving them requires better governance). You can find a lot of frustration on https://github.com/proxy-wasm/spec/issues/41 when people said for weeks "it is coming" with no clear goals or direction and frankly with no community involvement.

I think keeping a half-baked spec in an isolated repository, disconnected from reality and implementors, is a really bad idea. We all saw that happen with the tracing standards, which took the ecosystem through a few big bangs, and you can see the status of SDKs in a project like OpenTelemetry (with the highest focus and the biggest community + all CNCF exposure) in 2024 https://opentelemetry.io/docs/languages/#status-and-releases (the project started in 2018). I support @mathetake's idea of moving this into envoy and work from there as the spec is already envoy centric. Other proxies can still implement on this.

If the spec doesn't move to envoy there should be a good, diverse and flat governance committee, and I would really suggest it involve users. I have learnt that old contributors who have no stake in this become gatekeepers and/or usually harm the project. Also bear in mind that leading/maintaining the standard means writing implementations, support, promotion and leading changes in the ecosystem, so people must be hands-on.

I like @keithmattix's suggestions, but I would really not focus ONLY on vendors because, no offense, I have heard the phrase "a vendor interested to invest in the project" many times. If a company is willing to invest, a good way to start is to get involved in the ecosystem, not going directly to the spec. I would discard the BytecodeAlliance alternative as they are also going through big bangs with the component model.

PS. I am maintainer of https://github.com/corazawaf/coraza-proxy-wasm/ and also lead the usage of http-wasm in traefik.

keithmattix commented 3 months ago

If a company is willing to invest, a good way to start is to get involved in the ecosystem, not going directly to the spec

Oh of course; I did not mean to imply that our investments would begin with, or be limited to, the spec. I will say, however, that the spec repo seems to have more directional discussion than anywhere else, so that's why I highlighted that PR specifically.

For clarity, part of the reason I'm interested in the proxy-WASM project is because of Microsoft's existing investments in the WASI space, including the component model, WASI-http, and others. If possible, I believe having proxy-wasm be compatible with these developments would be beneficial but that's a pure implementation detail at this point🙂

marc-barry commented 3 months ago

When I initially posted this question, I had assumptions that have now been cleared up. I see these as gaps in the formalization and documentation of the spec. It's clear that multiple parties, myself included, have a vested interest in the direction of what is currently referred to as "proxy-wasm".

As co-founder and CTO of my company (Qpoint), I am heavily invested in Envoy's WASM extensions. We have also developed other technology leveraging "proxy-wasm" outside of Envoy, making the formalization of the documentation highly important to us.

Given the number of interested parties, I'm volunteering my time and effort to ensure this gets the attention it deserves. I'll start by creating a collaborative document that will attempt to articulate the roadmap, gaps in documentation, and future interests of the individuals and organizations already on this thread. I welcome contributions and feedback from all interested parties.

martijneken commented 3 months ago

I'll start by creating a collaborative document that will attempt to articulate the roadmap

Great, we (Google) would like to contribute. Aggregating the feedback from this discussion, I think it would help to break this into tracks, such as:

We plan to set up a community meeting to gather stakeholders and discuss roadmap/governance/community. A Slack channel is also a great idea. @leonm1 from our team volunteered to organize.

the combination of TinyGo and bdwgc is indeed feasible

@johnlanni That's great to know. Are you using https://github.com/wasilibs/nottinygc or a different integration? Would love to get this SDK supported.

keithmattix commented 3 months ago

Thanks @marc-barry! We at Microsoft would like to contribute as well. The multiple tracks make a lot of sense to me; looking forward to the community meetings and slack channel @martijneken

ramaraochavali commented 3 months ago

Thank you @marc-barry. We at Salesforce use WASM extensions very heavily in our platform and would like to contribute wherever needed. Please share docs that we can review and contribute to. Looking forward to the Slack channel :-)

marc-barry commented 3 months ago

Thanks @martijneken for the offer to host a community meeting and set up a Slack channel (with @leonm1 organizing). We look forward to chatting with interested parties and really like the idea of the tracks you proposed. I can set up a document which will take these tracks and begin to add some detail, which we can use during the community meeting and as a starting point. Once we have the Slack set up we can have a more fluid discussion about the topics.

patricio78 commented 3 months ago

Thanks @marc-barry and @martijneken. I'd like to contribute as well wherever needed; we rely heavily on proxy-wasm at Mulesoft (Salesforce).

PiotrSikora commented 3 months ago

We benchmarked based on the v8 runtime and found per-filter memory overhead on the order of 180 MB and CPU overhead on the order of (iirc) 10%. It was an admittedly naive attempt I wasn't involved in, but the results indicated unsuitability for our use cases.

The 5-10% CPU overhead is what we found as an acceptable trade-off for the benefits offered by Proxy-Wasm (isolation, ABI stability, and portability).

When it comes to the memory overhead, 180 MB seems way too high, but there are 2 things you need to be aware of, so perhaps documenting them both would be a good idea: 1) Each proxy worker thread/process maintains its own dedicated WasmVM and Wasm memory (stack and heap). 2) Some languages/SDKs have better defaults than others, e.g. Emscripten initially allocates 16 MB, out of which 5 MB was reserved for stack (nowadays, it's 64 KB, but that requires updating Emscripten in C++ SDK). As you can imagine, this grows pretty fast if you start proxy with multiple worker threads/processes (e.g. 8 CPUs * 16 MB is 128 MB in Wasm memory overhead alone if you're using C++ SDK without changing the size of initial memory allocation).
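The arithmetic in that example can be written down as a rough estimator. This is only a back-of-the-envelope sketch under the assumptions stated in the comment (one dedicated WasmVM per worker, each starting at a fixed initial allocation); the function name and the `vms_per_worker` parameter are hypothetical, and the result ignores host-side memory and any later heap growth.

```python
def wasm_baseline_memory_mb(workers, initial_memory_mb, vms_per_worker=1):
    """Estimate baseline Wasm linear memory across all workers, in MB.

    Assumes every worker holds `vms_per_worker` VMs (hypothetical knob) and
    that each VM starts with `initial_memory_mb` of linear memory.
    """
    return workers * vms_per_worker * initial_memory_mb


# Figures from the comment: 8 workers, 16 MB initial allocation per VM.
print(wasm_baseline_memory_mb(8, 16))  # → 128
```

Lowering the initial allocation or running fewer workers shrinks the baseline proportionally, which is why the SDK's defaults matter here.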

Sadly, I have to join the chorus of folks who would love to leverage Envoy's WASM support, but cannot due to poor performance. Whether or not these and other barriers to adoption will be addressed in the next ABI version is unclear. Are the perf issues known and just require someone to do the work? Is further investigation needed? All of these are unclear.

Do you have any numbers that you can share? The performance will vary from use case to use case (e.g. performing policy check or dispatching calls to external services will have very different performance overhead from someone doing CPU intensive operations transforming HTTP request/response body inside the Proxy-Wasm plugin).

The ABI is already defined in a way that minimizes memory copies, and we support pre-compilation, so if there is any poor CPU performance, then I believe that most of it needs to be addressed by improvements in Wasm engines, and not at the Proxy-Wasm level, but if you have something that suggests otherwise, then that would be helpful.

PiotrSikora commented 3 months ago
  1. Bolstering the governance of the proxy-wasm org - [...] there is indeed only a single person currently shown as a member of the organization. More formal governance should aid in helping those like myself who are looking to contribute find the right people to talk to. The C++ host repo has 3 CODEOWNERS; maybe start there? A more defined contributor ladder would help set expectations as well.

There are 16 people in that org, but most have membership set to "private".

Talking to CODEOWNERS and/or addressing open issues should be a good way to start, especially since contributors vary across Proxy-Wasm projects.

  2. Defining a 6 or 12 month roadmap - [...] I would ask the existing maintainers of the proxy-wasm project to produce a roadmap and clearly delineate where help is needed so that interested parties can contribute if desired.

One of the big issues in Proxy-Wasm is that historically there were always a number of companies interested in contributing to the project, but most of the time it didn't lead to anything other than a number of meetings and unfulfilled commitments. I'm definitely guilty of that myself, so I understand that priorities change, but it's hard to orchestrate anything if there are a dozen people saying that they want to help, but then don't do any work.

There are a number of open issues in Envoy, and simple maintenance tasks like updating C++ dependencies or reviewing Proxy-Wasm PRs, which should be a good way to start contributing to the project.

PiotrSikora commented 3 months ago

I support @mathetake's idea of moving this into envoy and work from there as the spec is already envoy centric. Other proxies can still implement on this.

The spec is definitely not Envoy centric. In fact, the portability across different proxies was an explicit design goal, and multiple proxies already implement it.

Could you please elaborate on how moving the existing code/spec to Envoy would change the status quo? If anything, it would negatively affect the portability, and willingness of other proxies to implement an Envoy-specific project.

jcchavezs commented 3 months ago

I don't see the spec being Envoy centric as a bad thing. The whole design was based on Envoy use cases, and not recognizing that is covering the sun with a finger (see e.g. https://github.com/proxy-wasm/spec/pull/1); you can ask the Kong folks about the struggle to make the spec fit in Kong, for example. That is the problem of designing a standard with a single implementation in mind (that movie has been watched many times). I think the "portability" promise could work for very simple extensions but not for complex ones, e.g. the localReply issues we faced, which were not a problem of the spec per se (the spec was incomplete at the time) but of the lack of clarification about certain issues.

As I mentioned, the main issue is governance. If good governance can bloom in the same org, good; otherwise it is better for Envoy to have its own spec for Wasm plugins, led by Envoy interested parties. Governance should happen with diverse stakeholders in a flat structure, and stakeholders should include users.

Again, as I mentioned and you @piotrsikora pointed out, it is not enough to show up and say "I am a vendor and I want to improve the status quo" because that is the same mistake of writing a spec with no perspective from the implementation, let's start with fixing the ecosystem, get hands dirty with the problems, understand the pains, lead some victories and then whoever is still in the fight can sit on the table and discuss standards otherwise this initiative will fail.

PiotrSikora commented 3 months ago

The whole design was based on Envoy use cases, and not recognizing that is covering the sun with a finger (see e.g. proxy-wasm/spec#1); you can ask the Kong folks about the struggle to make the spec fit in Kong, for example. That is the problem of designing a standard with a single implementation in mind (that movie has been watched many times). I think the "portability" promise could work for very simple extensions but not for complex ones, e.g. the localReply issues we faced, which were not a problem of the spec per se (the spec was incomplete at the time) but of the lack of clarification about certain issues.

We've created 2 host implementations (ATS & Envoy, albeit those were not independent implementations) and 2 SDKs (C++ & Rust), so what you're saying simply isn't what happened.

I don't know which issues with Kong you are referring to, but I don't recall them filing any issue(s) in Proxy-Wasm about it.

Again, as I mentioned and you @PiotrSikora pointed out, it is not enough to show up and say "I am a vendor and I want to improve the status quo" because that is the same mistake of writing a spec with no perspective from the implementation, let's start with fixing the ecosystem, get hands dirty with the problems, understand the pains, lead some victories and then whoever is still in the fight can sit on the table and discuss standards otherwise this initiative will fail.

+1

keithmattix commented 3 months ago

Again, as I mentioned and you @PiotrSikora pointed out, it is not enough to show up and say "I am a vendor and I want to improve the status quo" because that is the same mistake of writing a spec with no perspective from the implementation, let's start with fixing the ecosystem, get hands dirty with the problems, understand the pains, lead some victories and then whoever is still in the fight can sit on the table and discuss standards otherwise this initiative will fail.

With respect to all of the maintainers who have brought proxy-wasm to where it is today, might I suggest that a contributor experience strategy that forces new contributors (let alone vendors) to pay a "tax" of fixing existing bugs and paying down pre-existing tech debt may not lead to very many contributors that stick around? In most cases, contributors participate in a project in order to address their use-cases. It is very reasonable to prioritize more pressing issues before implementing a host of new features, but contributors need to see a path to getting their use-case addressed. IMO, that's where I've struggled with proxy-wasm in the past.

Indeed, I see the open issues, but I also see abandoned PRs that make me wonder if investing my time in chopping wood and carrying water will be rewarded with consideration of my use-case. That being said, I'm willing to try; I just assigned myself #28826 and hope to complete it in the next couple of weeks.

My ask to existing maintainers and contributors is to give the benefit of the doubt and extend a hand to excited and willing new contributors who want to make proxy-wasm better. Take their feedback in new versions of the spec; as much as possible, help them onboard and learn the codebases. It appears that everyone in this thread wants proxy-wasm to succeed, and I believe that by working together with interested parties, we can make it happen.

PiotrSikora commented 3 months ago

With respect to all of the maintainers who have brought proxy-wasm to where it is today, might I suggest that a contributor experience strategy that forces new contributors (let alone vendors) to pay a "tax" of fixing existing bugs and paying down pre-existing tech debt may not lead to very many contributors that stick around?

I wasn't suggesting that this is the only way to contribute, but a relatively easy way to do so.

PiotrSikora commented 3 months ago

Regarding Slack, historically, some discussion was happening in the #envoy-wasm channel in Envoy's workspace, but nowadays it's mostly a community support channel.

I went ahead and created a dedicated Proxy-Wasm workspace on Slack (invite link). Please join!

johnlanni commented 3 months ago

I'll start by creating a collaborative document that will attempt to articulate the roadmap

Great, we (Google) would like to contribute. Aggregating the feedback from this discussion, I think it would help to break this into tracks, such as:

  • Spec / ABI / evolution (core vs experimental, WASI convergence)
  • Base host (ProxyWasmCppHost maintenance, Wasm engine support)
  • Envoy host implementation (getting this out of alpha, performance)
  • SDKs, language support (C++, Rust, Golang, etc)

We plan to set up a community meeting to gather stakeholders and discuss roadmap/governance/community. A Slack channel is also a great idea. @leonm1 from our team volunteered to organize.

It's great to learn about these changes. I'm a maintainer of the Higress project. We've been actively working on many enhancements in the realm of Envoy & Proxy-Wasm, such as bolstering the stability of the Envoy Wasm filter and implementing automatic recovery after Wasm VM traps, among others. Some of these improvements have already been contributed to the upstream community, and we will strive to contribute PRs for other parts to the Envoy and Proxy-Wasm communities as well.

Additionally, I would like to give a shout-out to the WAMR Wasm runtime. We have been collaborating with the WAMR team from the Bytecode Alliance to explore using WAMR as Envoy's Wasm runtime, aiming to enhance Wasm performance and observability: https://www.alibabacloud.com/blog/higresss-new-wasm-runtime-greatly-improves-performance_601025

@lum1n0us As the implementer of the WAMR runtime in Proxy-Wasm and an expert in the Wasm field, could you share your thoughts on Proxy-Wasm?

the combination of TinyGo and bdwgc is indeed feasible

@johnlanni That's great to know. Are you using https://github.com/wasilibs/nottinygc or a different integration? Would love to get this SDK supported.

Yes, we have been utilizing nottinygc, but there are minor differences. Within this forked repository, concerning the potential memory leak issues of bdwgc in the wasm32 environment, I believe there are feasible solutions (such as preferentially allocating addresses from higher positions to avoid resemblance with common values; or exploring the possibility of wasm64). We are willing to take over from where the previous project maintainers left off and continue to push forward in this exploration.
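To illustrate why the allocation address matters for a conservative collector such as bdwgc: the collector cannot tell integers from pointers, so any word whose value falls inside the heap range keeps an object alive. The sketch below is a toy model, not bdwgc's implementation; the function name, the sample stack words, and both heap bases are made up for illustration.

```python
def falsely_retained(stack_words, heap_base, heap_size):
    """Return the words a conservative scan would treat as heap pointers."""
    heap_end = heap_base + heap_size
    return [word for word in stack_words if heap_base <= word < heap_end]


# Plain integers that happen to be on the stack; none of them is a real pointer.
stack = [0, 1, 4096, 70_000, 1_000_000, 65_536]
HEAP_SIZE = 16 * 1024 * 1024  # 16 MiB, an arbitrary illustrative size

# Heap placed low in the 32-bit address space: small integers collide with it.
low = falsely_retained(stack, heap_base=0x0001_0000, heap_size=HEAP_SIZE)

# Heap placed high in the 32-bit address space: the same integers miss it.
high = falsely_retained(stack, heap_base=0xC000_0000, heap_size=HEAP_SIZE)
```

With these sample values, three of the six integers are mistaken for pointers when the heap sits low, and none when it sits high.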

Sincerely, we hope to see the Proxy-Wasm ecosystem grow more open and prosperous.

marc-barry commented 3 months ago

Sharing Qpoint's Experiences with Proxy-Wasm under Envoy

At Qpoint, we've created a Google Document detailing our thoughts and findings on using Proxy-Wasm under Envoy. While our experience may not be universal, we hope it provides valuable insights as we work towards a path forward.

We've enabled comments on the document and welcome your feedback, experiences, and suggestions. You can access it here: https://docs.google.com/document/d/1hHJXdZBn-RyknE6wCPPXle5_AGdX3GConB71cdfJZMY/edit?usp=sharing

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

eshepelyuk commented 2 months ago

Excuse me for jumping in. I've read this conversation through and I have a question: why are http-wasm and the HTTP handler ABI not being considered as a replacement for the "abandoned" proxy-wasm? https://http-wasm.io/http-handler-abi/

martijneken commented 2 months ago

why http-wasm and http handler abi is not being considered as a replacement to "abandoned" proxy-wasm ?

I've been holding my tongue but this is the opposite of reality.

AIUI http-wasm.io was developed by a single ex-Tetrate employee, but it is not maintained and I don't know of any users. It has a fancy website, but quoting someone more familiar than me: "It is safe to assume it’s a dead project".

In contrast, vendors like Google and Microsoft are investing in ProxyWasm and building products around it. I hear the concerns about governance and support in this thread. Please give us a chance to address them, as the set of contributors has changed drastically over the past year. The most concrete evidence of this (other than the PRs in ProxyWasm repos) is that we (Google) are starting a project to get the Envoy wasm filter out of alpha.

I also see the Envoy extensibility "fork" being developed under Envoy dynamic_modules. I regret that we couldn't work together to bring the "native extensions" capability to ProxyWasm, and I hope we still can in the future (e.g. DynVM + NullVM for Rust). In the meantime, if a new native API is what Envoy users want, then so be it, to each their own. I wager that it will not provide the resource/fault isolation which we intend to bring to Envoy + ProxyWasm. But for those writing trusted/1P extensions, maybe that's not what you want or need.

eshepelyuk commented 2 months ago

AIUI http-wasm.io was developed by a single ex-Tetrate employee, but it is not maintained and I don't know of any users. It has a fancy website, but quoting someone more familiar than me: "It is safe to assume it’s a dead project".

Traefik uses it starting in their recent version v3. Also in this thread there was a post from @jcchavezs - a maintainer of https://github.com/jcchavezs/coraza-http-wasm-traefik.

spacewander commented 2 months ago

The most concrete evidence of this (other than the PRs in ProxyWasm repos) is that we (Google) are starting a project to get the Envoy wasm filter out of alpha.

Thanks @martijneken for sharing the plan!

I wager that it will not provide the resource/fault isolation which we intend to bring to Envoy + ProxyWasm. But for those writing trusted/1P extensions, maybe that's not what you want or need.

Could you name some situations in which people need to write trusted extensions? Usually, we trust the developer but not the plugin itself. For example, most of the plugins we run are developed by our teammates. So the technology doesn't need to be fully sandboxed - ensuring our teammate is sane (and code review) is enough. Unless we are lending the Envoy cluster to run the customer's plugins... (maybe that is Google's use case?)

BTW, Proxy Wasm cannot perfectly provide a trust declaration so far (maybe that will be improved in the future). Let's list some risks here:

  1. the plugin contains unsafe syscall operations, for example, reading the other configuration on the disk: handled well in Wasm
  2. the plugin consumes unlimited memory: currently, it seems that proxy wasm doesn't have per-plugin memory limitation. But technically a Wasm runtime can handle this well.
  3. the plugin triggers an infinite loop sometimes: Wasm doesn't have CPU limitation.
  4. the plugin allows untrusted clients to get control (XSS injection, authn/z bypass, and so on): this is usually caused by the plugin logic, not by the way to implement the plugin.

A Wasm plugin can handle 1 & 2, but to get a trusted extension, we still need a careful code review to address all the risks.

johnlanni commented 2 months ago

I believe that being trusted has two levels:

  1. Plugin logic guarantees the security of Envoy's operational logic
  2. Plugin logic guarantees the security of the Envoy's operating environment

The former is difficult to ensure at the mechanism level, but the latter can be guaranteed through the Wasm mechanism because prohibiting system calls can ensure the security of the host environment and prevent the creation of logic with high-risk security vulnerabilities.

In addition, memory limits for plugins (no more than 1G per VM) have in fact already been implemented in the proxy-wasm-cpp-host project. As for CPU limits, they need to be implemented at the runtime level; for example, WAMR can already measure the CPU execution time for each VM (although there is additional overhead cost), and subsequently this can be combined with Envoy's overload mechanism to enforce limits.
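A minimal sketch of how a host can enforce a per-VM memory cap when the guest asks to grow its linear memory. The class and method names are hypothetical and are not taken from proxy-wasm-cpp-host; the only facts assumed are the 64 KiB WebAssembly page size and the 1G figure quoted above.

```python
WASM_PAGE_SIZE = 64 * 1024  # bytes per WebAssembly linear-memory page


class CappedMemory:
    """Toy model of a Wasm linear memory with a host-imposed cap (hypothetical)."""

    def __init__(self, initial_pages: int, max_bytes: int) -> None:
        self.max_pages = max_bytes // WASM_PAGE_SIZE
        if initial_pages > self.max_pages:
            raise ValueError("initial size exceeds the cap")
        self.pages = initial_pages

    def grow(self, delta_pages: int) -> int:
        """Mimic memory.grow: return the previous size in pages, or -1 on refusal."""
        if delta_pages < 0 or self.pages + delta_pages > self.max_pages:
            return -1
        previous = self.pages
        self.pages += delta_pages
        return previous


ONE_GIB = 1024 ** 3
memory = CappedMemory(initial_pages=16, max_bytes=ONE_GIB)
```

A refused `grow` leaves the memory size unchanged, so the guest sees an allocation failure instead of the host running out of memory.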

spacewander commented 2 months ago

in fact, the memory limits for plugins (no more than 1G per VM) have already been implemented in the proxy-wasm-cpp-host project

I am glad to hear that the memory limitation already exists in Proxy Wasm, which supports the conclusion that a Wasm plugin can handle attack vectors 1 & 2.

WAMR can already measure the CPU execution time for each VM (although there is additional overhead cost), and subsequently, this can be combined with Envoy's overload mechanism to enforce limits

So far, can Envoy's overload mechanism turn an infinite loop into a finite one? Even if we can limit the CPU to the level of Envoy, this doesn't mean the Wasm plugin is trusted, because such a Wasm plugin can take away other features' CPU resources. A per-plugin CPU limitation is required, and this is not an easy job, as the runtime needs to be able to do CPU scheduling itself, not just measurement.
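One way a runtime can bound a runaway guest is instruction counting (sometimes called fuel): every executed instruction consumes budget, and execution is aborted when the budget runs out. The sketch below is a toy interpreter, not any real engine's API; the opcodes and all names are hypothetical.

```python
class OutOfFuel(Exception):
    """Raised when a guest exceeds its instruction budget."""


def run(program, fuel):
    """Execute a toy program made of ("inc",), ("jmp", target) and ("halt",) ops.

    Each executed op costs one unit of fuel. Returns (counter, fuel_left).
    """
    pc = 0
    counter = 0
    while True:
        if fuel == 0:
            raise OutOfFuel(f"budget exhausted at pc={pc}")
        fuel -= 1
        op = program[pc]
        if op[0] == "inc":
            counter += 1
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        elif op[0] == "halt":
            return counter, fuel
        else:
            raise ValueError(f"unknown op {op!r}")


finite = [("inc",), ("inc",), ("halt",)]
infinite = [("inc",), ("jmp", 0)]  # never reaches a halt


def is_cut_off(program, fuel):
    """Report whether the program was aborted for running out of fuel."""
    try:
        run(program, fuel)
    except OutOfFuel:
        return True
    return False
```

This bounds a single call, but it is only a budget, not a scheduler: sharing CPU fairly between plugins would need more than this.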

johnlanni commented 2 months ago

For an infinite loop, it can be detected and the corresponding Wasm VM can be destroyed, although this would likely require support at the runtime level. Similarly, for abnormal CPU usage of certain plugins, we can also consider:

  1. For cases exceeding the non-severe threshold, introduce a delay for requests that are to pass through the wasm plugin logic.
  2. For cases that exceed the threshold by too much, directly destroy the corresponding Wasm VM.
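The two-tier response above can be sketched as a small decision function. This is only an illustration: the thresholds, the unit (CPU milliseconds per second), and every name are assumptions, not Envoy or Proxy-Wasm settings.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    DESTROY_VM = "destroy_vm"


def decide(cpu_ms_per_sec: float, soft_limit: float, hard_limit: float) -> Action:
    """Map a plugin's measured CPU usage to one of the responses listed above.

    Usage above soft_limit delays requests; usage above hard_limit destroys the VM.
    """
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    if cpu_ms_per_sec > hard_limit:
        return Action.DESTROY_VM
    if cpu_ms_per_sec > soft_limit:
        return Action.DELAY
    return Action.ALLOW
```

For example, with a soft limit of 100 and a hard limit of 400, a plugin measured at 150 would have its requests delayed, while one measured at 401 would lose its VM.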

marc-barry commented 1 month ago

Since this discussion began I noticed that the situation with documenting proxy-wasm has changed. The following pull requests were merged:

With those you can find the documented spec for the respective versions under https://github.com/proxy-wasm/spec/tree/main/abi-versions.

martijneken commented 1 month ago

Could you name some situations in which people need to write trusted extensions? Usually, we trust the developer but not the plugin itself.

You're right @spacewander, thanks for putting a finer point on this. There is absolutely a difference between a vendor like Google running customers' code and 1P extensions for an Envoy owner. The topics then are security vs. production stability. Vendors need both, so I think our interests are still well aligned.

Paraphrasing your list of risks:

  1. Security boundary. Wasm does provide a security boundary. For vendors it may or may not be sufficient, depending on the runtime and the risk profile. Likely N/A for 1P.
  2. Logical compromise. In this respect plugin logic is the same as any server code -- operating on user facing input/output. If compromised, the other protections may contain the impact, depending on the attack.
  3. Memory limits. Yep, these exist and they are globally configurable, see: https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/main/include/proxy-wasm/limits.h
  4. CPU limits. This doesn't exist today, but this is one of the improvements we want to make soon. Some runtimes have better built-in support than others (e.g. wasm instruction counting), but worst case we can fall back on a watchdog thread that checks CPU time spent by the Envoy/wasm thread. The ideas offered by @johnlanni match ours, with the addition that one could rate-limit VM restarts to prevent abuse.
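Rate-limiting VM restarts, as mentioned in the last item, could look roughly like the sliding-window limiter below. It is a sketch under assumptions: the class, its parameters, and the choice of a sliding window are hypothetical and do not describe an existing Envoy or proxy-wasm-cpp-host feature.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts VM restarts per sliding window (hypothetical)."""

    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._stamps = deque()  # times of the restarts still inside the window

    def try_restart(self, now: float) -> bool:
        """Record and permit a restart at time `now`, or refuse if over the limit."""
        # Drop restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True


limiter = RestartLimiter(max_restarts=2, window_seconds=60.0)
results = [limiter.try_restart(t) for t in (0.0, 1.0, 2.0, 61.5)]
```

Here the third restart is refused because two have already happened within the window, and restarts are permitted again once the earlier ones have aged out.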

spacewander commented 1 month ago

@martijneken Thanks for your information!