WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Seeking participants’ opinions on trusted server UDF language preferences #395

Open peiwenhu opened 1 year ago

peiwenhu commented 1 year ago

Hi, I’d like to seek feedback on Privacy Sandbox’s trusted bidding & scoring signal server implementation from FLEDGE participants.

In the trust model explainer, we described a mechanism to run user-defined functions where the server operator can load private custom code into the server to be invoked dynamically to process each request. We would like to learn more about your thoughts on the language of the UDF.

We have 2 directions to explore:

  1. Providing JavaScript support in the near future (C++ can be a lower priority in the future)
  2. Providing C++ support, which may take much longer to become available, with JavaScript deprioritized.

We are going with (1) right now because we expect initial UDF implementations to be simple, which makes the language choice less consequential. Does this align with your thoughts?

fhoering commented 1 year ago

On the Criteo side, we expect to do tasks like ML inference and buyer request filtering inside the trusted server, so the choice of language seems pretty important. My understanding of trusted servers is that, in theory, there is no restriction on the language. There could be n trusted server templates in different languages, as long as they have been reviewed by a trusted party or the community. So even C# or Java/Scala, which is what we currently use for our stack, could be an option.

What is the requirement to use JS or C/C++? Is it to be able to use the existing sandboxing engine https://developers.google.com/code-sandboxing/sandbox2? If that one is used, there could at least be Python support or some Python bindings. One advantage of JS would be the ability to move client-side code to the server side easily, and the other way round, but most people have not written any FLEDGE client-side logic yet, so that should not be an argument.

palenica commented 1 year ago

Hi Fabian,

Thank you for your comments. To be clear, we have no desire to be dogmatic about what language you get to use to define the UDF. Pragmatically, we need to start somewhere, and the easiest place to start seems to be JavaScript.

For other languages, I'd envision support via compilation to WASM. For this to work, we need to settle on things like API signatures first -- what data goes in and out of your UDF. Once we feel good about the API definitions, we'll need to work out how to pass data in and out of your WASM code -- and that means dealing with things like ABIs (https://www.webassembly.guide/webassembly-guide/webassembly/wasm-abis). I'm guessing we may want to start with support for languages that don't require a heavy runtime, such as C/C++ or Rust.
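To make the "passing data in and out" concern concrete, here is a toy sketch in Python of a common convention: the host copies serialized bytes into the module's linear memory and hands the UDF a (pointer, length) pair, and the UDF returns a (pointer, length) pair for its result. This is only a simulation under stated assumptions. `LinearMemory`, `udf`, `call_udf`, and the JSON encoding are all hypothetical names and choices for illustration, not the server's actual API or ABI.

```python
import json


class LinearMemory:
    """Toy stand-in for a WASM module's linear memory: one flat byte array."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.next_free = 0  # bump allocator: memory is never freed in this toy

    def write(self, data):
        """Copy bytes into memory and return their (pointer, length)."""
        if self.next_free + len(data) > len(self.buf):
            raise MemoryError("toy linear memory exhausted")
        ptr = self.next_free
        self.buf[ptr:ptr + len(data)] = data
        self.next_free += len(data)
        return ptr, len(data)

    def read(self, ptr, length):
        """Copy bytes back out of memory."""
        return bytes(self.buf[ptr:ptr + length])


def udf(memory, ptr, length):
    """Hypothetical UDF: it only ever sees integers (ptr, len), never objects."""
    request = json.loads(memory.read(ptr, length))
    response = {"keys": sorted(request["keys"])}  # placeholder logic
    return memory.write(json.dumps(response).encode("utf-8"))


def call_udf(request):
    """Host side: serialize, copy in, invoke, copy out, deserialize."""
    memory = LinearMemory(64 * 1024)  # one WASM page is 64 KiB
    in_ptr, in_len = memory.write(json.dumps(request).encode("utf-8"))
    out_ptr, out_len = udf(memory, in_ptr, in_len)
    return json.loads(memory.read(out_ptr, out_len))
```

The point of the sketch is that the API signature (what the request and response contain) and the ABI (how those bytes cross the boundary) are separate decisions, which is why the first has to be settled before the second.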

fhoering commented 1 year ago

Hello @palenica, OK. Thanks for the explanation.

Did you also study some lightweight alternatives to the V8 engine + WASM support? Maybe one could start with something easier and still move to a more sophisticated solution later. The goal should not be to rebuild Chrome on the server side and then run JS/WASM code, but to take advantage of what the server side brings. For example, there is no constraint to build something cross-OS; just a Linux server would be enough.

I didn't check what alternatives to the Google sandboxes are out there, but from quickly reading its page, I like https://nsjail.dev/. It looks simple: it just isolates some syscalls with seccomp-bpf and restricts network calls. Obviously I haven't tried it, but in this case I could run any script I want to.

peiwenhu commented 1 year ago

Hi @fhoering thanks.

Regarding the language awareness:

WASM toolchains for generating standalone WASM are most mature for C, C++ and Rust, so these are the languages we're evaluating. Compiling other languages like Go or Python generates very large binaries and usually requires an intermediate JS layer. For example, the Python docs state that the generated WASM is about 4MB in size: https://pythondev.readthedocs.io/wasm.html. Code of this size may just work with our setup, but evaluating it is not a high priority on our side, and we may add code size limits if large binaries pose significant challenges to other aspects of the system.
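For anyone who wants to sanity-check a compiled module before a limit is published, a minimal Python sketch of such a check might look like the following. The 8-byte preamble (the `\0asm` magic followed by version 1) comes from the WebAssembly binary format; the limit value and the function name `check_wasm_module` are assumptions for illustration only, since no actual limit is stated in this thread.

```python
WASM_MAGIC = b"\x00asm"             # first 4 bytes of every WASM binary
WASM_VERSION_1 = b"\x01\x00\x00\x00"  # version field, little-endian 1

# Hypothetical limit, chosen only for this example.
MAX_MODULE_BYTES = 8 * 1024 * 1024


def check_wasm_module(data, max_bytes=MAX_MODULE_BYTES):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        problems.append("not a WASM binary (bad magic)")
    elif data[4:8] != WASM_VERSION_1:
        problems.append("unsupported WASM version")
    if len(data) > max_bytes:
        problems.append(
            "module is %d bytes, limit is %d" % (len(data), max_bytes)
        )
    return problems
```

Calling `check_wasm_module(open(path, "rb").read())` on a build artifact would then give a quick way to compare toolchains (for example, a full language runtime versus a reduced one) against whatever baseline is eventually chosen.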

Regarding the alternatives:

We did some study on the tech stack. For the server itself, yes, we are only building a Linux server running in a Docker container. For the UDF engine, we would still like to invest in something that we know we can eventually reuse to a large degree. There are many aspects to consider and spend effort on (security, privacy, performance, functionality, support, timeline, etc.), so we would like to choose the one that is most likely to succeed. Once that is set, we can make decisions on a smaller scale, starting with something easier and moving to a more sophisticated solution (e.g., JS -> WASM).

fhoering commented 1 year ago

> We did some study on the tech stack.

Can you share this study? I didn't find any explanation beyond this: https://github.com/privacysandbox/fledge-docs/blob/main/key_value_service_trust_model.md#design-principles

My understanding is that there is no hard requirement to run Python in the web browser and therefore no hard requirement to run WASM server side also. It seems to ease some things because the execution engine would be language agnostic but it also has downsides as the stack for it is not mature.

peiwenhu commented 1 year ago

> Can you share this study

Sure. We have added a goal to publish such an explainer to our future plan.

> My understanding is that there is no hard requirement to run Python in the web browser and therefore no hard requirement to run WASM server side also. It seems to ease some things because the execution engine would be language agnostic but it also has downsides as the stack for it is not mature.

Makes sense. I agree that there's a great need to run Python for the ML use cases, and to some users like you it's more important than some other aspects so the tradeoffs should be made by you instead of us. I'd say please don't worry about the hard requirement around running Python as the WASM language at this point. We may not be able to get to evaluating Python WASM in the near term but it might still work out of the box once we have WASM and I'm sure as we progress we can address more concrete concerns with enough communication like this.

dynamix commented 1 year ago

As the trusted server is going to be used for the Android Privacy Sandbox as well, I would like to share our opinion (Remerge).

It would be great to have WASM from the start. Our stack is built in Go and we would like to be able to reuse some parts of it. In addition, most of the team is most fluent in Go. Given that several languages, including Go, compile to WASM by now, it would be a great way to support multiple languages from the start.

So option 1 + WASM support would be great.

Edit: Just read the comment about languages that would result in larger WASM binaries due to their runtime. For Go there are ways to reduce the size (e.g., TinyGo).

arun-msx commented 1 year ago

I am curious: regarding the WASM binaries, does the trusted server impose any size limitations on them? Are there any documented restrictions on how large the WASM binaries deployed in the trusted server can be, or is this more of a pragmatic concern that differs on a case-by-case basis? Having some baseline would still be helpful for evaluating alternative solutions that use WASM binaries.