apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Support arrow-flight WASM #5880

Open · pbower opened this issue 3 months ago

pbower commented 3 months ago

Describe the bug

One cannot compile a Flight RPC client for a wasm target, and feature flags do not resolve the issue.

This is due to the problematic packages socket2 and mio: socket2 is pulled in by tonic, which does not exclude it for wasm targets.
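One common way to keep socket-based crates such as socket2 and mio out of a wasm build is to choose the transport with `cfg(target_arch = "wasm32")`, so the native-only code path is never compiled for the browser. The sketch below only illustrates that pattern; `Transport`, `NativeTransport`, `WebTransport` and `default_transport` are hypothetical names, not arrow-flight or tonic APIs. In a real crate the same split would also have to be mirrored in Cargo.toml as target-specific dependencies, keeping tonic's `transport` feature off for wasm32.

```rust
// Sketch: `Transport`, `NativeTransport`, `WebTransport` and
// `default_transport` are hypothetical names, not arrow-flight/tonic APIs.

/// Minimal description of the transport a Flight client would use.
pub trait Transport {
    fn name(&self) -> &'static str;
}

/// Native targets: stands in for a socket-based transport, i.e. the kind
/// that pulls in socket2/mio via tonic's `transport` feature.
#[cfg(not(target_arch = "wasm32"))]
pub struct NativeTransport;

#[cfg(not(target_arch = "wasm32"))]
impl Transport for NativeTransport {
    fn name(&self) -> &'static str {
        "native (socket-based)"
    }
}

/// wasm32 targets: stands in for a browser transport (e.g. fetch-based
/// grpc-web) that has no socket2/mio dependency.
#[cfg(target_arch = "wasm32")]
pub struct WebTransport;

#[cfg(target_arch = "wasm32")]
impl Transport for WebTransport {
    fn name(&self) -> &'static str {
        "web (grpc-web over fetch)"
    }
}

/// Chosen at compile time: only one of these two functions exists in any
/// given build, so the other transport's code is never compiled.
#[cfg(not(target_arch = "wasm32"))]
pub fn default_transport() -> impl Transport {
    NativeTransport
}

#[cfg(target_arch = "wasm32")]
pub fn default_transport() -> impl Transport {
    WebTransport
}
```

On a native build `default_transport().name()` reports the socket-based transport; compiled for `wasm32-unknown-unknown`, the native types do not exist at all.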

To Reproduce

  1. Create an absolutely bare-bones Flight RPC client, e.g. one that lists flights.
  2. Compile it for a wasm target; the build fails because of socket2 and mio.

Expected behavior

Being able to run a Flight RPC client written in Rust in the browser (wasm). This matters particularly because TypeScript doesn't support it, so it's one of the only viable options.

Additional context

Lots of manual GitHub patching hell and feature-flag tweaking did not get me past this, hence I'm raising it. If anyone is able to assist, it will be a major help. Thanks heaps.

pbower commented 3 months ago

I've had time to reflect on this and think that the ask here is "Can a feature flag please be added to specify that only the FlightRPC Client is built as part of the compilation process". These modified packages look like they may help support getting the Flight RPC client up and running for WASM (and therefore JS), which would be a major Arrow milestone.

tokio_wasi, hyper_wasi, tonic-web-wasm-client, mio_wasi
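A client-only build could be expressed by putting server code behind a Cargo feature so that it, and its socket-based dependencies, are compiled only on request. The sketch assumes a hypothetical `server` feature (not an existing arrow-flight feature) and also shows `compile_error!` refusing a wasm32 build that enables it.

```rust
// Sketch of a client-only build. `server` is a hypothetical Cargo feature
// name used for illustration, not an existing arrow-flight feature.

/// Client code: always compiled. It must avoid socket-only dependencies so
/// that the crate can also build for wasm32.
pub mod client {
    /// Placeholder for building a ListFlights request body from criteria.
    pub fn list_flights_request(criteria: &[u8]) -> Vec<u8> {
        criteria.to_vec()
    }
}

/// Server code: compiled only with the hypothetical `server` feature, so a
/// client-only build never pulls in server-side (socket-based) dependencies.
#[cfg(feature = "server")]
pub mod server {
    /// Placeholder for starting a tonic-based Flight server.
    pub fn serve() {}
}

// Refuse to compile a wasm32 build that accidentally enables the server.
#[cfg(all(target_arch = "wasm32", feature = "server"))]
compile_error!("the `server` feature is not supported on wasm32 targets");

/// Reports which components this particular build contains.
pub fn components() -> Vec<&'static str> {
    let mut parts = vec!["client"];
    if cfg!(feature = "server") {
        parts.push("server");
    }
    parts
}
```

Building for `wasm32-unknown-unknown` without the `server` feature would then compile only the `client` module; a CI job that performs exactly that build is one way to catch regressions.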

alamb commented 3 months ago

> I've had time to reflect on this and think that the ask here is "Can a feature flag please be added to specify that only the FlightRPC Client is built as part of the compilation process".

I think that sounds like a good idea. Thank you for the suggestion. I think the major work item here would be adding a CI test to ensure we don't break it in the future.

I won't have time to work on this item, but I would be happy to review a pull request. Please just ping me when it is ready for a look.

pbower commented 3 months ago

Hi @alamb, thanks for getting back to me on this.

I've reviewed the arrow-rs Flight client code here, which leans heavily on tonic::transport::Channel and is therefore entangled with the socket2 dependency that breaks WASM builds. What are your thoughts on the best way to handle this? Would it be to replace the channel with:

  1. tonic-web-wasm-client: I think this would mean providing a WASM client target in arrow-flight/src/client.rs (from here), so that the client uses it instead of pulling in the socket2-based transport.

  2. A more extreme option, such as this Tonic WS Transport. I say extreme because it uses a WebSocket and is therefore at odds with typical grpc-web. Its advantage is that bidirectional streaming is supported, but the server would likely need to speak the same protocol, so I'd say probably not. I'm also not 100% clear on the benefit compared to a standard WebSocket, but it would be cool to have something that works as a drop-in until full bidirectional streaming is supported with the natural grpc-web architecture (browser, Envoy proxy, gRPC server), given that this is still ~12 months away (https://github.com/grpc/grpc-web/issues/24).
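Option 1 may be less invasive than it first looks: tonic-generated clients such as `FlightServiceClient<T>` are generic over their inner transport `T`, which, as far as I can tell, is the slot tonic-web-wasm-client's `Client` is designed to fill. The work would then mostly be making arrow-flight's higher-level client stop hard-coding `Channel`. The sketch below shows that shape in plain, synchronous Rust; `UnaryTransport`, `FlightClientSketch` and `MockTransport` are hypothetical names, and a real version would be async and built on tonic's own service traits.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a gRPC transport: send one encoded request to
/// a method path and get one encoded response back. Real code would be
/// async and built on tonic's service traits; this is sync to stay std-only.
pub trait UnaryTransport {
    fn call(&mut self, path: &str, request: &[u8]) -> Result<Vec<u8>, String>;
}

/// Flight-style client that depends only on the trait, not on a concrete
/// socket-based channel. (Hypothetical type, not arrow-flight's FlightClient.)
pub struct FlightClientSketch<T: UnaryTransport> {
    transport: T,
}

impl<T: UnaryTransport> FlightClientSketch<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }

    /// Calls the unary Flight `GetFlightInfo` method. The request and
    /// response are treated as opaque bytes; protobuf encoding is omitted.
    pub fn get_flight_info(&mut self, descriptor: &[u8]) -> Result<Vec<u8>, String> {
        self.transport
            .call("/arrow.flight.protocol.FlightService/GetFlightInfo", descriptor)
    }
}

/// In-memory transport for exercising the client without a network. On
/// native, a socket-based channel would fill this slot; in the browser, a
/// grpc-web client would.
#[derive(Default)]
pub struct MockTransport {
    responses: HashMap<String, Vec<u8>>,
}

impl MockTransport {
    /// Registers a canned response for a method path.
    pub fn respond(mut self, path: &str, body: &[u8]) -> Self {
        self.responses.insert(path.to_string(), body.to_vec());
        self
    }
}

impl UnaryTransport for MockTransport {
    fn call(&mut self, path: &str, _request: &[u8]) -> Result<Vec<u8>, String> {
        self.responses
            .get(path)
            .cloned()
            .ok_or_else(|| format!("no response registered for {path}"))
    }
}
```

The design choice is that the client code never names a concrete transport, so the native and wasm builds differ only in which transport value is passed to `new`.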

Appreciate your thoughts and expertise. Is there a simpler route?

Thanks

tustvold commented 3 months ago

Has this ever been supported? I'm wondering if this is actually a feature request rather than a bug.

Kikkon commented 3 months ago

We have encountered the same problem and currently don't have a good solution. If you have any suitable ideas, we welcome you to discuss them with us. 😄

alamb commented 3 months ago

I don't have any particular insights to share here -- I think trying to massage the existing arrow flight client to support WASM if tonic doesn't do so easily would be tough.

What are our thoughts about implementing a separate arrow flight client for WASM? Depending on how that implementation went, we could decide if it belonged in arrow-rs or some other repo 🤔

Kikkon commented 3 months ago

In fact, I think this issue is more of a derivative of the one found at: https://github.com/apache/arrow/issues/17325. Since there is no native JS Flight client, I hope to build a WASM version using the Flight client from arrow-rs (because the support for Rust in WASM seems better).

alamb commented 3 months ago

Implementing Flight in Rust to get JavaScript support is so meta! I love it.

Kikkon commented 3 months ago

@pbower Do you have plans or have you already started this part of the work? If not, I think if I have time, I can proceed with a POC to verify the feasibility of these two solutions.

pbower commented 3 months ago

Hi @Kikkon, no, I haven't, but that's awesome if you are looking at it. Happy to bounce ideas back and forth around this if you'd like. One thing I did find is that the tonic-web client and server can work without Envoy, which seems like a major advantage that might offset some of the complicating (overhead) factors of wasm compilation. Additionally, even with ArrowJS, compiled protobuf and Python, I get a fair bit of latency establishing the initial connection, even on localhost, so I'm quite interested in whether Rust to Rust can help mitigate that. If so, I think there would be a really strong 'performance web pattern' here, particularly once bidirectional streaming rolls into grpc-web in ~12 months' time.

Kikkon commented 3 months ago

@pbower Thank you for your reply! I'll keep you updated here if I make any progress. Currently, it looks like using tonic-web-wasm-client might make the changes more manageable.

Kikkon commented 2 months ago

Hello @pbower @alamb

Recently, due to some work-related matters, progress has been slow. I have completed a POC verifying the feasibility of building an arrow-rs Flight wasm client using tonic-web-wasm-client.

Here are the modifications I've made:

  1. Cleaned up dependencies that couldn't wasm build, such as tokio's rt-multi-thread and tonic's transport.
  2. Replaced the channel with tonic-web-wasm-client's client.
  3. Started the flight server in grpc-web mode to support calls from tonic-web-wasm-client.
  4. Added necessary dependencies and configurations for wasm build. (🤣 the related code can be found at https://github.com/Kikkon/arrow-rs/tree/wasm_poc, but much of it is copied for the POC and has poor readability)

Current issues:

  1. I've only verified that Handshake requests reach the Rust-based server correctly, but there seem to be issues with the returned data. I plan to test with SQL Server and other APIs.
  2. Regarding the relationship between this module and arrow-flight: should it be a separate repository, or a module similar to arrow-flight-wasm? It looks like it will use some code from arrow-flight. I'm not very familiar with Rust, so I hope @alamb can provide some suggestions.
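When debugging the returned data, it may help to know what a grpc-web response body looks like on the wire. Per the gRPC-Web protocol description, the body is a sequence of frames, each a 1-byte flag, a 4-byte big-endian length and a payload; a flag with the most significant bit set (0x80) marks the trailers frame, whose payload is HTTP/1-style header lines such as `grpc-status: 0`. tonic-web-wasm-client already does this decoding, so the std-only sketch below (with hypothetical names `Frame`, `encode_data_frame` and `decode_frames`) is only meant as an aid for inspecting raw responses. It does not decode protobuf, decompress payloads, or handle the base64 `grpc-web-text` variant.

```rust
/// One decoded grpc-web frame (hypothetical type for inspection only).
#[derive(Debug, PartialEq)]
pub enum Frame {
    /// A length-prefixed message; the payload is protobuf, not decoded here.
    Data { compressed: bool, payload: Vec<u8> },
    /// The trailers frame as lowercase `key: value` pairs (e.g. grpc-status).
    Trailers(Vec<(String, String)>),
}

/// Encodes an uncompressed data frame: flag 0x00, u32 BE length, payload.
pub fn encode_data_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(0x00);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Splits a complete grpc-web response body into frames.
pub fn decode_frames(mut body: &[u8]) -> Result<Vec<Frame>, String> {
    let mut frames = Vec::new();
    while !body.is_empty() {
        if body.len() < 5 {
            return Err("truncated frame header".to_string());
        }
        let flag = body[0];
        let len = u32::from_be_bytes([body[1], body[2], body[3], body[4]]) as usize;
        let rest = &body[5..];
        if rest.len() < len {
            return Err(format!("frame declares {len} bytes, only {} available", rest.len()));
        }
        let (payload, tail) = rest.split_at(len);
        if flag & 0x80 != 0 {
            // MSB set: trailers, encoded like HTTP/1 header lines.
            let text = std::str::from_utf8(payload).map_err(|e| e.to_string())?;
            let pairs: Vec<(String, String)> = text
                .split("\r\n")
                .filter(|line| !line.is_empty())
                .filter_map(|line| line.split_once(':'))
                .map(|(k, v)| (k.trim().to_ascii_lowercase(), v.trim().to_string()))
                .collect();
            frames.push(Frame::Trailers(pairs));
        } else {
            // MSB clear: a data frame; bit 0 is the compressed flag.
            frames.push(Frame::Data { compressed: flag & 0x01 != 0, payload: payload.to_vec() });
        }
        body = tail;
    }
    Ok(frames)
}
```

A response that reaches the client but carries a non-zero `grpc-status` in its trailers, or a body that ends mid-frame, would show up immediately when run through `decode_frames`.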

Plans after the POC:

  1. Determine where the module should reside and complete the necessary code refactoring.
  2. Wrap the wasm client's API comprehensively.
  3. Test and benchmark (following arrow-flight as a reference).

alamb commented 2 months ago

> Regarding the relationship between this module and arrow-flight: should it be a separate repository, or a module similar to arrow-flight-wasm? It looks like it will use some code from arrow-flight. I'm not very familiar with Rust, so I hope @alamb can provide some suggestions.

I think it depends on what modifications would be necessary.

I think the options are:

  1. Add a feature flag to the existing arrow-flight crate (similar to flight-sql-experimental, see https://crates.io/crates/arrow-flight)
  2. Create a new crate

Again, I haven't had a chance to review the code, but if it can be made a separate crate, that is likely best, as it would keep module boundaries clear and make it easier for users to pick what they want.

Assuming the implementation is reasonable, I think adding this new crate to the arrow-rs repo would certainly be a good idea.

Kikkon commented 2 months ago

@alamb Thank you for your reply. I'm not sure if there will be any dependency issues, but I will work towards option 1.