apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support compiling remaining DataFusion crates (`datafusion-core`) to WASM #7652

Open alamb opened 9 months ago

alamb commented 9 months ago

Is your feature request related to a problem or challenge?

As shown by @jonmmease in https://github.com/apache/arrow-datafusion/pull/7633, some of the datafusion crates can be compiled to WASM:

datafusion-common
datafusion-expr
datafusion-optimizer
datafusion-physical-expr
datafusion-sql

The difficulty with getting the remaining DataFusion crates compiled to WASM is that they have non-optional dependencies on the parquet crate with its default features enabled. Several of the default parquet crate features require native dependencies that are not compatible with WASM, in particular the lz4 and zstd features. If we can arrange our feature flags to make it possible to depend on parquet with these features disabled, then it should be possible to compile the core datafusion crate to WASM as well.

Describe the solution you'd like

One approach might be to disable the parquet features that cannot be compiled to WASM, as discussed below.

From https://github.com/apache/arrow-datafusion/pull/7633/files#r1335824930 between @jonmmease and @tustvold

@tustvold do you have any thoughts about finagling the parquet crate's dependencies so it can compile, by default, on wasm? Should we perhaps change datafusion to disable the parquet default features?

 tustvold 

IIRC it is the compression codecs that have issues with WASM; disabling these by default would, I think, be surprising for users. Further, I'm not sure how useful parquet support would be given that only the InMemory object_store is supported on WASM, although I may have some time to look into this over the next couple of days.

 jonmmease 

Yeah, I don't think we'd want DataFusion's default build to disable the default parquet features, but if we could arrange things so that depending on the datafusion core crate with `default-features = false` would either remove the parquet dependency altogether or disable the default parquet features, then I think we could get things at least compiling for wasm.
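As a rough illustration of the arrangement described above, here is a minimal sketch of gating parquet-specific code behind a Cargo feature, so that a build without the feature never references the parquet crate. The `parquet` feature name, the `parquet_support` module, and the `supported_formats` function are hypothetical and only stand in for whatever layout DataFusion actually ends up with.

```rust
// Hypothetical module: in a real crate this would be the only place that
// names the `parquet` crate, so disabling the feature removes the dependency.
#[cfg(feature = "parquet")]
mod parquet_support {
    pub const NAME: &str = "parquet";
}

/// Hypothetical helper listing the file formats compiled into this build.
pub fn supported_formats() -> Vec<&'static str> {
    // `mut` is only needed when the `parquet` feature is enabled.
    #[allow(unused_mut)]
    let mut formats = vec!["csv", "json"];

    // This statement is removed entirely when the feature is off.
    #[cfg(feature = "parquet")]
    formats.push(parquet_support::NAME);

    formats
}
```

With this shape, a WASM consumer would depend on the crate with `default-features = false` and get only the formats that do not need the parquet crate, while the default build keeps parquet enabled.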

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 9 months ago

A good first step might be to simply make parquet optional in DataFusion -- aka https://github.com/apache/arrow-datafusion/issues/7653

That would allow us to validate and explore what dependencies are blocking wasm compilation

tustvold commented 9 months ago

https://github.com/apache/arrow-rs/pull/4884 makes parquet compile for WASM

alamb commented 8 months ago

Also, https://github.com/apache/arrow-datafusion/pull/7745 makes parquet support optional in DataFusion

fudini commented 5 months ago

I managed to compile for wasm, but I encountered a couple of problems:

  1. Stack overflow at SessionContext::new
  2. Use of std::time::Instant - this won't compile and probably needs to be hidden behind cfg

https://github.com/apache/arrow-datafusion/compare/main...fudini:arrow-datafusion:wasm

After these changes I was able to create SessionContext and run a simple query
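For the second problem, one way to hide `std::time::Instant` behind `cfg` might look like the sketch below. It is not taken from the linked branch: `Timer` and `timed` are hypothetical names, not DataFusion APIs, and the wasm32 branch simply reports zero elapsed time as a placeholder. A real build would probably want a browser-backed clock there instead.

```rust
use std::time::Duration;

// Native targets: wrap std::time::Instant as usual.
#[cfg(not(target_arch = "wasm32"))]
mod clock {
    use std::time::{Duration, Instant};

    /// Hypothetical timer type, illustrative only.
    pub struct Timer(Instant);

    impl Timer {
        pub fn start() -> Self {
            Timer(Instant::now())
        }

        pub fn elapsed(&self) -> Duration {
            self.0.elapsed()
        }
    }
}

// wasm32: avoid Instant entirely. Assumption: reporting zero is acceptable
// as a stand-in until a real clock source is wired in.
#[cfg(target_arch = "wasm32")]
mod clock {
    use std::time::Duration;

    pub struct Timer;

    impl Timer {
        pub fn start() -> Self {
            Timer
        }

        pub fn elapsed(&self) -> Duration {
            Duration::ZERO
        }
    }
}

use clock::Timer;

/// Runs `f` and returns its result together with the measured duration.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let timer = Timer::start();
    let out = f();
    (out, timer.elapsed())
}
```

Callers only see `Timer`, so the choice of clock stays in one module and the rest of the code needs no `cfg` attributes of its own.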