apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Add aarch64 Linux version of datafusion-cli #9505

Open l1t1 opened 7 months ago

l1t1 commented 7 months ago

Is your feature request related to a problem or challenge?

When I run pip install, it downloads the source code but fails to compile because my host is missing some build tools.

Describe the solution you'd like

pip should install a prebuilt binary version.

Describe alternatives you've considered

No response

Additional context

pip install datafusion-cli
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting datafusion-cli
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ba/02/2bb067f1c5bb4c16852767fe4b9cd7744d4b7cbc849d3e61aa9fe0398a86/datafusion_cli-36.0.0.tar.gz (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 3.9 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]

      Cargo, the Rust package manager, is not installed or is not on PATH.
      This package requires Rust and Cargo to compile extensions. Install it through
      the system's package manager or via https://rustup.rs/

      Checking for Rust toolchain....
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
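
The log above fails because Cargo is not on PATH. A minimal sketch of a pre-flight check that could be run before retrying the install is shown here; the `missing_tools` helper is a hypothetical name for illustration and is not part of pip or datafusion-cli.

```shell
# Print each requested command that is not found on PATH, one per line.
# (Hypothetical helper, for illustration only.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check for the Rust toolchain that the source build asks for.
missing=$(missing_tools cargo rustc)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
  echo "Install Rust via the system package manager or https://rustup.rs/, then retry."
else
  echo "Rust toolchain found; pip can attempt the source build."
fi
```

If the check reports nothing missing and the build still fails, the cause is probably something other than the toolchain lookup shown in the log.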

Omega359 commented 6 months ago

Are datafusion-cli binaries distributed or is it always built from source? My understanding was it was built from source only.

I'd recommend just installing Rust by following the directions in the error message. Alternatively, if you have Docker installed on the aarch64 machine, you can build the CLI there. Failing that, you can build the CLI on another machine with Docker using something like this.

Dockerfile:

# Build stage: compile datafusion-cli for the requested target platform
FROM --platform=$TARGETPLATFORM rust:1.72-bullseye AS builder

COPY . /usr/src/arrow-datafusion
COPY ./datafusion /usr/src/arrow-datafusion/datafusion
COPY ./datafusion-cli /usr/src/arrow-datafusion/datafusion-cli

WORKDIR /usr/src/arrow-datafusion/datafusion-cli

RUN rustup component add rustfmt
RUN cargo build --release

# Runtime stage: copy only the compiled binary into a slim image
FROM --platform=$TARGETPLATFORM debian:bullseye-slim

COPY --from=builder /usr/src/arrow-datafusion/datafusion-cli/target/release/datafusion-cli /usr/local/bin

ENTRYPOINT ["datafusion-cli"]
CMD ["--data-path", "/data"]

Build commands:

docker buildx create --use
docker buildx build ./ -f ./datafusion-cli/Dockerfile --platform=linux/arm64

With the above changes to the Dockerfile, the commands I ran seemed to build both binaries. This may not be ideal, though: the build is horribly slow (~5000s locally), and I think the image may run under emulation, which may be just fine for the CLI.
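
To judge whether a slow build like this comes from emulation, a rough sketch along these lines compares the host architecture with the requested platform. The function names are hypothetical, and it assumes buildx falls back to QEMU emulation whenever the target platform differs from the host, which is my understanding of the default setup rather than something confirmed in this thread.

```shell
# Map a `uname -m` machine name to a Docker platform string.
# (Hypothetical helper; only the two common 64-bit cases are covered.)
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Print "native" if the target platform matches the host, else "emulated".
# $1 = host machine name (as printed by uname -m), $2 = target platform.
build_mode() {
  host_platform=$(docker_platform_for "$1")
  if [ "$host_platform" = "$2" ]; then
    echo "native"
  else
    echo "emulated"
  fi
}

# Report how a linux/arm64 build would run on the current host.
build_mode "$(uname -m)" "linux/arm64"
```

On an x86_64 host this prints `emulated` for `linux/arm64`, which would be consistent with the long build time. On an aarch64 host the same build should be native.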

l1t1 commented 6 months ago

thanks @Omega359