gchq / sleeper

A cloud-native, serverless, scalable, cheap key-value store
Apache License 2.0

DataFusion compactor command line utility can't process relative paths #3767

Open m09526 opened 4 days ago

m09526 commented 4 days ago

Description

The command line DataFusion compaction utility (not used by operational Sleeper) fails when the input Parquet files are specified using a relative path such as ../../some_file/test.parquet.

Steps to reproduce

  1. Run cargo build --release in the rust directory (the next step uses the target/release binary).
  2. target/release/main /tmp/test.parquet ../some_file/test.parquet -k blah -m blah -a blah

Expected behaviour

The compaction library is called with the given files.

Screenshots/Logs

0: Object Store error: Object at location /some_file/test.parquet not found: No such file or directory (os error 2)

Background

The relative path is converted to a URL incorrectly, so the wrong path is used: in the log above, ../some_file/test.parquet has become /some_file/test.parquet.

The command line interface to the compactor should canonicalise paths before transforming them to URLs.

m09526 commented 4 days ago

Correction: the fix should make the relative path absolute, NOT canonicalise it. We don't want to resolve symlinks.
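
A minimal sketch of that distinction, using only the Rust standard library. It is not taken from the Sleeper code base: the helper name to_absolute is hypothetical, and it assumes Rust 1.79 or later, where std::path::absolute is stable. Unlike std::fs::canonicalize, std::path::absolute does not resolve symlinks and does not require the file to exist. On POSIX platforms it also keeps any .. components, so whether those need collapsing before the URL conversion is left open here.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a possibly relative path into an absolute one.
/// Relative paths are joined onto the current working directory; symlinks
/// are left alone and the file does not need to exist.
fn to_absolute(path: &Path) -> io::Result<PathBuf> {
    std::path::absolute(path)
}

fn main() -> io::Result<()> {
    // Same relative input as in the steps to reproduce.
    let abs = to_absolute(Path::new("../some_file/test.parquet"))?;
    assert!(abs.is_absolute());
    println!("absolute path: {}", abs.display());

    // An already absolute path stays absolute.
    let out = to_absolute(Path::new("/tmp/test.parquet"))?;
    assert!(out.is_absolute());
    println!("absolute path: {}", out.display());
    Ok(())
}
```

The resulting absolute path could then be passed to whatever path-to-URL conversion the command line tool already performs; that step is not shown because the issue does not say how it is done.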