delta-io / kafka-delta-ingest

A highly efficient daemon for streaming data from Kafka into Delta Lake
Apache License 2.0

Getting "unexpected argument" error for all argument IDs when running KDI #144

Closed clairewood closed 10 months ago

clairewood commented 10 months ago

Apologies if this ends up being my own error/misunderstanding, I'm very new here!

I'm working on getting kafka-delta-ingest up and running locally, but when I try following the "Starting Worker Processes" steps in the Using Azure Event Hubs section of the README, I get "unexpected argument" errors.

When I try running it as it appears in the README:

$ RUST_LOG=debug cargo run ingest web_requests ./tests/data/web_requests \
>   --allowed_latency 60 \
>   --app_id web_requests \
>   --transform 'date: substr(meta.producer.timestamp, `0`, `10`)' \
>               'meta.kafka.offset: kafka.offset' \
>               'meta.kafka.partition: kafka.partition' \
>               'meta.kafka.topic: kafka.topic' \
>   --auto_offset_reset earliest
    Finished dev [unoptimized + debuginfo] target(s) in 1.18s
     Running `target\debug\kafka-delta-ingest.exe ingest web_requests ./tests/data/web_requests --allowed_latency 60 --app_id web_requests --transform 'date: substr(meta.producer.timestamp, `0`, `10`)' 'meta.kafka.offset: kafka.offset' 'meta.kafka.partition: kafka.partition' 'meta.kafka.topic: kafka.topic' --auto_offset_reset earliest`
error: unexpected argument '--allowed_latency' found

  tip: to pass '--allowed_latency' as a value, use '-- --allowed_latency'

Usage: kafka-delta-ingest.exe ingest <topic> <table_location>

For more information, try '--help'.
error: process didn't exit successfully: `target\debug\kafka-delta-ingest.exe ingest web_requests ./tests/data/web_requests --allowed_latency 60 --app_id web_requests --transform 'date: substr(meta.producer.timestamp, `0`, `10`)' 'meta.kafka.offset: kafka.offset' 'meta.kafka.partition: kafka.partition' 'meta.kafka.topic: kafka.topic' --auto_offset_reset earliest` (exit code: 2)

When I try running it with a space between the dashes and the argument IDs, I get the same error, but with this line changed slightly: "error: unexpected argument 'allowed_latency' found". When I change the order of the argument IDs, it gives me the error for whichever one comes first.

When I use the short argument IDs (like "-k" or "-a"), it stops giving me unexpected argument errors, but gives me a different error instead:

$ RUST_LOG=debug cargo run ingest web_requests ./tests/data/web_requests \
>   -l 60 \
>   -a web_requests \
>   -K "auto.offset.reset=earliest" \
>   -t 'date: substr(meta.producer.timestamp, `0`, `10`)'
    Finished dev [unoptimized + debuginfo] target(s) in 1.23s
     Running `target/debug/kafka-delta-ingest ingest web_requests ./tests/data/web_requests -l 60 -a web_requests -K auto.offset.reset=earliest -t 'date: substr(meta.producer.timestamp, `0`, `10`)'`
thread 'main' panicked at 'Mismatch between definition and access of `APP_ID`. Unknown argument or group id.  Make sure you are using the argument id and not the short or long flags
', src/main.rs:65:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I have also tried following the steps in the README for Using Azure Event Hubs in order to connect my existing Kafka stream (in an Event Hub instance) to a delta table (in ADLS) but I'm getting the same exact errors.

Is anyone else experiencing this issue with the argument IDs? Is the documentation outdated and I should be trying something else? Or did I do something wrong in setup (forking and cloning the repo in VS Code then just following the steps in the readme)?

Any help would be hugely appreciated. Thank you!

mightyshazam commented 10 months ago

It looks like a behavior change that came with upgrading clap versions. I noticed it while adding Avro support, but I didn't catch the change in casing behavior. Basically, with our current clap version the arguments can't be passed as --<id>, and when you switch to the short flags, which do work, the code retrieves the values using uppercase versions of the ids, which no longer match what was defined.
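To illustrate the second half of that, here is a toy model in plain Rust. It is not clap's implementation and not the actual KDI code: `ParsedArgs`, `parse`, `get_one`, and the lowercase ids are hypothetical stand-ins. The assumption is only that parsed values are stored under the id an argument was defined with, so a lookup under a differently cased id finds nothing.

```rust
use std::collections::HashMap;

/// Toy stand-in for a parsed-argument store (hypothetical; not clap's real type).
struct ParsedArgs {
    values: HashMap<String, String>,
}

impl ParsedArgs {
    /// Parse `-<short> <value>` pairs and store each value under the id
    /// registered for that short flag. Unknown flags are ignored here.
    fn parse(defs: &[(char, &str)], argv: &[&str]) -> ParsedArgs {
        let mut values = HashMap::new();
        // Walk argv two items at a time: a flag followed by its value.
        for pair in argv.chunks(2) {
            if let [flag, value] = pair {
                // Find the definition whose short flag matches, e.g. "-a".
                let def = defs.iter().find(|(short, _)| *flag == format!("-{}", short));
                if let Some((_, id)) = def {
                    values.insert(id.to_string(), value.to_string());
                }
            }
        }
        ParsedArgs { values }
    }

    /// Look a value up by id. The lookup is case-sensitive.
    fn get_one(&self, id: &str) -> Option<&str> {
        self.values.get(id).map(String::as_str)
    }
}

fn main() {
    // Assumed definitions: short flag -> lowercase id.
    let defs = [('a', "app_id"), ('l', "allowed_latency")];
    let args = ParsedArgs::parse(&defs, &["-l", "60", "-a", "web_requests"]);

    // Lookup with the id as defined succeeds.
    println!("{:?}", args.get_one("app_id"));
    // Lookup with an uppercase id does not match any stored key.
    println!("{:?}", args.get_one("APP_ID"));
}
```

In this sketch the mismatched lookup just returns `None`; in the report above, the same kind of mismatch surfaces as the "Mismatch between definition and access of `APP_ID`" panic.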

clairewood commented 10 months ago

Thank you so much for your help and fixing this so quickly! :)

mightyshazam commented 10 months ago

Closing this for #146