MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Add `--runs-per-job`, `--max-run-fails-per-job`, and more to `metadata` cmd #2923

Closed wslulciuc closed 1 month ago

wslulciuc commented 1 month ago

This PR adds the following command-line args to `cli.MetadataCommand`, which is used to seed the Marquez backend for future functional testing:


CLI Args

--jobs

limits OL jobs up to N (default: 5) -- replaces `--runs`

--runs-per-job

limits OL run executions per job up to N (default: 10)

--runs-active

limits OL run executions marked as active (='RUNNING') up to N

--max-run-fails-per-job

maximum OL run fails per job (default: 2)

--min-run-duration

minimum OL run duration (in seconds) per execution (default: 300)

--max-run-duration

maximum OL run duration (in seconds) per execution (default: 900)

--run-start-time

specifies the OL run start time in UTC ISO ('YYYY-MM-DDTHH:MM:SSZ');
used for the initial OL run, with subsequent runs starting relative to the
initial start time. (default: 2024-10-15T01:00:11.080828Z)

--run-end-time

specifies the OL run end time in UTC ISO ('YYYY-MM-DDTHH:MM:SSZ');
used for the initial OL run, with subsequent runs ending relative to the
initial end time. (default: 2024-10-15T01:07:25.080828Z)
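
To illustrate how `--run-start-time`, `--min-run-duration`, and `--max-run-duration` might interact, here is a minimal sketch. It is not the code in `MetadataCommand`: the class `RunWindowSketch`, the record `RunWindow`, and the method `plan` are hypothetical names, and it assumes that each run gets a random duration within the min/max bounds and that the next run starts when the previous one ends.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not taken from marquez.cli.MetadataCommand.
final class RunWindowSketch {
  // Start and end time of a single run.
  record RunWindow(Instant start, Instant end) {}

  // Plans back-to-back runs for one job, each lasting between
  // minSec and maxSec seconds (both inclusive).
  static List<RunWindow> plan(
      Instant firstStart, int runsPerJob, int minSec, int maxSec, Random rnd) {
    List<RunWindow> windows = new ArrayList<>();
    Instant start = firstStart;
    for (int i = 0; i < runsPerJob; i++) {
      int durationSec = minSec + rnd.nextInt(maxSec - minSec + 1);
      Instant end = start.plusSeconds(durationSec);
      windows.add(new RunWindow(start, end));
      start = end; // assumption: the next run starts when this one ends
    }
    return windows;
  }

  public static void main(String[] args) {
    // Defaults quoted in the PR description.
    Instant start = Instant.parse("2024-10-15T01:00:11.080828Z");
    Instant end = Instant.parse("2024-10-15T01:07:25.080828Z");
    // The default window is 434 seconds, inside the default [300, 900] bounds.
    System.out.println(Duration.between(start, end).getSeconds());

    for (RunWindow w : plan(start, 5, 300, 900, new Random())) {
      System.out.println(w.start() + " -> " + w.end());
    }
  }
}
```

Note that `Instant.parse` accepts the fractional seconds that appear in the default values above.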

Example

java -jar marquez.jar metadata \
  --jobs 10 \
  --runs-per-job 5 \
  --max-run-fails-per-job 2
Generating runs '5' per job, each COMPLETE/FAIL run event will have a size of '~33404' (bytes)...
Writing '100' events to: 'metadata.json'

output: metadata.json
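
The `'100'` in the output above is consistent with two events per run (a start event plus a COMPLETE/FAIL event). That reading is an assumption drawn from the numbers in the example, not from the source code; the class and method names below are hypothetical.

```java
// Hypothetical sketch of the event-count arithmetic; not taken from Marquez.
final class EventCountSketch {
  // Assumption: every run emits one START event and one terminal
  // (COMPLETE or FAIL) event.
  static final int EVENTS_PER_RUN = 2;

  static int totalEvents(int jobs, int runsPerJob) {
    return jobs * runsPerJob * EVENTS_PER_RUN;
  }

  public static void main(String[] args) {
    // --jobs 10 --runs-per-job 5
    System.out.println(totalEvents(10, 5)); // prints 100
  }
}
```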

Bugs

This PR also fixes the following bugs:

netlify[bot] commented 1 month ago

Deploy Preview for peppy-sprite-186812 canceled.

| Name | Link |
|---|---|
| Latest commit | 732cd3f814f841eda26eefb10ca36690b3fda0ee |
| Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/670eaae645c6dd0008e1ca38 |

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 6.38298% with 132 lines in your changes missing coverage. Please review.

Project coverage is 81.12%. Comparing base (db4fbfa) to head (732cd3f). Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| api/src/main/java/marquez/cli/MetadataCommand.java | 6.38% | 132 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main    #2923      +/-   ##
============================================
- Coverage     82.21%   81.12%   -1.10%
- Complexity     1504     1505       +1
============================================
  Files           268      268
  Lines          7253     7358     +105
  Branches        324      330       +6
============================================
+ Hits           5963     5969       +6
- Misses         1129     1228      +99
  Partials        161      161
```


phixMe commented 1 month ago

Is there any way to get a job into a running status so that we could look at those states? Maybe a flag could be added, such as `current-job-running`?

phixMe commented 1 month ago

It also looks like you need to clean up a CI error on the web project.

wslulciuc commented 1 month ago

> Is there any way to get a job into a running status so that we could look at those states? Maybe a flag could be added, such as `current-job-running`?

~Do you mean to print out a list of jobs / runIDs to the terminal after running the cmd?~ Hmm, you mean to have job runs in a RUNNING state?

phixMe commented 1 month ago

> > Is there any way to get a job into a running status so that we could look at those states? Maybe a flag could be added, such as `current-job-running`?
>
> ~Do you mean to print out a list of jobs / runIDs to the terminal after running the cmd?~ Hmm, you mean to have job runs in a RUNNING state?

Yes, to let the last run remain in a running state...
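
The request above (leave the last run of a job without a terminal event so it stays in a running state) could look roughly like the following. This is only a sketch, not the PR's implementation: `RunStateSketch`, `eventTypesForJob`, and the `keepLastRunning` parameter are hypothetical, and it assumes a run that has received only a `START` event is shown as RUNNING.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from marquez.cli.MetadataCommand.
final class RunStateSketch {
  // Returns, for each run of one job, the OL event types to emit.
  // At most `fails` runs end in FAIL; if keepLastRunning is set, the last
  // run gets only a START event and therefore no terminal state.
  static List<List<String>> eventTypesForJob(
      int runsPerJob, int fails, boolean keepLastRunning) {
    List<List<String>> runs = new ArrayList<>();
    for (int i = 0; i < runsPerJob; i++) {
      boolean isLast = (i == runsPerJob - 1);
      if (isLast && keepLastRunning) {
        runs.add(List.of("START"));
      } else if (i < fails) {
        runs.add(List.of("START", "FAIL"));
      } else {
        runs.add(List.of("START", "COMPLETE"));
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    // Five runs, two failures, last run left running.
    eventTypesForJob(5, 2, true).forEach(System.out::println);
  }
}
```

With these inputs the job produces nine events instead of ten, because the final run has no COMPLETE/FAIL event.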