Hi @chinwobble.
There are 3 behaviors when --trace is enabled:
--existing-runs [pass (default), wait|cancel] Strategy to handle existing active job runs.
The pass option (the default) creates a new run and traces its status. The wait option waits for the current run to finish and launches only afterward. The cancel option cancels the current run and starts a new one.
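To make the difference between the three strategies concrete, here is a minimal Python sketch. It is not dbx's actual implementation: the `ExistingRuns` enum, the `plan_launch` function, and the action strings are all hypothetical names introduced for illustration. It only models the order of actions each strategy implies.

```python
from enum import Enum
from typing import List


class ExistingRuns(Enum):
    """Hypothetical mirror of the three --existing-runs values."""
    PASS = "pass"
    WAIT = "wait"
    CANCEL = "cancel"


def plan_launch(strategy: ExistingRuns, active_run_ids: List[int]) -> List[str]:
    """Return the ordered actions a launcher would take (illustration only)."""
    actions: List[str] = []
    if strategy is ExistingRuns.WAIT:
        # Wait for every active run to finish before launching.
        actions += [f"wait:{run_id}" for run_id in active_run_ids]
    elif strategy is ExistingRuns.CANCEL:
        # Cancel every active run before launching.
        actions += [f"cancel:{run_id}" for run_id in active_run_ids]
    # PASS leaves existing runs alone; every strategy ends by starting a new run.
    actions.append("run_now")
    return actions
```

With pass, the new run is submitted while the old one may still be active, which is why the concurrency limit of the job matters in that mode.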
In your case, it seems that you're using the default pass option, and your job config either sets max_concurrent_runs to 1 or doesn't specify it at all, in which case the default setting (also 1) applies.
To fix this issue, simply increase the allowed number of concurrent runs in the job config:
"max_concurrent_runs": 10
However, I agree that a more flexible wait/retry schedule should be provided in the launch command.
Thanks for the detailed reply. As you described we are using the following flags:
dbx launch --job=my_job --trace --existing-runs pass
Currently the sequence of events is this:
dbx launch --job=my_job --trace --existing-runs pass is run to create a new job run.
dbx launch with --trace will continuously report that the job has been skipped.
During step 4, there is nothing to trace, since no amount of waiting will change the status of that job run.
It's at the end of the state machine, and the dbx utility should exit immediately when it sees the skipped job run status.
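A rough sketch of the behaviour being asked for, in Python. This is not the dbx source: `trace_run`, `RunSkipped`, and the `get_state` callback are hypothetical names, and the callback is assumed to return a dict carrying the run's `life_cycle_state` as reported by the Databricks Jobs API. The point is that SKIPPED is treated as a terminal state, so the polling loop ends instead of repeating the same message.

```python
import time
from typing import Callable, Dict

# Life cycle states after which a run will never change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


class RunSkipped(RuntimeError):
    """Raised when the traced run was skipped (hypothetical error type)."""


def trace_run(
    get_state: Callable[[], Dict[str, str]],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal state and return that state."""
    while True:
        life_cycle = get_state()["life_cycle_state"]
        if life_cycle == "SKIPPED":
            # Waiting cannot change a skipped run, so fail right away.
            raise RunSkipped("run was skipped, likely due to the concurrency limit")
        if life_cycle in TERMINAL_STATES:
            return life_cycle
        sleep(poll_seconds)
```

In a CI pipeline, letting `RunSkipped` propagate would make the step fail within one polling cycle rather than at the pipeline timeout.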
Thanks for the detailed explanation, now I understand the problem.
Seems like we need to add some additional checks to the --trace behavior.
Hi @chinwobble , please use the dbx package from the latest release.
We are trying the Azure DevOps flavour of the CI/CD pipeline.
When we get to the stage that runs the integration tests, it hangs for over an hour.
Actual behaviour: We get the below error message repeated for over an hour until the Azure pipeline times out (default is 1 hour)
Expected behaviour: If there is a concurrent run of the integration test job, then dbx launch --trace should either fail immediately or have some configurable retry window (e.g. retry every minute for 5 minutes) and otherwise exit with an error.
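The retry window described above could look roughly like the following Python sketch. Nothing here is an existing dbx option: `launch_with_retry`, `LaunchSkipped`, and the `retry_interval_s` / `retry_window_s` parameters are hypothetical, and `launch` stands in for whatever submits the run and returns its life cycle state. Setting `retry_window_s=0` gives the "fail immediately" variant.

```python
import time
from typing import Callable


class LaunchSkipped(RuntimeError):
    """Raised when every attempt inside the retry window was skipped."""


def launch_with_retry(
    launch: Callable[[], str],
    retry_interval_s: float = 60.0,   # e.g. retry every minute
    retry_window_s: float = 300.0,    # e.g. give up after 5 minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Re-submit the run while it comes back SKIPPED, up to a deadline."""
    deadline = clock() + retry_window_s
    while True:
        state = launch()
        if state != "SKIPPED":
            return state
        # Stop if the next attempt would start after the deadline.
        if clock() + retry_interval_s > deadline:
            raise LaunchSkipped(
                f"run still skipped after {retry_window_s:.0f}s retry window"
            )
        sleep(retry_interval_s)
```

The `sleep` and `clock` parameters are injected only so the loop can be exercised without real waiting. In a pipeline, an uncaught `LaunchSkipped` would end the step with a non-zero exit code.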