The user has reported that using the --fail-fast flag for scheduled job runs in dbt Cloud produces incredibly noisy logging, making it difficult to surface the error and the actual underlying issue.
- The job runs with 23-thread concurrency, so multiple models execute at the same time.
- --fail-fast tells dbt to terminate the run as soon as it hits a single error (see the reproduction sketch below).
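For context, the equivalent invocation can be reproduced outside of dbt Cloud. A minimal sketch using dbt's programmatic runner (available in dbt-core 1.5+), assuming a working project and Databricks profile are already configured:

```python
# Reproduction sketch, assuming a configured dbt project/profile on the
# Databricks adapter; equivalent to `dbt run --threads 23 --fail-fast`.
from dbt.cli.main import dbtRunner

res = dbtRunner().invoke(["run", "--threads", "23", "--fail-fast"])
print(res.success)  # False once the first model error aborts the run
```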
The logging is interesting: we can see the Databricks adapter going through and cancelling its connections, while queries that have already started keep polling the server; because their commands have been cancelled, this error occurs:
```
: Error during request to server: RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=0.21970534324645996/900.0, error-message=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., http-code=404, method=GetOperationStatus, no-retry-reason=non-retryable error, original-exception=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., query-id=b'\x01\xefn\x95\xdbi\x14\x0e\xa8\xf1\xd4Ca\x07B\x8d', session-id=None
```
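This symptom is consistent with a cancel-versus-poll race across threads: the fail-fast handler cancels every in-flight command, and each surviving thread then gets a 404 from GetOperationStatus for a command that no longer exists. A minimal sketch of that pattern (the cursor and polling code here are hypothetical stand-ins, not the adapter's actual API):

```python
import threading
import time

class FakeCursor:
    """Hypothetical stand-in for an in-flight command; not dbt-databricks' real API."""
    def __init__(self, op_id):
        self.op_id = op_id
        self.cancelled = threading.Event()

    def poll(self):
        # Mirrors GetOperationStatus: once the command is cancelled server-side,
        # polling it fails with RESOURCE_DOES_NOT_EXIST (HTTP 404).
        if self.cancelled.is_set():
            raise RuntimeError(f"RESOURCE_DOES_NOT_EXIST: Command {self.op_id} does not exist.")

def run_model(cursor):
    try:
        while True:
            cursor.poll()
            time.sleep(0.05)
    except RuntimeError as exc:
        # With 23 threads, every in-flight model logs this once, producing
        # hundreds of near-identical messages that bury the original failure.
        print(f"{threading.current_thread().name}: {exc}")

cursors = [FakeCursor(f"01ef6e95-{i:04x}") for i in range(5)]
threads = [threading.Thread(target=run_model, args=(c,), name=f"model-{i}")
           for i, c in enumerate(cursors)]
for t in threads:
    t.start()

time.sleep(0.1)          # first model fails somewhere around here...
for c in cursors:
    c.cancelled.set()    # ...and fail-fast cancels every open command
for t in threads:
    t.join()
```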
In addition, there is Apache Spark-specific logging:
```
$anonfun$analyzeQuery$1(SparkExecuteStatementOperation.scala:541)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:527)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.analyzeQuery(SparkExecuteStatementOperation.scala:541)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$5(SparkExecuteStatementOperation.scala:633)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:532)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:633)
... 43 more
, operation-id=01ef6e95-cea5-18b1-8077-63b37a785969
```
dbt-databricks version: 1.8.5post2+6b29d329ae8a3ce6bc066d032ec3db590160046c
dbt version: versionless - 2024.9.239
Expected behavior
From the user: "I had assumed that was because we were using multiple threads, but I would expect it to fail gracefully rather than produce a log consisting of 500 identical messages, sometimes without even providing the original cause of the first model failure."
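As an illustration only (not something dbt or the adapter ships today), a deduplicating log filter along these lines shows the kind of behavior being asked for: surface the first real failure once instead of hundreds of repeats.

```python
import logging

class DedupFilter(logging.Filter):
    """Suppress records whose exact message has already been emitted; illustration only."""
    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        key = record.getMessage()
        if key in self._seen:
            return False      # drop the repeat
        self._seen.add(key)
        return True

logger = logging.getLogger("noisy-run")
logger.propagate = False      # keep the root logger from double-printing
handler = logging.StreamHandler()
handler.addFilter(DedupFilter())
logger.addHandler(handler)

for _ in range(500):
    logger.warning("RESOURCE_DOES_NOT_EXIST: Command 01ef6e95 does not exist.")
# Only the first occurrence reaches the output; the other 499 are filtered.
```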