Open · fabiomx opened this issue 9 months ago
🤖
Hey @fabiomx, long time no see! Hope you've been coding away without too many bugs. 😉
To ensure that the `callback` parameter is still called and the `run_results.json` file is uploaded to GCS even when the `test` command fails and an Airflow exception is raised in the `DbtLocalBaseOperator` in the Cosmos repository, you can modify the exception-handling logic in both the `handle_exception_subprocess` and `handle_exception_dbt_runner` methods. Since code placed after a `raise` never runs, wrap the checks in a `try`/`finally` block instead: the `finally` clause calls the `callback` function (if it's not `None`) and then uploads `run_results.json` to GCS, whether or not an exception was raised.

Here's how you can modify the `handle_exception_subprocess` and `handle_exception_dbt_runner` methods:
```python
from airflow.exceptions import AirflowException, AirflowSkipException

# Methods of DbtLocalBaseOperator. FullOutputSubprocessResult, dbtRunnerResult,
# and extract_dbt_runner_issues are Cosmos/dbt names already available in
# cosmos/operators/local.py, where these methods live.

def handle_exception_subprocess(self, result: FullOutputSubprocessResult) -> None:
    try:
        if self.skip_exit_code is not None and result.exit_code == self.skip_exit_code:
            raise AirflowSkipException(f"dbt command returned exit code {self.skip_exit_code}. Skipping.")
        elif result.exit_code != 0:
            raise AirflowException(
                f"dbt command failed. The command returned a non-zero exit code {result.exit_code}. Details: ",
                *result.full_output,
            )
    finally:
        # Runs whether or not an exception was raised above.
        if self.callback:
            self.callback(self.project_dir)
        self.upload_run_results_to_gcs(self.project_dir)

def handle_exception_dbt_runner(self, result: dbtRunnerResult) -> None:
    try:
        if not result.success:
            if result.exception:
                raise AirflowException(f"dbt invocation did not complete with unhandled error: {result.exception}")
            else:
                node_names, node_results = extract_dbt_runner_issues(result, ["error", "fail", "runtime error"])
                error_message = "\n".join([f"{name}: {result}" for name, result in zip(node_names, node_results)])
                raise AirflowException(f"dbt invocation completed with errors: {error_message}")
    finally:
        # Runs whether or not an exception was raised above.
        if self.callback:
            self.callback(self.project_dir)
        self.upload_run_results_to_gcs(self.project_dir)

def upload_run_results_to_gcs(self, project_dir: str) -> None:
    # Placeholder: replace with actual logic to upload run_results.json to GCS.
    pass
```
In this modification, the `finally` block ensures that the `callback` function is called and the `run_results.json` file is uploaded to GCS regardless of whether an exception was raised due to a failed dbt command. You need to implement the `upload_run_results_to_gcs` method with the actual logic to upload the file to GCS, which is not provided here.
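As a starting point, here is a minimal sketch of what that method could look like using the Google provider's `GCSHook`. This is illustrative rather than code from the Cosmos codebase; the bucket name, object path, and connection ID are placeholders you'd need to adapt:

```python
import os

from airflow.providers.google.cloud.hooks.gcs import GCSHook

def upload_run_results_to_gcs(self, project_dir: str) -> None:
    """Illustrative sketch: upload target/run_results.json to a GCS bucket."""
    run_results_path = os.path.join(project_dir, "target", "run_results.json")
    if not os.path.exists(run_results_path):
        return  # dbt may fail before producing run_results.json
    hook = GCSHook(gcp_conn_id="google_cloud_default")  # placeholder connection ID
    hook.upload(
        bucket_name="my-dbt-artifacts",  # placeholder bucket
        object_name="dbt/run_results.json",  # placeholder object path
        filename=run_results_path,
    )
```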
This approach allows you to handle the dbt command failure gracefully by ensuring that necessary cleanup or follow-up actions (like calling a callback and uploading files to GCS) are still performed even in the event of a failure.
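As a quick illustration of why the `try`/`finally` pattern gives this guarantee: a `finally` clause runs even when an exception propagates, and the exception is still re-raised afterwards:

```python
def fail_with_cleanup() -> None:
    try:
        raise RuntimeError("dbt command failed")
    finally:
        print("cleanup still runs")  # executes before the exception propagates

fail_with_cleanup()
# Prints "cleanup still runs", then the RuntimeError traceback follows.
```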
The `callback` parameter in the `DbtLocalBaseOperator` is currently implemented as a parameter that accepts a callable. This callable is expected to take a single argument, a string representing the path to the dbt project directory, and return `None`. The `callback` function is intended to be called after a dbt run completes.

Here is the relevant part of the constructor signature from the `DbtLocalBaseOperator` class that shows how the `callback` parameter is defined:
```python
def __init__(
    self,
    profile_config: ProfileConfig,
    invocation_mode: InvocationMode | None = None,
    install_deps: bool = False,
    callback: Callable[[str], None] | None = None,
    should_store_compiled_sql: bool = True,
    **kwargs: Any,
) -> None:
    ...
    self.callback = callback
    ...
```
This implementation detail indicates that the `callback` parameter is designed to be flexible, allowing users of the `DbtLocalBaseOperator` to specify custom logic that runs after dbt commands complete, by passing a function that does something with the path to the dbt project directory.
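For example, wiring a callback into a task might look like the following. This is a hypothetical sketch: the `upload_artifacts` helper, task arguments, and paths are invented for illustration:

```python
from cosmos.operators.local import DbtTestLocalOperator

def upload_artifacts(project_dir: str) -> None:
    # Hypothetical callback: receives the path to the dbt project directory
    # after the dbt command finishes.
    print(f"dbt finished; artifacts are under {project_dir}/target/")

test_task = DbtTestLocalOperator(
    task_id="dbt_test",
    profile_config=profile_config,  # assumed to be defined elsewhere
    project_dir="/usr/local/airflow/dbt/my_project",  # placeholder path
    callback=upload_artifacts,
)
```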
As mentioned in this Slack thread with @jlaneve, I've built a custom local operator, based on the `DbtLocalBaseOperator`, to upload the `run_results.json` file to GCS after running the `test` command, using the `callback` parameter. Everything works fine if the tests pass, but if any test fails, an Airflow exception is raised and the `callback` is not called (https://github.com/astronomer/astronomer-cosmos/blob/main/cosmos/operators/local.py#L321-L344).

At least for the `test` command, I would still need to upload the `run_results.json` produced in the `tmp_project_dir`, even in the case of failure. Indeed, when tests fail is precisely when the information from `run_results.json` is most critical. Moreover, after the failure I can't access the `tmp_project_dir` anymore, so I haven't been able to use the `on_failure_callback` either.