I cannot determine if these make it into the artefacts or not.

I believe they do (at least some of them): DBT_CLOUD_PROJECT_ID, DBT_CLOUD_RUN_ID, DBT_CLOUD_JOB_ID, DBT_CLOUD_RUN_REASON and DBT_CLOUD_RUN_REASON_CATEGORY are present in both manifest.json and run_results.json.

Edit: DBT_CLOUD_PR_ID and DBT_CLOUD_GIT_SHA are only there if it's a PR-triggered run.
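For anyone wanting to verify this on their own artefacts, a quick check is possible once the files are loaded into the package's src_dbt_artifacts sources table. A sketch, assuming the dbt Cloud vars land under metadata.env (where dbt records environment variables in artefact metadata):

select
    artifact_type,
    data:metadata:env:DBT_CLOUD_RUN_ID::string as dbt_cloud_run_id,
    data:metadata:env:DBT_CLOUD_JOB_ID::string as dbt_cloud_job_id
from {{ src_dbt_artifacts }}
where artifact_type in ('manifest.json', 'run_results.json');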
I tested the solution you suggested and it seems to be working! One small thing to note: we need to be careful with quotes. All those env vars should be wrapped in ' ', so this is the code that worked for me:
begin;
copy into {{ src_dbt_artifacts }} from
    (
        select
            $1 as data,
            $1:metadata:generated_at::timestamp_ntz as generated_at,
            metadata$filename as path,
            regexp_substr(metadata$filename, '([a-z_]+.json)') as artifact_type,
            -- dbt cloud env_vars
            '{{ env_var('DBT_CLOUD_PROJECT_ID', 'not on cloud') }}' as dbt_cloud_project_id,
            '{{ env_var('DBT_CLOUD_JOB_ID', 'not on cloud') }}' as dbt_cloud_job_id,
            '{{ env_var('DBT_CLOUD_RUN_ID', 'not on cloud') }}' as dbt_cloud_run_id,
            '{{ env_var('DBT_CLOUD_RUN_REASON_CATEGORY', 'not on cloud') }}' as dbt_cloud_run_reason_category,
            '{{ env_var('DBT_CLOUD_RUN_REASON', 'not on cloud') }}' as dbt_cloud_run_reason
        from @{{ src_dbt_artifacts }}
    )
    file_format=(type='JSON')
    on_error='skip_file';
commit;
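To see why the quoting matters: off dbt Cloud, env_var falls back to a value containing spaces, so an unquoted template renders to invalid SQL. A minimal comparison, using one column from the query above:

-- unquoted: renders to `not on cloud as dbt_cloud_project_id` (a syntax error)
{{ env_var('DBT_CLOUD_PROJECT_ID', 'not on cloud') }} as dbt_cloud_project_id
-- quoted: renders to `'not on cloud' as dbt_cloud_project_id` (a string literal)
'{{ env_var('DBT_CLOUD_PROJECT_ID', 'not on cloud') }}' as dbt_cloud_project_id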
Also, the create_artifact_resources macro has to be updated accordingly:
create table if not exists {{ src_dbt_artifacts }} (
    data variant,
    generated_at timestamp,
    path string,
    artifact_type string,
    dbt_cloud_project_id string,
    dbt_cloud_job_id string,
    dbt_cloud_run_id string,
    dbt_cloud_run_reason_category string,
    dbt_cloud_run_reason string
);
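Since create table if not exists won't modify a table that already exists, anyone who has already deployed the package would also need to add the new columns by hand. A sketch for an existing table:

alter table {{ src_dbt_artifacts }} add column dbt_cloud_project_id string;
alter table {{ src_dbt_artifacts }} add column dbt_cloud_job_id string;
alter table {{ src_dbt_artifacts }} add column dbt_cloud_run_id string;
alter table {{ src_dbt_artifacts }} add column dbt_cloud_run_reason_category string;
alter table {{ src_dbt_artifacts }} add column dbt_cloud_run_reason string;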
The next step would probably be to update the corresponding models so that they make use of dbt_cloud_run_id instead of invocation_id (where it makes sense). For now, I'll just move those models from the package to our project and modify them there, but it'd be nice to have this fix inside the package, too :)
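As a sketch of what that could look like, correlating run_results.json and manifest.json rows from the same dbt Cloud run via the new column (the aliases and column usage are illustrative, not the package's existing models):

select
    rr.data:metadata:invocation_id::string as run_results_invocation_id,
    m.data:metadata:invocation_id::string as manifest_invocation_id,
    rr.dbt_cloud_run_id
from {{ src_dbt_artifacts }} rr
join {{ src_dbt_artifacts }} m
    on m.dbt_cloud_run_id = rr.dbt_cloud_run_id
where rr.artifact_type = 'run_results.json'
    and m.artifact_type = 'manifest.json'
    and rr.dbt_cloud_run_id != 'not on cloud';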
A few questions have come through on the dbt Slack workspace about deployment on dbt Cloud, relating to artefact uploads. It would be useful to have a definitive method for deployment on dbt Cloud, in addition to the manual upload and CLI run-operation options.

As I understand it, the key issue revolves around the invocation_id generated by dbt and stored in run_results.json and manifest.json. This is the same for a given command; however, run-operation (used to call the macro that uploads artefacts) creates its own manifest.json, replacing the one from any previous operation 🤦‍♂️ This results in dissimilar invocation_ids in run_results.json and manifest.json, breaking functionality that relies on joining the two artefacts.
For those calling run-operation via the CLI, the --no-write-json global flag is reported to solve this particular issue. It would also be possible to configure an env var prefixed with DBT_ENV_CUSTOM_ENV_ that is injected into each artefact, as per the docs. These vars would appear in stg_dbt__run_results_env_keys and could be used in other artefacts to correlate artefacts from the same 'run' (this is not implemented in dbt_artifacts, but could be if it were of use).
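A rough sketch of both CLI workarounds; the macro and variable names below are illustrative, not the package's actual ones:

-- 1) stop run-operation from rewriting manifest.json (the global flag goes
--    before the subcommand):
--      dbt --no-write-json run-operation upload_dbt_artifacts
-- 2) or correlate artefacts with a custom env var, e.g.
--      DBT_ENV_CUSTOM_ENV_RUN_GROUP_ID=$(uuidgen) dbt run
--    dbt injects DBT_ENV_CUSTOM_ENV_* vars into each artefact's metadata.env,
--    and the same value could also be selected at upload time:
'{{ env_var('DBT_ENV_CUSTOM_ENV_RUN_GROUP_ID', 'unknown') }}' as run_group_id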
However, dbt Cloud supports neither --no-write-json nor custom env_vars 🤔 dbt Cloud does, however, provide env_vars with job and run IDs that could potentially be used for this purpose. I cannot determine if these make it into the artefacts or not. If they don't, an alternative approach might be to inject them directly via the upload_artifacts copy query, along the lines of the query tested in the comment above.

Help, testing and comments on this very welcome 🙏