dataform-co / dataform

Dataform is a framework for managing SQL-based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

"Error: Input for dataset has not been provided", when adding a new table #1743

Closed: jcockbain closed this issue 4 months ago

jcockbain commented 4 months ago

I have a dependency problem with a change I'm making to our Dataform pipeline. I'm adding a reference to a new table, gcp_costs_excluded. This is a static BigQuery table, which we refer to in the pipeline with a declaration config; we use the same pattern for our other source tables.
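For context, a SQLX declaration for a static source table like this might look roughly as follows. This is a sketch: the file path (for example `definitions/sources/gcp_costs_excluded.sqlx`) and the `database` and `schema` values are hypothetical placeholders, not taken from the issue.

```sql
config {
  type: "declaration",
  database: "my-gcp-project", // hypothetical GCP project ID
  schema: "billing_static",   // hypothetical BigQuery dataset
  name: "gcp_costs_excluded"  // the name used in ref("gcp_costs_excluded")
}
```

With a declaration in place, `${ref("gcp_costs_excluded")}` compiles to the fully qualified table name, and Dataform treats the table as an external source rather than something it builds.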

If I refer to this with the following simple intermediate table, it works fine:

SELECT
  day,
  sku,
FROM ${ref("gcp_costs_excluded")}

However, I want to add the following join to an existing intermediate table, in order to filter out any rows that match rows in the exclusion table:

LEFT JOIN ${ref("gcp_costs_excluded")} exclude_list
ON simplified.sku = exclude_list.sku
AND TIMESTAMP_TRUNC(simplified.timestamp_start, DAY) = exclude_list.day
WHERE exclude_list.sku IS NULL
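Because the join goes through `ref()`, Dataform adds `gcp_costs_excluded` to the intermediate table's dependencies, next to the tables it already references. A minimal sketch of a complete table using this join, assuming a hypothetical `simplified` CTE over `gcp_combined_billing_export` and hypothetical column names (the issue only shows the join itself):

```sql
config {
  type: "table"
}

WITH simplified AS (
  -- Hypothetical projection; the real table's columns are not shown in the issue
  SELECT
    sku,
    timestamp_start,
    cost
  FROM ${ref("gcp_combined_billing_export")}
)

SELECT
  simplified.*
FROM simplified
LEFT JOIN ${ref("gcp_costs_excluded")} exclude_list
  ON simplified.sku = exclude_list.sku
  -- Assumes exclude_list.day is a TIMESTAMP at midnight of the excluded day
  AND TIMESTAMP_TRUNC(simplified.timestamp_start, DAY) = exclude_list.day
-- Anti-join: keep only rows with no match in the exclusion table
WHERE exclude_list.sku IS NULL
```

The `LEFT JOIN ... WHERE exclude_list.sku IS NULL` pattern is an anti-join; a `WHERE NOT EXISTS (...)` subquery against the same table would express the same filter.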

When I run the pipeline now I get the following error:

Error: Input for dataset "{"name":"gcp_costs_excluded"}" has not been provided. Provided inputs: {"name":"gcp_combined_billing_export"},{"name":"gcp_resource_categories"}

I have tried other variations of the SQL snippet without luck. I have also tried adding the new table as a dependency in the downstream table's config. I am using Dataform version 2.9.0.

Any help would be appreciated!

jcockbain commented 4 months ago

Found the issue: we had an assertion on the dataset, and it needed an "input" for the new source table.
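In Dataform, `input` blocks belong to unit tests (`config { type: "test" }`), which replace each `ref()` in the tested dataset with mocked data. When the dataset gains a new `ref()`, the test fails with the error above until a matching `input` is added. A sketch of such a test with the new input, assuming the tested table is named `gcp_costs_simplified` (hypothetical) and using hypothetical columns; the expected-output query has to match the real table's schema and logic:

```sql
config {
  type: "test",
  dataset: "gcp_costs_simplified" // hypothetical name of the table under test
}

input "gcp_combined_billing_export" {
  -- Existing input; columns are hypothetical
  SELECT "sku-a" AS sku, TIMESTAMP "2024-01-01 10:00:00" AS timestamp_start, 1.5 AS cost
  UNION ALL
  SELECT "sku-b" AS sku, TIMESTAMP "2024-01-01 11:00:00" AS timestamp_start, 2.0 AS cost
}

input "gcp_resource_categories" {
  -- Existing input; columns are hypothetical
  SELECT "sku-a" AS sku, "compute" AS category
  UNION ALL
  SELECT "sku-b" AS sku, "storage" AS category
}

input "gcp_costs_excluded" {
  -- New input: without it the test fails with
  -- "Input for dataset ... has not been provided"
  SELECT "sku-b" AS sku, TIMESTAMP "2024-01-01" AS day
}

-- Expected output: sku-b on 2024-01-01 is removed by the exclusion list
SELECT "sku-a" AS sku, TIMESTAMP "2024-01-01 10:00:00" AS timestamp_start, 1.5 AS cost, "compute" AS category
```

Each `input` name matches the name passed to `ref()` in the tested dataset, which is why the error message lists the provided inputs by name.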