walter-robson opened this issue 3 weeks ago
Hi @walter-robson, thank you for opening an issue.
I spoke with some team members internally, and we're still struggling to understand why the behavior would differ between a manual invocation of the asset materialization and a scheduled one. Can you confirm that by manual materialization you mean triggering it through the UI?
We'd also be curious to know: if you schedule the job, wait for it to complete, and then query the time column in the destination table, does max(time) match what you would expect for an incremental load in the next run?
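For example, something along these lines should surface that value (the pipeline, destination, dataset, and table names below are placeholders, not your actual configuration):

import dlt

# attach to the existing pipeline by name; all names here are placeholders
pipeline = dlt.pipeline(
    pipeline_name="zendesk",
    destination="snowflake",
    dataset_name="zendesk_data",
)

with pipeline.sql_client() as client:
    with client.execute_query("SELECT max(time) FROM ticket_metric_events") as cursor:
        print(cursor.fetchone())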
The way you've implemented this is quite similar to how we are using dlt in our internal pipelines, see:
Grasping at straws, I would also be curious whether renaming the time argument to something that doesn't conflict with the built-in time module changes anything, though I don't suspect that is really the issue.
Another thing we could do to debug this is to set the value with dlt.sources.incremental in the body of the function instead of in the signature, and add some logging so we can see whether a value is being passed to the resource:
import logging

def ticket_metric_table(
    zendesk_client: ZendeskAPIClient,
    time: Optional[dlt.sources.incremental[str]] = None,
) -> Iterator[TDataItem]:
    # log what `time` is at call time, so we can see whether a value
    # is being passed in on scheduled runs
    logging.getLogger(__name__).info("ticket_metric_table called with time=%s", time)
    if not time:
        time = dlt.sources.incremental(
            "time",
            initial_value=start_date_iso_str,
            allow_external_schedulers=False,
            last_value_func=max,
            row_order="asc",  # pages are returned in ascending chronological order
        )
    ...  # rest of the resource unchanged
What's the issue?
I have several dlt pipelines deployed and orchestrated with Dagster. When I manually execute the pipelines by materializing the assets in Dagster, the start_time for the incremental load is updated properly for subsequent runs. However, when these pipelines run on a schedule, the start_time (time.last_value) stays at whatever value the last manual materialization set. Scheduled jobs do not update the start time.
What did you expect to happen?
I expect the start_time for the pipeline to be updated for subsequent runs, regardless of whether the run was triggered manually or by a schedule.
How to reproduce?
Below is an example resource:
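This is a minimal sketch of the resource; the API client call, start date, and table name are placeholders standing in for my actual code:

from typing import Iterator, Optional

import dlt
from dlt.common.typing import TDataItem

START_DATE_ISO = "2024-01-01T00:00:00Z"  # placeholder initial value

@dlt.resource(name="ticket_metric_events", write_disposition="append")
def ticket_metric_table(
    zendesk_client,  # placeholder API client wrapper
    time: Optional[dlt.sources.incremental[str]] = dlt.sources.incremental(
        "time",
        initial_value=START_DATE_ISO,
        allow_external_schedulers=False,
        last_value_func=max,
        row_order="asc",
    ),
) -> Iterator[TDataItem]:
    # fetch pages starting at the current incremental cursor; get_pages is a
    # placeholder for however the client is actually called
    for page in zendesk_client.get_pages("ticket_metric_events", start_time=time.last_value):
        yield page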
Here is an example of how the resource is materialized:
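Roughly, the asset is defined with the dagster-embedded-elt dlt integration; the source import, pipeline name, dataset, and destination below are placeholders for my actual configuration:

from dagster import AssetExecutionContext
from dagster_embedded_elt.dlt import DagsterDltResource, dlt_assets
import dlt

from zendesk import zendesk_support  # placeholder import; the source containing ticket_metric_table

@dlt_assets(
    dlt_source=zendesk_support(),
    dlt_pipeline=dlt.pipeline(
        pipeline_name="zendesk",      # placeholder
        dataset_name="zendesk_data",  # placeholder
        destination="snowflake",      # placeholder
    ),
    name="zendesk",
    group_name="zendesk",
)
def zendesk_dlt_assets(context: AssetExecutionContext, dlt: DagsterDltResource):
    # DagsterDltResource.run executes the dlt pipeline and yields materializations
    yield from dlt.run(context=context)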
Here is an example of the job definition:
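A sketch of the job, assuming the asset group name used above:

from dagster import AssetSelection, define_asset_job

zendesk_job = define_asset_job(
    name="zendesk_job",                          # placeholder name
    selection=AssetSelection.groups("zendesk"),  # the dlt assets defined above
)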
Here is an example of the job schedule:
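And a sketch of the schedule wrapping that job (the cron string is a placeholder):

from dagster import Definitions, ScheduleDefinition
from dagster_embedded_elt.dlt import DagsterDltResource

zendesk_schedule = ScheduleDefinition(
    job=zendesk_job,
    cron_schedule="0 * * * *",  # placeholder: hourly
)

defs = Definitions(
    assets=[zendesk_dlt_assets],
    jobs=[zendesk_job],
    schedules=[zendesk_schedule],
    resources={"dlt": DagsterDltResource()},
)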
Looking at the logs, we can observe that the start_date is not updated on subsequent scheduled Dagster runs.
Dagster version
1.8.13
Deployment type
Dagster Helm chart
Deployment details
Deployed on Azure Kubernetes Services using Helm
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization. By submitting this issue, you agree to follow Dagster's Code of Conduct.