Open alexknorr opened 3 months ago
Could this be related to browser caching? I had a case where a newly deployed code location was marked as failed in the read-only UI, and reloading the page did not change anything. After I cleared the edge cache and reloaded, the failure status was gone and the UI showed the newest image version.
Dagster version
1.7.9
What's the issue?
A code location container (pod) is updated through a rolling update (the new pod is spun up, then the old one is removed). A dagster-webserver started with the --read-only flag in a separate pod receives a LocationStateChangeEventType.LOCATION_UPDATED event and tries to reload. If the code location is briefly unavailable over the old gRPC connection (which is probably what happens here), the reload fails and the webserver does not recover, because it does not retry. In that case dagster-webserver has to be restarted manually.
What did you expect to happen?
The webserver should recover from temporarily unavailable code locations in read-only mode.
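To make the expected behavior concrete, here is a minimal sketch of a retry with exponential backoff around a reload. This is not Dagster's actual reload code: `retry_with_backoff` and the `reload_location` callable in the usage example are hypothetical names, and catching every `Exception` is a simplification (a real fix would probably narrow this to the gRPC connection error).

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Hypothetical helper for illustration; not part of Dagster.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Assumption: any exception counts as "temporarily unavailable".
            if attempt == max_attempts:
                raise  # Give up and surface the last error.
            sleep(delay)
            # Grow the wait between attempts, capped at max_delay.
            delay = min(delay * backoff_factor, max_delay)

    raise AssertionError("unreachable")


# Usage: a stand-in for a reload that fails while the old pod is going away.
calls: List[int] = []


def reload_location() -> str:
    """Hypothetical reload that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("code location not reachable yet")
    return "loaded"


# sleep is replaced with a no-op so the example runs instantly.
result = retry_with_backoff(reload_location, sleep=lambda seconds: None)
```

With something like this, a short gap during the rolling update would delay the reload for a few seconds instead of leaving the code location in a failed state until the webserver is restarted.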
How to reproduce?
No response
Deployment type
Other
Deployment details
Custom k8s deployment on OpenShift with dagster-webserver, the daemon, and code locations in separate pods.
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.