Dagster-daemon stops spawning "dagster api grpc" processes

osgrwade commented 1 month ago

Dagster version

1.7.14

What's the issue?

When I run locally, I run two processes:

dagster-webserver < with various commands >

dagster-daemon run 1> daemon.out 2> daemon2.out

Once my local UI is running, I start monitoring processes on my machine (where I have 10 code locations loaded into Dagster). I see that:

dagster-webserver spawns 10 processes that start like this: "..python -m dagster api grpc --lazy-load-user-code --socket < a temp file path > --heartbeat --heartbeat-timeout 45 …."

dagster-daemon spawns 10 processes that start like this: "..python -m dagster api grpc --lazy-load-user-code --socket < a temp file path > --heartbeat --heartbeat-timeout 20 …."
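For anyone trying to observe the same thing, the process monitoring described above can be approximated with a small script. This is only a sketch: it assumes a Unix-like machine where `ps -A -ww -o pid,ppid,args` is available, and it assumes the heartbeat timeout (45 for the webserver's servers, 20 for the daemon's, as seen above) is a good enough way to tell the two groups apart. The names `count_grpc_servers`, `snapshot`, and `watch` are made up for illustration.

```python
import re
import subprocess
import time
from collections import Counter

# Substring that identifies the code-location servers in the process list.
GRPC_MARKER = "dagster api grpc"
# Captures the numeric value passed to --heartbeat-timeout.
TIMEOUT_RE = re.compile(r"--heartbeat-timeout[ =](\d+)")


def count_grpc_servers(ps_output: str) -> Counter:
    """Count `dagster api grpc` processes, keyed by --heartbeat-timeout value."""
    counts = Counter()
    for line in ps_output.splitlines():
        if GRPC_MARKER not in line:
            continue
        match = TIMEOUT_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def snapshot() -> Counter:
    """Take one reading from the live process table (assumes a Unix-like `ps`)."""
    result = subprocess.run(
        ["ps", "-A", "-ww", "-o", "pid,ppid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_grpc_servers(result.stdout)


def watch(samples: int = 60, interval_seconds: float = 30.0) -> None:
    """Print a timestamped count every `interval_seconds`, `samples` times."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), dict(snapshot()))
        time.sleep(interval_seconds)
```

Calling `watch()` in a separate terminal prints one line per sample. If the daemon stops spawning processes, the count under key "20" should stop being replenished while the count under "45" stays steady.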

At some point (I have seen this happen 18 minutes in, and also 3 minutes in), the dagster-daemon process stops spawning processes.

The contents of the daemon.out and daemon2.out files are:

daemon.out: < timestamp > dagster.daemon …. .Instance is configured with … ..'SensorDaemon']

daemon2.out: < nothing appears in the file >

What did you expect to happen?

dagster-daemon to continue spawning processes normally

How to reproduce?

When I run with one code location I do not see any problems. When I run anywhere from 5-10 code locations I see the behavior described above.
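To make the multi-location setup easier for others to try, here is a rough sketch that generates N trivial code locations plus a `workspace.yaml` pointing at them. It assumes the locations can be plain Python files loaded via `python_file` entries, which may differ from how the real code locations are loaded, and the helper name `write_workspace` and the generated asset names are made up for illustration. It has not been confirmed that trivial locations like these trigger the hang.

```python
from pathlib import Path

# Source for one minimal code location; {i} is filled in per location.
LOCATION_TEMPLATE = '''from dagster import Definitions, asset


@asset
def example_asset_{i}():
    return {i}


defs = Definitions(assets=[example_asset_{i}])
'''


def write_workspace(root: Path, n_locations: int) -> Path:
    """Write n_locations tiny code locations and a workspace.yaml under root."""
    root.mkdir(parents=True, exist_ok=True)
    entries = []
    for i in range(n_locations):
        name = f"location_{i}"
        (root / f"{name}.py").write_text(LOCATION_TEMPLATE.format(i=i))
        # One workspace entry per generated file, each with its own location name.
        entries.append(
            "  - python_file:\n"
            f"      relative_path: {name}.py\n"
            f"      location_name: {name}\n"
        )
    workspace = root / "workspace.yaml"
    workspace.write_text("load_from:\n" + "".join(entries))
    return workspace
```

After something like `write_workspace(Path("repro"), 10)`, running `dagster-webserver -w workspace.yaml` and `dagster-daemon run -w workspace.yaml` from inside that directory (with `DAGSTER_HOME` set) should load all ten locations.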

Deployment type

Local

Deployment details

We are seeing this in all our environments, but I am able to reproduce this when I run locally as well.

Additional information

No response

osgrwade commented 1 month ago

Just as a side note -- we are experiencing this same behavior in ALL of our environments, even production.

osgrwade commented 1 month ago

(I put this on Slack, too.) Upgraded to Python 3.11 and swapped out Polars 0.19.5 for polars-lts-cpu. Still getting a hung dagster-daemon process. I can reproduce this over and over.
