aaaaahaaaaa opened 11 months ago
I don't see any notable commits in 1.5.13 on initial inspection.
> Reverting to 1.5.12 resolves the issue.
How exactly did you do this? Can you report the Python environments in the two containers (`pip list` / `pip freeze`)? Trying to discern if it's possible that the leak is from a dependency that also changed between the two container images.
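(For anyone doing the same comparison: a minimal sketch, assuming the two `pip freeze` outputs below are saved to files with the hypothetical names `freeze-1.5.12.txt` and `freeze-1.5.13.txt`.)

```python
# Sketch: print only the dependency lines that differ between the two freezes.
import difflib

with open("freeze-1.5.12.txt") as old, open("freeze-1.5.13.txt") as new:
    diff = difflib.unified_diff(
        old.read().splitlines(),
        new.read().splitlines(),
        fromfile="1.5.12",
        tofile="1.5.13",
        lineterm="",
    )

print("\n".join(line for line in diff if line.startswith(("+", "-"))))
```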
> How exactly did you do this?
We changed the helm chart version. We literally just reverted the Renovate bot commit.
1.5.12
pip list
Package Version
--------------------------- ------------
alembic 1.13.0
amqp 5.2.0
aniso8601 9.0.1
annotated-types 0.6.0
anyio 4.1.0
async-timeout 4.0.3
azure-core 1.29.5
azure-identity 1.15.0
azure-storage-blob 12.19.0
azure-storage-file-datalake 12.14.0
backoff 2.2.1
billiard 4.2.0
boto3 1.33.12
botocore 1.33.12
cachetools 5.3.2
celery 5.3.6
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
coloredlogs 14.0
croniter 2.0.1
cryptography 41.0.7
dagster 1.5.12
dagster-aws 0.21.12
dagster-azure 0.21.12
dagster-celery 0.21.12
dagster-celery-k8s 0.21.12
dagster-gcp 0.21.12
dagster-graphql 1.5.12
dagster-k8s 0.21.12
dagster-pandas 0.21.12
dagster-pipes 1.5.12
dagster-postgres 0.21.12
dagster-webserver 1.5.12
db-dtypes 1.1.1
docstring-parser 0.15
exceptiongroup 1.2.0
flower 2.0.1
fsspec 2023.12.2
google-api-core 2.15.0
google-api-python-client 2.110.0
google-auth 2.25.2
google-auth-httplib2 0.1.1
google-cloud-bigquery 3.13.0
google-cloud-core 2.4.1
google-cloud-storage 2.13.0
google-crc32c 1.5.0
google-resumable-media 2.6.0
googleapis-common-protos 1.62.0
gql 3.4.1
graphene 3.3
graphql-core 3.2.3
graphql-relay 3.2.0
greenlet 3.0.2
grpcio 1.60.0
grpcio-health-checking 1.60.0
grpcio-status 1.60.0
h11 0.14.0
httplib2 0.22.0
httptools 0.6.1
humanfriendly 10.0
humanize 4.9.0
idna 3.6
isodate 0.6.1
Jinja2 3.1.2
jmespath 1.0.1
kombu 5.3.4
kubernetes 28.1.0
Mako 1.3.0
MarkupSafe 2.1.3
msal 1.26.0
msal-extensions 1.1.0
multidict 6.0.4
numpy 1.26.2
oauth2client 4.1.3
oauthlib 3.2.2
packaging 23.2
pandas 2.1.4
pendulum 2.1.2
pip 23.0.1
portalocker 2.8.2
prometheus-client 0.19.0
prompt-toolkit 3.0.41
proto-plus 1.23.0
protobuf 4.25.1
psycopg2-binary 2.9.9
pyarrow 14.0.1
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 2.5.2
pydantic_core 2.14.5
PyJWT 2.8.0
pyparsing 3.1.1
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2023.3.post1
pytzdata 2020.1
PyYAML 6.0.1
redis 5.0.1
requests 2.31.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
rsa 4.9
s3transfer 0.8.2
setuptools 65.5.1
six 1.16.0
sniffio 1.3.0
SQLAlchemy 2.0.23
starlette 0.33.0
tabulate 0.9.0
tomli 2.0.1
toposort 1.10
tornado 6.4
tqdm 4.66.1
typing_extensions 4.9.0
tzdata 2023.3
universal-pathlib 0.1.4
uritemplate 4.1.1
urllib3 1.26.18
uvicorn 0.24.0.post1
uvloop 0.19.0
vine 5.1.0
watchdog 3.0.0
watchfiles 0.21.0
wcwidth 0.2.12
websocket-client 1.7.0
websockets 12.0
wheel 0.42.0
yarl 1.9.4
pip freeze
alembic==1.13.0
amqp==5.2.0
aniso8601==9.0.1
annotated-types==0.6.0
anyio==4.1.0
async-timeout==4.0.3
azure-core==1.29.5
azure-identity==1.15.0
azure-storage-blob==12.19.0
azure-storage-file-datalake==12.14.0
backoff==2.2.1
billiard==4.2.0
boto3==1.33.12
botocore==1.33.12
cachetools==5.3.2
celery==5.3.6
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
coloredlogs==14.0
croniter==2.0.1
cryptography==41.0.7
dagster==1.5.12
dagster-aws==0.21.12
dagster-azure==0.21.12
dagster-celery==0.21.12
dagster-celery-k8s==0.21.12
dagster-gcp==0.21.12
dagster-graphql==1.5.12
dagster-k8s==0.21.12
dagster-pandas==0.21.12
dagster-pipes==1.5.12
dagster-postgres==0.21.12
dagster-webserver==1.5.12
db-dtypes==1.1.1
docstring-parser==0.15
exceptiongroup==1.2.0
flower==2.0.1
fsspec==2023.12.2
google-api-core==2.15.0
google-api-python-client==2.110.0
google-auth==2.25.2
google-auth-httplib2==0.1.1
google-cloud-bigquery==3.13.0
google-cloud-core==2.4.1
google-cloud-storage==2.13.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.62.0
gql==3.4.1
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
greenlet==3.0.2
grpcio==1.60.0
grpcio-health-checking==1.60.0
grpcio-status==1.60.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.1
humanfriendly==10.0
humanize==4.9.0
idna==3.6
isodate==0.6.1
Jinja2==3.1.2
jmespath==1.0.1
kombu==5.3.4
kubernetes==28.1.0
Mako==1.3.0
MarkupSafe==2.1.3
msal==1.26.0
msal-extensions==1.1.0
multidict==6.0.4
numpy==1.26.2
oauth2client==4.1.3
oauthlib==3.2.2
packaging==23.2
pandas==2.1.4
pendulum==2.1.2
portalocker==2.8.2
prometheus-client==0.19.0
prompt-toolkit==3.0.41
proto-plus==1.23.0
protobuf==4.25.1
psycopg2-binary==2.9.9
pyarrow==14.0.1
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.5.2
pydantic_core==2.14.5
PyJWT==2.8.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
pytzdata==2020.1
PyYAML==6.0.1
redis==5.0.1
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
rsa==4.9
s3transfer==0.8.2
six==1.16.0
sniffio==1.3.0
SQLAlchemy==2.0.23
starlette==0.33.0
tabulate==0.9.0
tomli==2.0.1
toposort==1.10
tornado==6.4
tqdm==4.66.1
typing_extensions==4.9.0
tzdata==2023.3
universal-pathlib==0.1.4
uritemplate==4.1.1
urllib3==1.26.18
uvicorn==0.24.0.post1
uvloop==0.19.0
vine==5.1.0
watchdog==3.0.0
watchfiles==0.21.0
wcwidth==0.2.12
websocket-client==1.7.0
websockets==12.0
yarl==1.9.4
1.5.13
pip list
Package Version
--------------------------- ------------
alembic 1.13.0
amqp 5.2.0
aniso8601 9.0.1
annotated-types 0.6.0
anyio 4.1.0
async-timeout 4.0.3
azure-core 1.29.5
azure-identity 1.15.0
azure-storage-blob 12.19.0
azure-storage-file-datalake 12.14.0
backoff 2.2.1
billiard 4.2.0
boto3 1.34.0
botocore 1.34.0
cachetools 5.3.2
celery 5.3.6
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
coloredlogs 14.0
croniter 2.0.1
cryptography 41.0.7
dagster 1.5.13
dagster-aws 0.21.13
dagster-azure 0.21.13
dagster-celery 0.21.13
dagster-celery-k8s 0.21.13
dagster-gcp 0.21.13
dagster-graphql 1.5.13
dagster-k8s 0.21.13
dagster-pandas 0.21.13
dagster-pipes 1.5.13
dagster-postgres 0.21.13
dagster-webserver 1.5.13
db-dtypes 1.2.0
docstring-parser 0.15
exceptiongroup 1.2.0
flower 2.0.1
fsspec 2023.12.2
google-api-core 2.15.0
google-api-python-client 2.111.0
google-auth 2.25.2
google-auth-httplib2 0.2.0
google-cloud-bigquery 3.14.1
google-cloud-core 2.4.1
google-cloud-storage 2.14.0
google-crc32c 1.5.0
google-resumable-media 2.7.0
googleapis-common-protos 1.62.0
gql 3.4.1
graphene 3.3
graphql-core 3.2.3
graphql-relay 3.2.0
greenlet 3.0.2
grpcio 1.60.0
grpcio-health-checking 1.60.0
h11 0.14.0
httplib2 0.22.0
httptools 0.6.1
humanfriendly 10.0
humanize 4.9.0
idna 3.6
isodate 0.6.1
Jinja2 3.1.2
jmespath 1.0.1
kombu 5.3.4
kubernetes 28.1.0
Mako 1.3.0
MarkupSafe 2.1.3
msal 1.26.0
msal-extensions 1.1.0
multidict 6.0.4
numpy 1.26.2
oauth2client 4.1.3
oauthlib 3.2.2
packaging 23.2
pandas 2.1.4
pendulum 2.1.2
pip 23.0.1
portalocker 2.8.2
prometheus-client 0.19.0
prompt-toolkit 3.0.43
protobuf 4.25.1
psycopg2-binary 2.9.9
pyarrow 14.0.1
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 2.5.2
pydantic_core 2.14.5
PyJWT 2.8.0
pyparsing 3.1.1
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2023.3.post1
pytzdata 2020.1
PyYAML 6.0.1
redis 5.0.1
requests 2.31.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
rsa 4.9
s3transfer 0.9.0
setuptools 65.5.1
six 1.16.0
sniffio 1.3.0
SQLAlchemy 2.0.23
starlette 0.33.0
tabulate 0.9.0
tomli 2.0.1
toposort 1.10
tornado 6.4
tqdm 4.66.1
typing_extensions 4.9.0
tzdata 2023.3
universal-pathlib 0.1.4
uritemplate 4.1.1
urllib3 1.26.18
uvicorn 0.24.0.post1
uvloop 0.19.0
vine 5.1.0
watchdog 3.0.0
watchfiles 0.21.0
wcwidth 0.2.12
websocket-client 1.7.0
websockets 12.0
wheel 0.42.0
yarl 1.9.4
pip freeze
alembic==1.13.0
amqp==5.2.0
aniso8601==9.0.1
annotated-types==0.6.0
anyio==4.1.0
async-timeout==4.0.3
azure-core==1.29.5
azure-identity==1.15.0
azure-storage-blob==12.19.0
azure-storage-file-datalake==12.14.0
backoff==2.2.1
billiard==4.2.0
boto3==1.34.0
botocore==1.34.0
cachetools==5.3.2
celery==5.3.6
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
coloredlogs==14.0
croniter==2.0.1
cryptography==41.0.7
dagster==1.5.13
dagster-aws==0.21.13
dagster-azure==0.21.13
dagster-celery==0.21.13
dagster-celery-k8s==0.21.13
dagster-gcp==0.21.13
dagster-graphql==1.5.13
dagster-k8s==0.21.13
dagster-pandas==0.21.13
dagster-pipes==1.5.13
dagster-postgres==0.21.13
dagster-webserver==1.5.13
db-dtypes==1.2.0
docstring-parser==0.15
exceptiongroup==1.2.0
flower==2.0.1
fsspec==2023.12.2
google-api-core==2.15.0
google-api-python-client==2.111.0
google-auth==2.25.2
google-auth-httplib2==0.2.0
google-cloud-bigquery==3.14.1
google-cloud-core==2.4.1
google-cloud-storage==2.14.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
gql==3.4.1
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
greenlet==3.0.2
grpcio==1.60.0
grpcio-health-checking==1.60.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.1
humanfriendly==10.0
humanize==4.9.0
idna==3.6
isodate==0.6.1
Jinja2==3.1.2
jmespath==1.0.1
kombu==5.3.4
kubernetes==28.1.0
Mako==1.3.0
MarkupSafe==2.1.3
msal==1.26.0
msal-extensions==1.1.0
multidict==6.0.4
numpy==1.26.2
oauth2client==4.1.3
oauthlib==3.2.2
packaging==23.2
pandas==2.1.4
pendulum==2.1.2
portalocker==2.8.2
prometheus-client==0.19.0
prompt-toolkit==3.0.43
protobuf==4.25.1
psycopg2-binary==2.9.9
pyarrow==14.0.1
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.5.2
pydantic_core==2.14.5
PyJWT==2.8.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
pytzdata==2020.1
PyYAML==6.0.1
redis==5.0.1
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
rsa==4.9
s3transfer==0.9.0
six==1.16.0
sniffio==1.3.0
SQLAlchemy==2.0.23
starlette==0.33.0
tabulate==0.9.0
tomli==2.0.1
toposort==1.10
tornado==6.4
tqdm==4.66.1
typing_extensions==4.9.0
tzdata==2023.3
universal-pathlib==0.1.4
uritemplate==4.1.1
urllib3==1.26.18
uvicorn==0.24.0.post1
uvloop==0.19.0
vine==5.1.0
watchdog==3.0.0
watchfiles==0.21.0
wcwidth==0.2.12
websocket-client==1.7.0
websockets==12.0
yarl==1.9.4
Thanks for following up, not much interesting in the dependency changes. I spent some time with `memray` looking for leaks and have so far not been able to turn anything up.
Do you have anything like automated recurring queries against the webserver?
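In case it helps anyone reproduce the profiling, here is a minimal sketch of using `memray` programmatically; the workload function is a hypothetical stand-in, and in practice the memray CLI (e.g. `memray flamegraph`) is what you'd use to inspect the capture from the live webserver process.

```python
# Sketch only: capture an allocation profile around a suspected code path.
import memray

def simulate_workload() -> list[bytes]:
    # Hypothetical stand-in for whatever the webserver does repeatedly.
    return [bytes(1024) for _ in range(10_000)]

with memray.Tracker("webserver-allocs.bin"):
    data = simulate_workload()

# Inspect afterwards with e.g. `memray flamegraph webserver-allocs.bin`.
```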
> Do you have anything like automated recurring queries against the webserver?
Well, only the `readinessProbe` from your chart.
Turns out we actually still observe the same behaviour after rolling back to 1.5.12. So it's not related to the new version. I'm puzzled now. I'll try to investigate further and close the issue.
I've had luck using this tool to get a memory profile of a running process: https://github.com/facebookarchive/memory-analyzer, and this one for interactive poking around at the active process: https://github.com/kmaork/madbg. I believe these both need `SYS_PTRACE` capabilities granted in the k8s pod spec.
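A minimal sketch of what granting that capability might look like on the container spec (standard Kubernetes `securityContext` fields; the container name is an assumption, and how you pass this through the Helm chart depends on the chart's override options):

```yaml
spec:
  containers:
    - name: dagster-webserver   # assumed container name
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]   # lets ptrace-based tools attach to the process
```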
Given it's a webserver, it's also susceptible to the "type 3" leaks described here: https://blog.nelhage.com/post/three-kinds-of-leaks/ (Python allocator arena fragmentation), but the very smooth gradient of your graphs makes me skeptical that's the cause without some sort of recurring large query causing the fragmentation.
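One rough way to tell that kind of allocator/arena retention apart from a genuine Python-object leak (a sketch, not an authoritative diagnostic): compare process RSS against what `tracemalloc` reports as live Python allocations. If RSS keeps climbing while traced allocations stay flat, fragmentation is the more likely story.

```python
# Sketch: compare traced Python allocations against the process's peak RSS.
import resource
import tracemalloc

tracemalloc.start()

# ... let the webserver-like workload run for a while ...

traced_current, traced_peak = tracemalloc.get_traced_memory()
max_rss_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
print(f"traced by Python: {traced_current / 2**20:.1f} MiB, "
      f"peak RSS: {max_rss_kib / 1024:.1f} MiB")
```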
@aaaaahaaaaa did you find any reason why memory started growing? We have a similar issue, and switching between versions hasn't helped yet; we tried going from 1.5.14 back to 1.5.12.
The memory increase is quite noticeable, showing up even at daily granularity.
The issue seems to be isolated to the webserver component; both the daemon and the code servers exhibit stable memory usage. We run these as three separate containers within AWS ECS.
We have only one scheduled job active, with no sensors and no auto-materialization so far. Assets are loaded from dbt.
@jvyoralek No I didn't find the source of the problem and the issue is still occurring for us as well. Unfortunately I didn't have time to investigate further. I think there's clearly something up with the workload, we're not doing anything special either aside from deploying the helm chart.
@alangenfeld found a memory leak that could be the cause of this, I'll let him comment but here is the PR that attempts to fix it https://github.com/dagster-io/dagster/pull/19298
https://github.com/dagster-io/dagster/pull/19298 is a fix for a problem that manifests as very rapid, unbounded memory growth resulting in process termination. I don't believe it's related to this slower memory growth.
I appear to have a similar problem after upgrading to 1.6. I run Dagster on AWS ECS using Fargate, so I don't believe my jobs are causing it, since that code runs in a separate task. Memory for both the daemon and the Dagit/webserver services is slowly creeping up. The drops in the following chart are due to restarts. Before the upgrade to 1.6 on the 11th, this problem didn't exist.
@noam-jacobson what version were you upgrading from?
> @noam-jacobson what version were you upgrading from?
I was on version 1.5.10
@noam-jacobson We're having the same issue on ECS/Fargate on 1.5.7
We are also having the same issue on 1.6.0, also ECS/Fargate
Same here in our k8s deployment cluster. Any clue?
We think we might have solved it on our end: we didn't have a strict retention policy on logs set in our dagster.yaml, and once we set the policy below, our memory stopped growing:
retention:
  schedule:
    purge_after_days: 90 # sets retention policy for schedule ticks of all types
  sensor:
    purge_after_days:
      skipped: 7
      failure: 90
      success: 365
> We think we might have solved it on our end: we didn't have a strict retention policy on logs set in our dagster.yaml, and once we set the policy below, our memory stopped growing:
> retention: schedule: purge_after_days: 90 sensor: purge_after_days: skipped: 7 failure: 90 success: 365
How did that impact your memory usage? Technically you'll still retain ticks for up to 365 days, so you shouldn't see a change in behavior within just a few days. Or did I miss something?
I've applied a similar setting on my deployment as well (much stricter than yours, for testing) and my memory is still going up, same as before.
Same problem here on OpenShift with nearly the same packages (dagster 1.6.5), also PostgreSQL and slim-buster images on both the daemon and dagster-webserver (separate pods). Tried with Python 3.10, 3.11 and SQLAlchemy <2.0 as well as >2.0; no luck so far, it crashes every 3-4 days. Currently trying with Python 3.12, dagster 1.6.6 and slim-bookworm; will know more in the next few days...
EDIT: We found out that the following is actually not working. The initial indication might have just been a fluke.
~We were having this issue and I believe that we have found the root cause to be a bug in anyio which leaked processes. The bug was introduced in 4.1.0 and fixed in 4.3.0 (last week): https://github.com/agronholm/anyio/issues/669~
~Dagster has a dependency on anyio through the following chain: dagit --> dagster-webserver --> starlette --> anyio, and I believe that this issue started to appear for people whenever they rebuilt their Dagster image during the time that bug was present, because a newer but buggy version of anyio would have been included in their docker image.~
~So, the solution could be to either explicitly require anyio >= 4.3.0 or to wait until people rebuild their docker images and automatically get the bug-fixed version.~
Has anyone had success with the solution recommended by @stasharrofi? We have made changes, but it appears that the memory usage is still increasing.
I see anyio 4.3.0 in the build log:
#12 1.757 Collecting dagster==1.6.6
#12 1.810 Downloading dagster-1.6.6-py3-none-any.whl (1.4 MB)
#12 2.037 Collecting dagster-aws==0.22.6
#12 2.042 Downloading dagster_aws-0.22.6-py3-none-any.whl (109 kB)
#12 2.214 Collecting dagster-postgres==0.22.6
#12 2.219 Downloading dagster_postgres-0.22.6-py3-none-any.whl (20 kB)
#12 2.259 Collecting anyio==4.3.0
#12 2.263 Downloading anyio-4.3.0-py3-none-any.whl (85 kB)
@jvyoralek It hasn't worked for me. Deployed the newest Dagster version 1.6.6 with anyio-4.3.0.
@jvyoralek: No, we found out that it's not working for us either. The initial indication that it was working was probably just a fluke.
Same issue here with an ECS deployment, packages and versions included below
dagster==1.6.10
dagster-graphql==1.6.10
dagster-webserver==1.6.10
dagster-postgres==0.22.10
dagster-docker==0.22.10
My team experienced this issue in an OSS ECS deployment after an upgrade from 1.5.9 to 1.6.8. It impacted the dagit/webserver and daemon services, but not the independent grpc/code location services. It presented as a slow leak that increased memory utilization over a week or so until hitting critical thresholds and crashing the service, with 1 GB of memory allocated to the services.
We "resolved" the issue in our environments by downgrading and pinning the grpcio python package to 1.57.0.
In incremental tests we downgraded our docker image base to the image version/sha we used for our 1.5.9 deployment, reverted dagster packages from 1.6.8 back to 1.5.9, and updated python from 3.10 -> 3.11. None of these changes resolved the memory leak.
Sharing this context as it supports the root cause being related to an unpinned package dependency, and not necessarily an issue with the core dagster packages. It also rules out an interaction with OS libs or the OS version causing the leak.
We selected grpcio 1.57.0 because it was the version of the dep that was solved for at the time when we originally deployed 1.5.9. It's possible a more recent version would work as well.
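If it's useful to anyone applying the same workaround, here is a quick runtime sanity check (a sketch only) that the pin actually made it into the running container:

```python
# Run inside the webserver/daemon container to confirm the effective grpcio version.
import grpc

print(grpc.__version__)  # expect "1.57.0" after applying the pin
```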
Thank you, @jobicarter, for the effective workaround. We deployed it yesterday, and although it's only been a short time, we're already seeing promising changes.
Tested with these versions:
dagster==1.7.0
dagster-webserver==1.7.0
dagster-graphql==1.7.0
dagster-aws==0.23.0
dagster-postgres==0.23.0
grpcio==1.57.0
I can confirm that downgrading grpcio to 1.57.0 stops the leak.
dagster==1.5.14
dagster-aws==0.21.14
dagster-azure==0.21.14
dagster-celery==0.21.14
dagster-celery-k8s==0.21.14
dagster-gcp==0.21.14
dagster-graphql==1.5.14
dagster-k8s==0.21.14
dagster-pandas==0.21.14
dagster-pipes==1.5.14
dagster-postgres==0.21.14
dagster-webserver==1.5.14
grpcio==1.57.0
grpcio-health-checking==1.57.0
We also did try to upgrade it to 1.62.1, but that didn't seem to work.
Thanks for the solution. I think this Dagster issue could be related to this grpc issue: https://github.com/grpc/grpc/issues/36117
Hi all, having a similar issue with a Dagster Docker deployment on an Oracle VM. Unfortunately, downgrading `grpcio` to 1.57.0 hasn't resolved the issue. Currently using the following setup for the Dagster image.
The VM seems to reach an OOM state roughly every 8 hours now.
We are running into the same issue on our Kubernetes cluster, having installed Dagster via the Helm chart.
Is the solution to downgrade `grpcio` for the dagster-webserver pod? In that case, should we build a custom Dockerfile that changes the dependencies and point to that image in the Helm chart?
I don't understand why Dagster hasn't pinned the grpcio version themselves to prevent this issue from happening; it seems a little strange that they expect users to either live with the memory leak or fix the dependencies manually.
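For what it's worth, a hedged sketch of that approach: extend the image the chart currently uses, pin grpcio per the workaround above, push the result to your registry, and point the chart's webserver image values at it. The base image name/tag below is an assumption; substitute whatever your values.yaml references.

```dockerfile
# Sketch only: extend the existing webserver image and pin grpcio per the workaround above.
# Assumption: dagster/dagster-webserver:1.6.6 stands in for whatever image/tag your values.yaml uses.
FROM dagster/dagster-webserver:1.6.6
RUN pip install --no-cache-dir "grpcio==1.57.0" "grpcio-health-checking==1.57.0"
```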
Just to add my 2 cents: we're running dagster 1.7.16, dbt, and dagster-webserver all in one k8s pod.
I admit the result is somewhat inconclusive, since some memory increase (but also a kind of garbage collection releasing much of the extra memory at some point) was visible before the last restart while using grpcio 1.57.0. Still, overall it looks much better than with grpcio 1.60.
It seems to be a workaround for now, but with at least two drawbacks (other than using an outdated component at all):
We started noticing memory leaks in certain code locations after upgrading to Dagster 1.8. Could grpcio potentially be contributing to these leaks?
We're still investigating, but I'd like to rule out this possibility.
We're observing the same behavior deploying Dagster via Helm on a K8s cluster. Building a custom image and downgrading `grpcio` seems like a step back, to be honest.
Hi there! I'm also experiencing what looks like memory leaks, specifically on the webserver and daemon (running dagster 1.7.7 on k8s with Helm). Has a solution been found apart from downgrading and pinning the `grpcio` version?
Dagster version
1.5.13
What's the issue?
dagster-webserver 1.5.13 seems to have some kind of memory leak. Since we updated to that version, we can observe a steady increase in memory usage over the last couple of weeks. Reverting to 1.5.12 resolves the issue.
What did you expect to happen?
No response
How to reproduce?
No response
Deployment type
Dagster Helm chart
Deployment details
No response
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.