Snowflake-Labs / django-snowflake

MIT License
59 stars 15 forks source link

Slow performance #80

Closed daniel-albuixech closed 5 months ago

daniel-albuixech commented 7 months ago

Hello all,

Recently I discovered this Snowflake backend for Django so let me thank you the effort because it's something I've been waiting for months.

The reason of this issue is because I've been playing with it and I experience slow performance. I would like to know if others are facing similar issues and if there are any recommended solutions or workarounds.

I did two tests with the following environments:

python                     3.11.3
Django                     4.2.7
django-snowflake           4.2b1

and

python                     3.10.4
Django                     4.2.1
django-snowflake           4.2b1

In both cases I followed the same steps, I created a fresh new conda enviroment and I configured the Django admin-site to use an small table (10 records) where to perform CRUD operations. Every operation takes around 10 seconds. It doesn't matter if I list all the records of the table or if I try to update a single record... it always takes 9s, 10s, 11s approximately.

Another comparison that I did was using snowflake-sqlalchemy package, against the same table, and there I didn't have any performance issue.

I would appreciate any guidance or recommendation.

Thanks in advance

sidmitra commented 7 months ago

@daniel-albuixech

Have you looked at connection keep-alive config? After some profiling i figured out some of the latency issues on our end were trying to get a fresh connection each time.

Here's the config we use. See client_session_keep_alive, CONN_MAX_AGE.

SNOWFLAKE_DB_CONFIG = {
    "ENGINE": "django_snowflake",
    "NAME": "foobar",
    "SCHEMA": "foobar",
    "USER": os.environ["SNOWFLAKE_USERNAME"],
    "PASSWORD": os.environ["SNOWFLAKE_PASSWORD"],
    "ACCOUNT": os.environ["SNOWFLAKE_ACCOUNT"],
    "OPTIONS": {
        "client_session_keep_alive": True,
    },
    "WAREHOUSE": os.environ["SNOWFLAKE_WAREHOUSE"],
    "CONN_MAX_AGE": os.environ["SNOWFLAKE_CONN_MAX_AGE],
}

It looked like the library doesn't support pooling yet.

You should run a python profiler to confirm where it's spending its time on(end to end eg. an API endpoint). This is assuming the query itself on snowflake directly is fast. In our case, we had a couple of seconds of added latency due to a new connection. We're still looking deeper into implementing pooling and submitting a PR upstream

daniel-albuixech commented 7 months ago

Hi @sidmitra, thank you very much for your answer.

Implementing the client_session_keep_alive and CONN_MAX_AGE parameters, the waiting time was reduced to around 3 or 4 seconds after the second click in the Admin-site. I mean, the first click is still around 10s and an update, for example, is around 10 seconds as well.

I will run the python profiler and I'll be back with more precise info ;)

Thank you very much because overall it was an improvement.

timgraham commented 5 months ago

Hi Daniel, please let me know if you have any further profiling data or suggestions. Thanks.