namelessjon closed this 4 months ago
sorry, will undo the code formatting and repush
There, that's a better diff
I don't understand why the test failed. It doesn't seem like this change should cause a build to suddenly succeed (which is what I think caused the test failure?)
@namelessjon the functional test failure is a random one caused by some Iceberg parallel testing - I've re-triggered it, and it should pass now.
But have a look at the pre-commit checks; I think there's a linting issue there.
Edit: run `pre-commit run --files dbt/adapters/athena/impl.py` to make pre-commit happy.
That should now be fixed. Sorry, I hadn't set up pre-commit.
@namelessjon don't worry, thanks for the quick fix.
I think what you propose makes sense. Given that the cached function always receives the same workgroup input, there should be only 1 value in the cache. That wasn't the case before, because every call created a different Athena client for each model (and each client had a different signature), which led to as many values in the cache as there were model runs/tests.
Yeah, exactly. Because this is caching an instance method, we actually end up with what seems to be 1 entry per thread (as the adapter itself is also part of the cache key), but I can live with that. Or perhaps the caching is per thread anyway with `lru_cache`? Whatever the reason for the 4/8 calls rather than the 1 it seems like it should be, it's a lot better than one per model.
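For anyone following along, the shape of the change is roughly this (an illustrative sketch only, not the actual adapter code: the method name, the field being read, and the `maxsize` are placeholders):

```python
from functools import lru_cache

import boto3


class AthenaAdapter:
    """Sketch of the caching pattern under discussion; names are illustrative."""

    @lru_cache(maxsize=None)
    def _work_group_output_location(self, work_group: str) -> str:
        # The boto3 client is created *inside* the cached method, so it is not
        # part of the cache key; the key is effectively (self, work_group).
        client = boto3.client("athena")
        response = client.get_work_group(WorkGroup=work_group)
        return response["WorkGroup"]["Configuration"]["ResultConfiguration"]["OutputLocation"]
```

Because `self` is part of the key, each adapter instance (roughly one per dbt thread) gets its own entry, which matches seeing a handful of entries rather than one per model.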
EDIT: and from my local testing, the memory usage caps out at the same ~400MB I observed when configuring the decorator as `@lru_cache(maxsize=4)`.
Ah right, the number of items in the cache is the number of threads, so you have a way of indirectly controlling it.
Let's merge this one, and consider making the cache size controllable via a parameter later if it's really necessary. Well done.
Thanks for merging
Reading some more, the multiple calls might be as simple as "the 4 threads make 4 simultaneous calls, and so the cache is not yet set".
Yes, adding some debug logging about cache hits confirms that's the case. I see 4 cache misses, but the cache's current size remains at 1 after all the calls finish.
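A quick standalone repro of that behaviour (just a toy, not the adapter code): `lru_cache` doesn't hold its lock while the wrapped function is running, so simultaneous first callers all register as misses, yet only one entry ends up in the cache:

```python
import threading
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_lookup(work_group: str) -> str:
    # Stand-in for the Athena work-group lookup; the sleep simulates the API
    # round trip during which the other threads can also miss the cache.
    time.sleep(0.5)
    return f"s3://example-results/{work_group}/"


threads = [threading.Thread(target=slow_lookup, args=("primary",)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Typically prints misses=4 but currsize=1: each thread started its own call
# before the first result had been cached.
print(slow_lookup.cache_info())
```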
Description
Fixes #676
This moves client creation into the cached method, which makes the `lru_cache` depend only on the work_group name (which it has to depend on, as that drives the output) and the AthenaAdapter instance (via `self`, which seems to be instantiated once per dbt thread), but not on the Athena client (which was instantiated once per lookup in the original implementation).
Checklist