PrefectHQ / prefect-aws

Prefect integrations with AWS.
https://PrefectHQ.github.io/prefect-aws/
Apache License 2.0
84 stars 40 forks source link

Handle boto3 clients more efficiently with lru_cache #361

Closed mattiamatrix closed 8 months ago

mattiamatrix commented 9 months ago

Closes #332

Example

Screenshots

Checklist

mattiamatrix commented 9 months ago

Thanks for the contribution @mattiamatrix! This looks like a great optimization, but I'd like to suggest some tweaks:

  1. I think we should move the changes into the get_client method so all the client types can benefit from this optimization.
  2. We should maintain a cache of clients with their parameters as the cache key. In the current implementation, if any attributes are changed on the class, the client returned would not be updated to reflect those changes. By introducing a LRU cache, we can improve performance while maintaining configurability on the fly.

Let me know if you'd like me to go into more detail for either of those points!

Thank you for the review @desertaxle. I made the changes as you described. Let me know what you think now. I had to add a little bit of code to follow pydantic/pydantic#3376.

mattiamatrix commented 8 months ago

Hello @desertaxle @zzstoatzz, is the PR ok now?

zzstoatzz commented 8 months ago

hi @mattiamatrix - Alex is currently out-of-office for a while, so I can take over reviewing this.

Let me take a look here / get caught up and do some QA and I will get back to you on this - appreciate the contribution! 🙂

zzstoatzz commented 8 months ago

hey @mattiamatrix - this looks pretty good to me. Can you add a test for the new functionality, maybe something like this?

from prefect_aws.credentials import _get_client_cached, AwsCredentials, ClientType
from unittest.mock import patch

@patch('prefect_aws.credentials._get_client_cached')
def test_get_client_cached(mock_get_client_cached):
    """
    Test to ensure that _get_client_cached function returns the same instance
    for multiple calls with the same parameters and properly utilizes lru_cache.
    """

    # Create a mock AwsCredentials instance
    aws_credentials_block = AwsCredentials()

    # Call _get_client_cached multiple times with the same parameters
    _get_client_cached(aws_credentials_block, ClientType.S3)
    _get_client_cached(aws_credentials_block, ClientType.S3)

    # Verify that _get_client_cached is called only once due to caching
    mock_get_client_cached.assert_called_once_with(aws_credentials_block, ClientType.S3)

    # Optionally, you can test with different parameters to ensure they are cached separately
    _get_client_cached(aws_credentials_block, ClientType.ECS)
    assert mock_get_client_cached.call_count == 2, "Should be called twice with different parameters"
mattiamatrix commented 8 months ago

Hi @zzstoatzz, thank you for the snippet. I added the code that you shared but the test is failing with error AssertionError: Expected '_get_client_cached' to be called once. Called 0 times. since call_count is always 0.

@zzstoatzz it would be very helpful if you could directly update the code and push the correct test. Thank you.

zzstoatzz commented 8 months ago

sorry @mattiamatrix - i should have clarified that that was pseudocode based on my memory of the mocks / fixtures available

i do not have time to do this now but if you're unable then I can come back to this later

mattiamatrix commented 8 months ago

sorry @mattiamatrix - i should have clarified that that was pseudocode based on my memory of the mocks / fixtures available

i do not have time to do this now but if you're unable then I can come back to this later

Ok, I pushed a version that is using cache_info() instead to get a similar result.

zzstoatzz commented 8 months ago

hi @mattiamatrix - after discussing this internally, I think we should change how we're getting that __hash__

as is:

instead of just the id, the __hash__ should be derived from the hashes of all the values of all the fields on AwsCredentials (or alternatively, we could just use the underlying return value of get_params_override as the cache key instead of the AwsCredentials object)

mattiamatrix commented 8 months ago

Close in favour of https://github.com/PrefectHQ/prefect-aws/pull/369