getmoto / moto

A library that allows you to easily mock out tests based on AWS infrastructure.
http://docs.getmoto.org/en/latest/
Apache License 2.0

Moto uses a single instance of the ...Response class for all requests for some services #6745

Closed dfangl closed 1 year ago

dfangl commented 1 year ago

Motivation

In LocalStack, we began seeing some strange behavior, especially with S3 buckets that were not created in the region or account the request targeted. This leads to test failures with rather confusing logs.

For example, a bucket in account 123456789012 in region us-east-1 is created twice. Since the create bucket operation is idempotent in us-east-1, the second call should succeed without an error. However, it fails if some other S3 operation runs at the same time under a different account.

This test can reproduce this issue (it is a test against LocalStack, but the culprit here is indeed moto):

    def test_parallel_bucket_creation(self, aws_client_factory):
        num_threads = 10
        create_barrier = threading.Barrier(num_threads)
        errored = False

        def _create_bucket(runner: int):
            nonlocal errored
            bucket_name = f"bucket-{short_uid()}"
            # Each runner uses a different zero-padded access key id, so each
            # thread acts under a different account.
            s3_client = aws_client_factory(
                region_name="us-east-1", aws_access_key_id=f"{runner:012d}"
            ).s3
            # Wait until all threads are ready, so the requests overlap.
            create_barrier.wait()
            try:
                # The repeated create_bucket is idempotent in us-east-1 and
                # should succeed; under concurrent cross-account requests it fails.
                s3_client.create_bucket(Bucket=bucket_name)
                s3_client.create_bucket(Bucket=bucket_name)
            except Exception:
                LOG.exception("Create bucket failed")
                errored = True

        thread_list = []
        for i in range(1, num_threads + 1):
            thread = threading.Thread(target=_create_bucket, args=[i])
            thread.start()
            thread_list.append(thread)

        for thread in thread_list:
            thread.join()

        assert not errored

The problem

After some investigation of this issue, which (before the test above) occurred only occasionally, we found the culprit. For some services (the list can be found at the end), moto uses a single ...Response instance for all requests. Concurrent requests therefore affect each other's state, which is set in the setupClass method used by all response classes. Services that use the dispatch method are unaffected, but a rather large share of services, including S3, is affected.

In S3, for example, these are the problematic lines: https://github.com/getmoto/moto/blob/d3f5e3d68b0517be1172319087ee68c0834a157a/moto/s3/urls.py#L13-L23

They register methods bound to a single instance for all requests, which leads to the behavior described above.
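
To make the failure mode concrete, here is a minimal, self-contained sketch (not moto's actual code; the class and method names are made up for illustration) of why registering bound methods of one module-level instance is racy: whatever the setup step stores on `self` can be overwritten by a concurrent request before the handler reads it back.

    import threading
    import time


    class FakeResponse:
        # Stand-in for a moto-style ...Response class, for illustration only.

        def setup(self, account_id):
            # Corresponds to the per-request setup that stores request state
            # (account, region, headers, ...) on the instance.
            self.account_id = account_id

        def handle(self, account_id):
            self.setup(account_id)
            time.sleep(0.001)  # simulate work between setup and use of the state
            return self.account_id


    shared_instance = FakeResponse()  # one instance registered for all requests
    mismatches = []


    def make_request(account_id):
        observed = shared_instance.handle(account_id)
        if observed != account_id:
            mismatches.append((account_id, observed))


    threads = [
        threading.Thread(target=make_request, args=(f"{i:012d}",)) for i in range(10)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # With a shared instance, some requests will typically observe another
    # request's account id; with one instance per request, this list stays empty.
    print(mismatches)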

Impact

The impact is of course only noticeable in parallel scenarios. In our case, we only noticed it in Terraform tests, which perform a lot of S3 operations for validation, while our Lambda provider was creating a bucket in a different account to store an archive. Also, the impact is mostly region/account specific, since that is the state most often read from the ...Response classes. Other state (like headers) is affected too, but the impact there is less noticeable.

Please note that we did not test this behavior against moto itself, but judging from the import path of urls.py, moto would be affected in the same way as LocalStack, unless we missed something.

Proposed fixes

If you have any more ideas for a solution, we would be happy to work on them to get something merged here as well and remove our downstream fix.

Affected Services

These services:

amp
apigateway
apigatewayv2
appconfig
appsync
awslambda
cloudfront
cognitoidp
databrew
ebs
elastictranscoder
glacier
greengrass
guardduty
instance_metadata
iot
managedblockchain
mq
pinpoint
quicksight
route53
s3
s3control
eventbridgescheduler

use a single instance for all requests. Unaffected services use the dispatch method. Some services call the dispatch classmethod on an instance, but since that does not make a difference (even though it can be confusing), we do not necessarily need to change it.
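
As a purely Python-semantics aside (not moto-specific code), calling a classmethod through an instance is equivalent to calling it through the class, which is why those registrations still behave like class-level dispatch despite looking instance-bound. The `Handler` class below is hypothetical and only assumes that a dispatch-style classmethod creates a fresh object per request:

    class Handler:
        @classmethod
        def dispatch(cls, request):
            # A classmethod receives the class, not the instance it was looked
            # up on, so it can create a fresh object for every request.
            return cls()


    handler_instance = Handler()

    # Both calls run the same classmethod; neither touches handler_instance,
    # and each returns a brand-new Handler.
    assert Handler.dispatch("req") is not handler_instance
    assert handler_instance.dispatch("req") is not handler_instance
    assert Handler.dispatch("req") is not Handler.dispatch("req")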

bblommers commented 1 year ago

Hey @dfangl, thanks for raising this! I had realized the potential problem, and have also seen intermittent failures - I just hadn't connected the dots.

In the long term, I think/hope we can refactor/improve our dispatch method to deal with (most of) these cases. But the method_dispatch-fix sounds like a good and quick solution, so if you want to submit a PR here as well, that would be very welcome.
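
For reference, a minimal sketch of what a per-request wrapper in that spirit could look like; the name `method_dispatch`, its signature, and the `S3Response`/`bucket_response` names in the usage comment are assumptions for illustration, not moto's actual API:

    from typing import Any, Callable, Type


    def method_dispatch(response_class: Type, handler_name: str) -> Callable[..., Any]:
        # Hypothetical helper: instead of registering a bound method of a shared
        # module-level instance, register a wrapper that creates a fresh
        # response instance for every request and calls the named handler on it.
        def _wrapper(*args: Any, **kwargs: Any) -> Any:
            instance = response_class()  # fresh per-request state
            return getattr(instance, handler_name)(*args, **kwargs)

        return _wrapper


    # Illustrative usage in a urls.py-style mapping (names are assumptions):
    # url_paths = {
    #     "{0}/$": method_dispatch(S3Response, "bucket_response"),
    # }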