GREsau / localstack-persist

LocalStack Community Edition with support for persisted resources.
https://hub.docker.com/r/gresau/localstack-persist
Apache License 2.0

Container crashes due to using up all available memory allocated to Docker #1

Closed phalasz closed 11 months ago

phalasz commented 11 months ago

I have a fairly large S3 state that needs to be handled, and the container keeps using more and more memory without releasing it. I have not noticed this with the stock community version of LocalStack.

E.g., on the stock community image, syncing 3.7 GB worth of assets to S3 works without problems. With your save solution, the container crashes partway through the process. I guess this is because the save handling kicks in after each file is synced to the container? Running the sync multiple times eventually syncs all files and persists them correctly.

It also happens over time while just getting assets out of S3 and using the container without saving anything after the initial sync.

What information would you need to investigate this?

GREsau commented 11 months ago

Thanks for the report - what version of localstack-persist are you using? latest?

phalasz commented 11 months ago

Yes, latest.

GREsau commented 11 months ago

Could you try again after running docker pull gresau/localstack-persist? The current latest now includes https://github.com/GREsau/localstack-persist/commit/bad38aa0ef3d8aa1b774c150aa2269dac7fc26ad which should dramatically improve performance for many/large objects in S3. Previously, all S3 objects were persisted within a single JSON file which doesn't scale well when there are GB of objects - but with the latest change, objects are instead stored as individual files in the mounted volume.
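To illustrate why the layout matters, here is a minimal sketch of the two approaches. It is not the localstack-persist code or its on-disk format; `persist_single_json`, `persist_one_object`, and the hex-encoded file names are hypothetical and chosen only for the example.

```python
import base64
import json
from pathlib import Path


def persist_single_json(objects: dict[str, bytes], path: Path) -> None:
    """Single-file layout: every save re-encodes and rewrites ALL objects."""
    payload = {
        key: base64.b64encode(data).decode("ascii")
        for key, data in objects.items()
    }
    # The whole payload is held in memory as one string before it is written.
    path.write_text(json.dumps(payload))


def persist_one_object(root: Path, key: str, data: bytes) -> Path:
    """Per-object layout: a save touches only the file of the changed object."""
    # Hex-encode the key (assumption for this sketch) so that slashes in
    # S3 keys cannot create paths outside the root directory.
    target = root / key.encode("utf-8").hex()
    target.write_bytes(data)
    return target
```

With the single-file layout, the cost of each save grows with the total size of the bucket; with one file per object, it grows only with the size of the object that changed.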

phalasz commented 11 months ago

Oh, nice. I'll give that a go and report back.

barasimumatik commented 11 months ago

> Previously, all S3 objects were persisted within a single JSON file which doesn't scale well when there are GB of objects - but with the latest change, objects are instead stored as individual files in the mounted volume.

I will try it out as well. I run a lot of tests that create lots of small files, and occasionally my nginx server responded with a 504 Gateway Timeout (causing tests to fail). When I looked into it, it seemed to be related to irregular delays when writing to the JSON file (I guess the file is locked while writing to it?).

It was possible to work around the issue by increasing some timeout settings in nginx, but it didn't feel like a stable solution (well, the way I do testing is not exactly optimal either but it's better than manual testing at the very least).

barasimumatik commented 11 months ago

@GREsau Reading and writing files seem to work, but it breaks when "creating" directories. I'm using the AWS PHP SDK with the StreamWrapper that enables using regular filesystem operations (create a file, create a directory etc.). I don't know the details of the operation, but I found that when using mkdir it breaks (this worked in the previous version).

My guess is that the AWS SDK emulates this operation by using PutObject with an empty payload (i.e. only the path is used). Given the error message below, it would seem the code doesn't handle empty payloads.

2023-11-21T12:28:02.839  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.PutObject => 500 (InternalError)
2023-11-21T12:28:02.952 ERROR --- [   asgi_gw_0] l.aws.handlers.logging     : exception during call chain: 'NoneType' object has no attribute 'read'

GREsau commented 11 months ago

Thanks @barasimumatik, that's extremely useful!

And now I understand why localstack does this check despite the fact that the function parameter type annotations specify that stream is not None - inaccurate type annotations will be the death of me 😩

Fortunately it's an easy fix, I'll get it sorted later today
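For illustration only, the kind of guard this calls for looks roughly like the sketch below. `read_body` is a hypothetical helper, not the actual code in localstack or localstack-persist; it just shows treating a missing stream as an empty payload instead of calling `.read()` on `None`.

```python
import io
from typing import IO, Optional


def read_body(stream: Optional[IO[bytes]]) -> bytes:
    """Return the request body, treating a missing stream as an empty payload."""
    if stream is None:
        # A PutObject with no body (e.g. a "directory" marker) lands here.
        return b""
    return stream.read()


print(read_body(None))                    # empty bytes, no AttributeError
print(read_body(io.BytesIO(b"payload")))  # the stream's contents
```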

barasimumatik commented 11 months ago

@GREsau Wonderful! I'll try it when it's fixed :)

GREsau commented 11 months ago

@barasimumatik latest is now updated with the fix

barasimumatik commented 11 months ago

@GREsau That did the trick! The tests go through just as they did with 3.0.0 (I believe it was).

I can create another issue for this if you wish, but I'll just mention it here again:

The first time I ran the tests they failed with a 504 Gateway Timeout like before (though the next two runs I did just now finished successfully). It looks like it has to do with whatever state is persisted for S3 from time to time (are the files written to disk in a batch?).

Note the timestamps in the log output:

...
localstack-localstack-1  | 2023-11-21T16:31:16.644  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.DeleteObjects => 200
localstack-localstack-1  | 2023-11-21T16:31:24.050  INFO --- [ead-4 (_run)] localstack_persist.state   : Persisting state of service s3...
localstack-localstack-1  | 2023-11-21T16:32:21.855  INFO --- [   asgi_gw_0] localstack.request.aws     : AWS s3.ListObjects => 200
...

It looks like it takes about a minute for the persist function to finish (the tests hang for a while and then immediately resume making requests). After about a minute of waiting for a response, the nginx server in front of the PHP-based API gives up and sends a 504 to the test client. This seems to happen because the API is stuck waiting for a response from localstack. Do you think there is anything that can be done about that on your side?
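As a quick check of the "about a minute" figure, the gap between the "Persisting state" line and the next logged request in the excerpt above can be computed directly. This is only a sketch: it assumes the gap between those two log lines is a reasonable upper bound on how long the persist step blocked, which the log itself does not confirm.

```python
from datetime import datetime

# Timestamp format used in the log lines quoted above.
FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

persist_started = datetime.strptime("2023-11-21T16:31:24.050", FORMAT)
next_response = datetime.strptime("2023-11-21T16:32:21.855", FORMAT)

gap_seconds = (next_response - persist_started).total_seconds()
print(f"{gap_seconds:.3f} s")  # 57.805 s
```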

phalasz commented 11 months ago

For my use-case the fix worked wonderfully. Thank you.

GREsau commented 11 months ago

@barasimumatik I'm sure there are optimisations that could be made to the persistence mechanism - could you open a new issue for your problem please? Then if you're able to attach the full container log, it would be very helpful 🙂

GREsau commented 11 months ago

Fix released in localstack-persist 3.0.1