brendanhay / amazonka

A comprehensive Amazon Web Services SDK for Haskell.
https://amazonka.brendanhay.nz

Support question: Performance sending stuff to an S3 bucket? #838

Closed jonathanmoregard closed 1 year ago

jonathanmoregard commented 1 year ago

Hi!

We are using amazonka in production and have run into some performance issues with it. We are doing a batch operation in which we send a lot of separate messages to an S3 bucket.

I think we might go about it in an inefficient way, but I want to check before I start making assumptions.

This is what our effect handler looks like right now:

  S3PutObject amazonkaEnv bucketName objectKey (Amazonka.toBody -> body) contentType -> do
    let putObject = Lens.set S3.PutObject.putObject_contentType contentType $ S3.newPutObject bucketName objectKey body
    Amazonka.runResourceT (Amazonka.sendEither amazonkaEnv putObject)

We are traversing over a list to send all the stuff.

I suspect it would be better to run Amazonka.runResourceT once for the entire list, rather than once per element. Is this the correct way to go about things, or is there a preferred way of structuring things that I'm missing?
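For illustration, the "one runResourceT for the whole list" idea could look roughly like the sketch below. The Upload record and the putAll name are hypothetical and not taken from the original code, and Lens is assumed to be Control.Lens from the lens package; the amazonka calls are the same ones used in the handler above.

```haskell
import qualified Amazonka
import qualified Amazonka.S3 as S3
import qualified Amazonka.S3.PutObject as S3.PutObject
import qualified Control.Lens as Lens
import Data.ByteString (ByteString)
import Data.Text (Text)

-- Hypothetical record describing one object to upload.
data Upload = Upload
  { uploadKey :: S3.ObjectKey
  , uploadBody :: ByteString
  , uploadContentType :: Maybe Text
  }

-- Enter ResourceT once, then send every request inside that single scope.
-- Each element yields its own Either, so one failed upload does not
-- abort the remaining ones.
putAll
  :: Amazonka.Env
  -> S3.BucketName
  -> [Upload]
  -> IO [Either Amazonka.Error S3.PutObjectResponse]
putAll env bucket uploads =
  Amazonka.runResourceT (traverse putOne uploads)
  where
    putOne (Upload key body contentType) =
      Amazonka.sendEither env $
        Lens.set S3.PutObject.putObject_contentType contentType $
          S3.newPutObject bucket key (Amazonka.toBody body)
```

Note that anything registered with the ResourceT scope is only released when the whole traversal finishes, so this shape is best suited to batches of bounded size.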

jonathanmoregard commented 1 year ago

Context: We're using the rc version of amazonka (commit: 0ccede621e56fb6f240e4850e205cde82d0e4a4b)

We're running our service in kubernetes, hosted in AWS.

endgame commented 1 year ago

Yes, I would aim to runResourceT once over the whole thing as a first step. What effect library are you using? effectful, for example, allows one to introduce a Resource effect, which means you aren't stuck entering and leaving the same ResourceT. If I was doing this with effectful, I would implement a "run S3 on AWS" function something like runS3Real :: (Reader Amazonka.Env :> es, Resource :> es) => Eff (S3 ': es) a -> Eff es a. Then you can discharge the Resource :> es constraint once at the end, which should help somewhat.
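A rough sketch of what such an interpreter might look like follows. It assumes the effectful and resourcet-effectful packages; the S3 effect with its single PutObject operation is hypothetical, and the extra IOE :> es constraint is an assumption added here because the MonadResource instance for Eff in resourcet-effectful appears to need it.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import qualified Amazonka
import qualified Amazonka.S3 as S3
import Effectful
import Effectful.Dispatch.Dynamic (interpret, send)
import Effectful.Reader.Static (Reader, ask)
import Effectful.Resource (Resource)

-- Hypothetical S3 effect with one operation.
data S3 :: Effect where
  PutObject :: S3.PutObject -> S3 m (Either Amazonka.Error S3.PutObjectResponse)

type instance DispatchOf S3 = Dynamic

-- Convenience wrapper so callers do not use 'send' directly.
putObject :: S3 :> es => S3.PutObject -> Eff es (Either Amazonka.Error S3.PutObjectResponse)
putObject req = send (PutObject req)

-- Interpret S3 against real AWS. The Resource effect is only required
-- here, not discharged, so the caller decides where its scope begins
-- and ends.
runS3Real
  :: (IOE :> es, Reader Amazonka.Env :> es, Resource :> es)
  => Eff (S3 : es) a
  -> Eff es a
runS3Real = interpret $ \_ -> \case
  PutObject req -> do
    env <- ask
    Amazonka.sendEither env req
```

With this shape, a whole batch can run under a single runResource (from Effectful.Resource) applied once near the top of the program, for example runEff . runResource . runReader env . runS3Real.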

I would also consider parallelising your operations if possible, using something like pooledMapConcurrentlyN from unliftio.
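As a minimal sketch of that suggestion, assuming the requests have already been built as S3.PutObject values: putAllPooled and maxWorkers are hypothetical names, and a sensible worker count depends on the workload and on the HTTP manager's connection limits.

```haskell
import qualified Amazonka
import qualified Amazonka.S3 as S3
import UnliftIO.Async (pooledMapConcurrentlyN)

-- Send the requests using at most 'maxWorkers' concurrent workers,
-- all inside a single ResourceT scope. Results come back in the same
-- order as the input list.
putAllPooled
  :: Int
  -> Amazonka.Env
  -> [S3.PutObject]
  -> IO [Either Amazonka.Error S3.PutObjectResponse]
putAllPooled maxWorkers env requests =
  Amazonka.runResourceT $
    pooledMapConcurrentlyN maxWorkers (Amazonka.sendEither env) requests
```

Using sendEither rather than send keeps a failed upload from throwing inside a worker, which would otherwise cancel the rest of the batch.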

I'll close this now to keep the open issue count manageable but please reopen and reply if you have further queries.