axman6 / amazonka-s3-streaming

Provides a conduit based interface to uploading data to S3 using the Multipart API
MIT License
20 stars 23 forks source link

Rewrite streamUpload #22

Closed domenkozar closed 3 years ago

domenkozar commented 4 years ago

I wanted to rewrite streaming to improve performance of chunking. DList is quite inefficient compared to vectorBuilder. See https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yielding

A few improvements along the way:

Need testing.

Out of scope: rewriting concurrentUpload in terms of streamUpload

domenkozar commented 4 years ago

cc @axman6

domenkozar commented 4 years ago

Something is wrong, it's leaking memory.

domenkozar commented 4 years ago

@axman6 I've updated the PR to use raw memory buffer that should become part of the conduit api at some point https://github.com/snoyberg/conduit/issues/438

The last blocking thing is amazonka release as it needs some extra monad instances for AWST. See https://github.com/brendanhay/amazonka/issues/574

domenkozar commented 4 years ago

I can confirm with this code I can upload to s3 bucket with ~80MB/s.

domenkozar commented 4 years ago

@axman6 ping?

axman6 commented 4 years ago

Sorry, I hadn't seen the updates here. One of the goals I'd had for the conduit side of things was to avoid copying data if possible, which is why the I was using DLists of bytestrings. I'm not sure that is a great idea though, and the use of a preallocated buffer is nice, but it makes me think that maybe we should just be using byte string builders if we're going to go that route, and be more sure that we're not going to have dangerous memory accesses. I might make a fork of this branch and play with it to see if I can come up with something I'm happier with. Again sorry this has taken so long, life been hectic and maintaining libraries has been far from the front of my mind.

domenkozar commented 3 years ago

@axman6 what would you like to do with this?

domenkozar commented 3 years ago

I'm using this in production for http://cachix.org/.

I understand the code is not ideal, but it improves effectiveness and CPU/memory usage over the current code.

I also understand you want to keep the code tidy and correct.

Closing as I don't see a way forward. That said, thanks a lot for this library!

ghost commented 3 years ago

Hey @domenkozar, would you be interested in taking over this package? I don't have a use for it or time to maintain it (not that I ever did really). If so, I am happy to add/make you the maintainer on Hackage and you can keep it going with releases if you like. It's great to hear it's being used in production for something so useful, and I'm even happier to know it's not using my atrocious code! 😅

domenkozar commented 2 years ago

Hey @axman6 / @axman6-da, I gave this a thought and I'm not sure I'm in a good place to maintain this library as well. Already have too many packages on my plate :(

Maybe you could make a call for maintainers and see if someone steps up?

endgame commented 2 months ago

If the memory leak is in request bodies like the Glacier issue linked upthread, https://github.com/snoyberg/http-client/issues/538 seems pretty suspicious as a cause.