Ne-Lexa / php-zip

PhpZip is a PHP library for extended work with ZIP archives.
MIT License

Add support for true in-memory zip handling #85

Open oppiansteve opened 2 years ago

oppiansteve commented 2 years ago

Description
I'm using php-zip in an AWS Lambda with a zip in S3 using the seekable stream wrapper - which works great!

However, the use of php://temp means the lambda's small storage area fills up quickly and causes problems.

As I'm using a Lambda with 10GB of RAM (storage is <500MB), it would be great if we could optionally use php://memory instead.

I'm likely to fork and have a go at this tomorrow, but I've no idea which way of implementing it would be acceptable for a PR (if any).

oppiansteve commented 2 years ago

Perhaps a better fix is to allow setting maxmemory for the temp streams, e.g. php://temp/maxmemory:<bytes>
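
For anyone unfamiliar with the maxmemory suffix, here is a minimal sketch of how it behaves in plain PHP. By default php://temp keeps 2 MB in RAM before spilling to a temporary file; the suffix raises or lowers that threshold. The `$maxMemory` value below is an illustrative assumption, not an existing php-zip option.

```php
<?php
// Hypothetical setting for illustration; php-zip does not expose this name.
$maxMemory = 64 * 1024 * 1024; // keep up to 64 MiB in RAM before spilling to disk

// Same read/write semantics as plain php://temp, only the spill threshold differs.
$stream = fopen('php://temp/maxmemory:' . $maxMemory, 'w+b');

fwrite($stream, str_repeat('x', 1024));
rewind($stream);
$data = stream_get_contents($stream);
fclose($stream);
```

Setting the threshold just above the largest archive you expect keeps everything in RAM while still leaving the temp file as a fallback.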

oppiansteve commented 2 years ago

I did try this and it seemed to work in itself. However, it showed that the AWS S3 (seeking) stream wrapper isn't written very well: instead of keeping an LRU page cache, it buffers the whole file up to the furthest point seeked to, so it is useless for my purposes.
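
To make the LRU page-cache idea concrete, here is a rough sketch of what such a cache might look like. It is not taken from the AWS SDK or from php-zip: the `LruPageCache` class and its `$fetchPage` callback are hypothetical, and in a real wrapper the callback would presumably issue a ranged GET against S3. Here an in-memory string stands in for the remote object.

```php
<?php
// Hypothetical sketch: fixed-size pages, least recently used page evicted first.
final class LruPageCache
{
    /** @var array<int, string> page index => bytes, ordered oldest to newest */
    private array $pages = [];
    private \Closure $fetchPage;
    private int $pageSize;
    private int $maxPages;

    /** @param callable $fetchPage function (int $offset, int $length): string */
    public function __construct(callable $fetchPage, int $pageSize = 1048576, int $maxPages = 16)
    {
        $this->fetchPage = \Closure::fromCallable($fetchPage);
        $this->pageSize = $pageSize;
        $this->maxPages = $maxPages;
    }

    public function read(int $offset, int $length): string
    {
        $out = '';
        while ($length > 0) {
            $index = intdiv($offset, $this->pageSize);
            $start = $offset - $index * $this->pageSize;
            $chunk = (string) substr($this->page($index), $start, $length);
            if ($chunk === '') {
                break; // read ran past the end of the data
            }
            $out .= $chunk;
            $offset += strlen($chunk);
            $length -= strlen($chunk);
        }
        return $out;
    }

    private function page(int $index): string
    {
        if (isset($this->pages[$index])) {
            $page = $this->pages[$index];
            unset($this->pages[$index]); // re-inserted below as most recently used
        } else {
            $page = ($this->fetchPage)($index * $this->pageSize, $this->pageSize);
            if (count($this->pages) >= $this->maxPages) {
                unset($this->pages[array_key_first($this->pages)]); // evict oldest
            }
        }
        $this->pages[$index] = $page;
        return $page;
    }
}

// Usage: a short string stands in for the S3 object; $fetches counts "remote" reads.
$blob = 'abcdefghij';
$fetches = 0;
$cache = new LruPageCache(
    function (int $offset, int $length) use ($blob, &$fetches): string {
        $fetches++;
        return (string) substr($blob, $offset, $length);
    },
    4, // page size in bytes (tiny, for demonstration)
    2  // at most two pages held in memory
);
$all = $cache->read(0, 10);
$tail = $cache->read(8, 2); // served from the cached last page
```

With a cache like this, memory use is bounded by page size times page count regardless of how far into the file the reader seeks, which is what a zip reader jumping between the central directory and individual entries would need.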

Still, having control over how much memory php-zip's temp streams use is quite nice. (And if I ever get round to rewriting the S3 stream wrapper, it would be useful.)

For reference, my experiment is available here - https://github.com/kaldor/php-zip/commit/0dc0e8ccf7757664dda2f6704ce9124cfb441ef7

oppiansteve commented 2 years ago

I was initially going to use php://memory, but php://temp with an adjustable memory limit was more flexible and achieved the same results for me. I'm happy with my php-zip changes (for my purpose); it was only the AWS S3 StreamWrapper that stopped me from getting end-to-end, in-memory-only access to huge zip files in S3.

(I did use php://memory for my output when writing to S3 with multipart upload, and I think that kept it all in memory. If strings are passed instead, Guzzle also uses php://temp, so it will use the filesystem too for bigger content.)

Perhaps, instead of passing a maxmemory I could just pass a stream URI and use that throughout php-zip - which means I could pass in a URI with my own custom protocol and have more control. However, maxmemory did what I needed for the experiment.
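
A rough sketch of what passing a stream URI could look like follows. The `TempStreamFactory` class is purely hypothetical and is not part of php-zip; it only illustrates routing every scratch-stream open through one configurable URI, so that php://memory, php://temp/maxmemory:<bytes>, or a custom registered protocol could be swapped in.

```php
<?php
// Hypothetical helper for illustration; php-zip does not ship this class.
final class TempStreamFactory
{
    private string $uri;

    public function __construct(string $uri = 'php://temp')
    {
        $this->uri = $uri;
    }

    /** @return resource a fresh read/write stream opened from the configured URI */
    public function open()
    {
        $stream = fopen($this->uri, 'w+b');
        if ($stream === false) {
            throw new \RuntimeException('Cannot open stream: ' . $this->uri);
        }
        return $stream;
    }
}

// Usage: the caller decides where scratch data lives.
$factory = new TempStreamFactory('php://memory');
$stream = $factory->open();
fwrite($stream, 'scratch data');
rewind($stream);
$roundTrip = stream_get_contents($stream);
fclose($stream);
```

Compared with a single maxmemory integer, a URI leaves room for a custom wrapper registered via stream_wrapper_register, at the cost of a slightly wider configuration surface.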