Closed: xtagon closed this issue 8 years ago.
Hi xtagon,
I'm glad to hear that, thanks. The case you describe is exactly what it was built for.
Regarding your questions:
1) Is this the correct use of printer? It seems wrong. Each time a new chunk is sent, conn isn't actually returned, it's lost in the anonymous function, if that makes sense. But it works. Is there a more functional/idiomatic way?
That is the way I use it too. I thought that a file-like API (think of IO.binwrite) made more sense than a reduce-style API, as we are likely doing some sort of IO.
We've been using it in production like this for a while now and it works. However, being completely honest, I couldn't find any documentation in Phoenix saying to do it one way or the other. I can look into that for you.
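To make the pattern under discussion concrete, here is a minimal sketch of a printer built from a Plug connection. This is an illustration, not Zipflow's documented API; the module and function names are hypothetical, and it assumes a conn that has already gone through send_chunked/2:

```elixir
defmodule PrinterExample do
  # A printer is just a one-argument sink, like a file handle you can
  # only write to. This one pushes each zip chunk down a chunked
  # Phoenix/Plug response.
  def printer(conn) do
    fn chunk ->
      # The updated conn returned by Plug.Conn.chunk/2 is discarded;
      # the closure keeps writing through the conn it captured.
      # This is exactly the behaviour the question describes.
      Plug.Conn.chunk(conn, chunk)
    end
  end
end
```

The closure-over-conn style works because chunked writes are a side effect on the underlying socket; the "lost" conn is only needed again after the whole stream is flushed.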
2) DataEntry and FileEntry work, but if I need to read chunks from an HTTP response or an S3 client, how will that work? Will I need to write a new type of Entry module?
There should be no need. DataEntry is built on top of Zipflow.Spec.StoreEntry, which allows you to read the entry and write it in chunks. Looking at DataEntry (https://github.com/dgvncsz0f/zipflow/blob/master/lib/zipflow/data_entry.ex#L27-L34) and FileEntry (https://github.com/dgvncsz0f/zipflow/blob/master/lib/zipflow/file_entry.ex#L36-L45) should make things clear, but let me know if you need more help. I would gladly send you an example that uses erlcloud directly.
Let me know if that helps you and feel free to send more questions.
PS: I'm guessing you noticed this already. The entries' date/time information is not actually defined and points to a fixed point in the past. It is an easy fix; let me know if you need that feature.
~dsouza
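As an aside on the PS above: zip headers store timestamps in MS-DOS format (two 16-bit fields, per PKWARE's APPNOTE), and a zeroed field decodes to year 1980, month 0, day 0, which is why the `unzip -l` transcript later in this thread shows `1980-00-00 00:00`. A sketch of the encoding (this is the zip spec's layout, not Zipflow code):

```elixir
import Bitwise

# MS-DOS date: bits 15-9 = year - 1980, bits 8-5 = month, bits 4-0 = day.
dos_date = fn {y, m, d} -> ((y - 1980) <<< 9) ||| (m <<< 5) ||| d end

# MS-DOS time: bits 15-11 = hours, bits 10-5 = minutes,
# bits 4-0 = seconds divided by two (2-second resolution).
dos_time = fn {h, mi, s} -> (h <<< 11) ||| (mi <<< 5) ||| div(s, 2) end
```

Writing real values into those two fields is all the "easy fix" would take.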
Hi @dgvncsz0f thanks so much for the response.
So would I need to apply the data entry once for each chunk? I see that it takes a data parameter but not an IO object. Or are you saying I should use FileEntry even though it's not technically a file?
Looking at the FileEntry example in the README, you open the file in a block above where you send the chunk, which makes sense, but I'm trying to wrap my head around how I'd do it in a reduction, since I'm not hard-coding a file name. To be precise, I'll have a list of structs, each holding the S3 path and the name of the file in the zip.
If you wouldn't mind putting together an example with erlcloud, that would be very nice. Thanks!
Hi @xtagon,
The example using erlcloud [lots of things missing, most notably proper error handling]: https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255
There are two important functions to look at: the function that creates the zip entry [1] and the function that reads data from S3 [2]. The filename in the zip archive is defined in [3].
[1] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L57-L63 [2] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L31-L55 [3] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L58
Notice that this pattern should work with anything that allows you to read data in chunks [e.g. files, sockets, etc.].
The example uses 1M for the chunk size. You probably don't want to make that too small, as S3 charges you per GET operation.
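The shape of the S3 reading side can be sketched as follows. This is a simplification of the linked gist, with no error handling; the erlcloud calls (charlist arguments, the range option, and the :content key in the returned proplist) reflect one reading of erlcloud's Erlang API and may differ in your version, and obtaining the object size up front (e.g. via a HEAD request) is left out:

```elixir
defmodule S3Chunks do
  @chunk 1_048_576  # 1M, as in the example above

  # Read an S3 object in ranged GETs and hand each chunk to `consumer`
  # (e.g. a Zipflow printer). Recursion stops once the next range
  # would start past the end of the object.
  def stream(bucket, key, size, consumer, offset \\ 0)

  def stream(_bucket, _key, size, _consumer, offset) when offset >= size, do: :ok

  def stream(bucket, key, size, consumer, offset) do
    last = min(offset + @chunk, size) - 1
    opts = [{:range, 'bytes=#{offset}-#{last}'}]

    props =
      :erlcloud_s3.get_object(
        String.to_charlist(bucket),
        String.to_charlist(key),
        opts
      )

    consumer.(:proplists.get_value(:content, props))
    stream(bucket, key, size, consumer, offset + @chunk)
  end
end
```

The per-GET pricing mentioned above is why the chunk size matters: halving it doubles the number of ranged GETs for the same object.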
To test it, I created two files, foobar and foobaz, at the root path, each containing 4 MB of random data.
$ sha224sum fooba?
f4c136727c20b1afe05a24fe0968217c0494fff04705a4809cfb4e2f foobar
f71f16809e8cd897255115c010e6affbddf5fb96fa3d5be5aa1b02fd foobaz
$ curl http://localhost:4000/yyy >file.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8192k 0 8192k 0 0 4512k 0 --:--:-- 0:00:01 --:--:-- 4511k
$ unzip -l file.zip
Archive: file.zip
Length Date Time Name
--------- ---------- ----- ----
4194304 1980-00-00 00:00 foobar
4194304 1980-00-00 00:00 foobaz
--------- -------
8388608 2 files
$ unzip -p file.zip foobar | sha224sum
f4c136727c20b1afe05a24fe0968217c0494fff04705a4809cfb4e2f -
$ unzip -p file.zip foobaz | sha224sum
f71f16809e8cd897255115c010e6affbddf5fb96fa3d5be5aa1b02fd -
Let me know if that helps.
~dsouza
Thanks, that will help a ton! I appreciate it.
Hi,
Thank you for building Zipflow! :-)
I was able to get a basic test working with Phoenix Framework, but I'm not clear on how to go any further. My use case is to stream a zip file to the client, chunk by chunk. The contents of the zip will be read from multiple files from Amazon S3 (or any other source really). The idea is to do one file (or chunk of a file) at a time all the way from reading it from the source (S3) to writing it to the client (Phoenix connection).
My first test looks like this:
If you visit the route in a browser, it does stream a zip file with the correct contents. My questions are:
1) Is this the correct use of printer? It seems wrong. Each time a new chunk is sent, conn isn't actually returned, it's lost in the anonymous function, if that makes sense. But it works. Is there a more functional/idiomatic way?
2) DataEntry and FileEntry work, but if I need to read chunks from an HTTP response or an S3 client, how will that work? Will I need to write a new type of Entry module?
Thank you so much for your help!
Justin
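The original snippet referred to above did not survive in this copy of the thread, but the controller action being described looks roughly like the following sketch. The module name is made up, and the exact Zipflow entry points (Zipflow.Stream.init/entry/flush, Zipflow.DataEntry.encode) are assumptions based on the library's README, not the poster's code:

```elixir
defmodule MyApp.ZipController do
  use Phoenix.Controller

  # Sketch only: stream a zip containing a single in-memory entry.
  def download(conn, _params) do
    conn =
      conn
      |> put_resp_content_type("application/zip")
      |> send_chunked(200)

    # The closure captures conn; the conn returned by Plug.Conn.chunk/2
    # is discarded on every call, which is what question 1 asks about.
    printer = fn chunk -> Plug.Conn.chunk(conn, chunk) end

    Zipflow.Stream.init
    |> Zipflow.Stream.entry(Zipflow.DataEntry.encode(printer, "hello.txt", "hello, world"))
    |> Zipflow.Stream.flush(printer)

    conn
  end
end
```

Returning the original conn at the end is enough for Phoenix, since the chunked writes have already gone out over the socket.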