dgvncsz0f / zipflow

stream a zip archive while building it
BSD 3-Clause "New" or "Revised" License
24 stars 5 forks

Chunked responses in Phoenix #1

Closed: xtagon closed this issue 8 years ago

xtagon commented 8 years ago

Hi,

Thank you for building Zipflow! :-)

I was able to get a basic test working with Phoenix Framework, but I'm not clear on how to go any further. My use case is to stream a zip file to the client, chunk by chunk. The contents of the zip will be read from multiple files on Amazon S3 (or any other source, really). The idea is to handle one file (or chunk of a file) at a time, all the way from reading it from the source (S3) to writing it to the client (Phoenix connection).

My first test looks like this:

defmodule HelloZip.ExampleController do
  use HelloZip.Web, :controller

  def test(conn, _params) do
    conn = conn
    |> put_resp_content_type(Plug.MIME.type("zip"))
    |> put_resp_header("content-disposition", ~s(attachment; filename="test.zip"))
    |> send_chunked(200)

    printer = fn data ->
      {:ok, conn} = chunk(conn, data)
    end

    Zipflow.Stream.init
    |> Zipflow.Stream.entry(Zipflow.DataEntry.encode(printer, "hello.txt", "Hello, World!"))
    |> Zipflow.Stream.entry(Zipflow.DataEntry.encode(printer, "goodbye.txt", "See you later :-)"))
    |> Zipflow.Stream.flush(printer)

    conn
  end
end

If you visit the route in a browser, it does stream a zip file with the correct contents. My questions are:

  1. Is this the correct use of printer? It seems wrong. Each time a new chunk is sent, conn isn't actually returned, it's lost in the anonymous function, if that makes sense. But it works. Is there a more functional/idiomatic way?
  2. DataEntry and FileEntry work, but if I need to read chunks from an HTTP response or a S3 client, how will that work? Will I need to write a new type of Entry module?

Thank you so much for your help!

Justin

dgvncsz0f commented 8 years ago

Hi xtagon,

I'm glad to hear that, thanks. The use case you describe is exactly what it was built for.

Regarding your questions:

1) Is this the correct use of printer? It seems wrong. Each time a new chunk is sent, conn isn't actually returned, it's lost in the anonymous function, if that makes sense. But it works. Is there a more functional/idiomatic way?

That is the way I use it too. I thought a file-like API (think of IO.binwrite) made more sense than a reduce-style API, since we are likely doing some sort of IO anyway.

We've been using it in production like this for a while now and it works. To be completely honest, though, I couldn't find any documentation in Phoenix recommending one way or the other. I can look into that for you.
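For what it's worth, if you do want the final conn back out of the closure, one more functional workaround is to hold it in an Agent and rebind it on every chunk. This is only a sketch: ThreadedPrinter and fake_chunk below are illustrative names, fake_chunk stands in for Plug.Conn.chunk/2, and nothing here is Zipflow or Phoenix API.

```elixir
defmodule ThreadedPrinter do
  # Hypothetical sketch: keep the latest conn in an Agent so the
  # rebinding inside the printer closure is not lost, and return the
  # final conn to the caller when the body is done.
  def run(conn, chunk_fun, body) do
    {:ok, holder} = Agent.start_link(fn -> conn end)

    printer = fn data ->
      Agent.update(holder, fn conn ->
        # chunk_fun plays the role of Plug.Conn.chunk/2 here.
        {:ok, conn} = chunk_fun.(conn, data)
        conn
      end)
    end

    body.(printer)

    final = Agent.get(holder, & &1)
    Agent.stop(holder)
    final
  end
end
```

In practice the discarded conn seems to work fine for chunked responses, so whether the extra process is worth it is a judgment call.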

2) DataEntry and FileEntry work, but if I need to read chunks from an HTTP response or a S3 client, how will that work? Will I need to write a new type of Entry module?

There should be no need. DataEntry is built on top of Zipflow.Spec.StoreEntry, which allows you to read the entry and write in chunks. Look at https://github.com/dgvncsz0f/zipflow/blob/master/lib/zipflow/data_entry.ex#L27-L34 and https://github.com/dgvncsz0f/zipflow/blob/master/lib/zipflow/file_entry.ex#L36-L45

FileEntry should make things clearer, but let me know if you need more help. I would gladly send you an example that uses erlcloud directly.
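The chunked-read pattern FileEntry uses can be sketched generically: pull from a source until it is exhausted, handing each chunk to the printer. The ChunkPump module and the :eof convention below are illustrative assumptions, not Zipflow's actual API.

```elixir
defmodule ChunkPump do
  # Hypothetical sketch: repeatedly pull chunks from `next_fun` until
  # it returns :eof, passing each chunk to `sink` (in Zipflow's case,
  # the printer). Names and return conventions are illustrative.
  def pump(next_fun, sink) do
    case next_fun.() do
      :eof ->
        :ok

      {:ok, data} ->
        sink.(data)
        pump(next_fun, sink)
    end
  end
end
```

Plugging a socket read, an HTTP client response, or an S3 ranged GET into next_fun gives the same shape without writing a new Entry module.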

Let me know if that helps you and feel free to send more questions.

PS: I'm guessing you noticed this already. The entries' date/time information is not actually defined and points to a fixed moment in the past. It is an easy fix; let me know if you need that feature.

~dsouza

xtagon commented 8 years ago

Hi @dgvncsz0f thanks so much for the response.

So would I need to apply DataEntry once for each chunk? I see that it takes a data parameter but not an IO object. Or are you saying I should use FileEntry even though it's not technically a file?

Looking at the FileEntry example in the README, you open the file in a block above where you send the chunk, which makes sense, but I'm trying to wrap my head around how I'd do that in a reduction, since I'm not hard-coding a file name. To be precise, I'll have a list of structs, each holding the S3 path and the name of the file in the zip.

If you wouldn't mind putting together an example with erlcloud, that would be very nice. Thanks!

dgvncsz0f commented 8 years ago

Hi @xtagon,

The example using erlcloud [lots of things missing, most notably proper error handling]: https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255

There are two important functions to look at: the function that creates the zip entry [1] and the function that reads data from S3 [2]. The filename in the zip archive is defined in [3].

[1] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L57-L63
[2] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L31-L55
[3] https://gist.github.com/dgvncsz0f/9b42ff130ffecf155359d3ae73bdb255#file-yyy_controler-ex-L58

Notice that this pattern should work with anything that allows you to read data in chunks [e.g. files, sockets, etc.].

The example uses 1 MB for the chunk size. You probably don't want to make that too small, as S3 charges you per GET operation.
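The ranged-read loop the gist builds can be sketched roughly as follows. This is a sketch under assumptions: S3Chunks is an illustrative name, and fetch_range stands in for a ranged S3 GET (e.g. via erlcloud), not the gist's exact code.

```elixir
defmodule S3Chunks do
  # 1 MB, matching the chunk size used in the example above.
  @chunk_size 1024 * 1024

  # Hypothetical sketch: walk the object in @chunk_size slices,
  # handing each slice to `sink`. `fetch_range.(offset, len)` is a
  # placeholder for a ranged GET against S3.
  def stream(fetch_range, size, sink, offset \\ 0) do
    if offset < size do
      len = min(@chunk_size, size - offset)
      sink.(fetch_range.(offset, len))
      stream(fetch_range, size, sink, offset + len)
    else
      :ok
    end
  end
end
```

The last slice is simply whatever remains, so objects that are not a multiple of the chunk size are handled naturally.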

To test it, I created two files, foobar and foobaz, at the root path, each containing 4 MB of random data.

$ sha224sum fooba?
f4c136727c20b1afe05a24fe0968217c0494fff04705a4809cfb4e2f  foobar
f71f16809e8cd897255115c010e6affbddf5fb96fa3d5be5aa1b02fd  foobaz
$ curl http://localhost:4000/yyy >file.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 8192k    0 8192k    0     0  4512k      0 --:--:--  0:00:01 --:--:-- 4511k
$ unzip -l file.zip
Archive:  file.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  4194304  1980-00-00 00:00   foobar
  4194304  1980-00-00 00:00   foobaz
---------                     -------
  8388608                     2 files
$ unzip -p file.zip foobar | sha224sum 
f4c136727c20b1afe05a24fe0968217c0494fff04705a4809cfb4e2f  -
$ unzip -p file.zip foobaz | sha224sum 
f71f16809e8cd897255115c010e6affbddf5fb96fa3d5be5aa1b02fd  -

Let me know if that helps.

~dsouza

xtagon commented 8 years ago

Thanks, that will help a ton! I appreciate it.