felixbuenemann / xlsxtream

Streaming & Fast XLSX Spreadsheet Writer for Ruby
MIT License
216 stars 38 forks source link

Rails stream downloading xlsx files builded on the fly #14

Closed alexey-yanchenko closed 7 years ago

alexey-yanchenko commented 7 years ago

I have a Rails app. Users have to be able to download large xlsx files. I'm trying to use xlsxtream gem with ActionController::Live.

Here is my controller's action:

def index
 response.headers.delete('Content-Length')                                                                       
 response.headers['Content-Type'] ||= 'application/application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
 response.headers['Content-Disposition'] = "attachment; filename=full.xlsx"                 
 response.headers['X-Accel-Buffering'] = 'no'                                                                    
 response.headers['Cache-Control'] ||= 'no-cache'                                                                

 st = MyIO.new(response.stream)                                                                                  
 begin                                                                                                           
   xlsx = Xlsxtream::Workbook.new(st)                                                                            
   1_000.times do |i|                                                                            
      xlsx.write_worksheet 'Sheet1' do |sheet|                                                                    
        sheet << [i, SecureRandom.hex(8)]                                                                         
      end                                                                                                         
    end                                                                                                           
    xlsx.close                                                                                                    
  ensure                                                                                                          
    st.close                                                                                                      
  end                                                                                                             
end

Here is MyIO implementation:

class MyIO < StringIO                                                                                                   
  def initialize(st)                                                                                                    
    @st = st                                                                                                            
    super()                                                                                                             
  end                                                                                                                   

  def <<(data)                                                                                                          
    @st.write(data)                                                                                                     
  end                                                                                                                   

  def close                                                                                                             
    @st.close                                                                                                           
  end                                                                                                                   
end 

I get an error:

Zip::Error (local header size changed (0 -> 54)):              
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/entry.rb:137:in `verify_local_header_size!'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/entry.rb:274:in `write_local_entry'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/output_stream.rb:161:in `block in update_local_headers'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:38:in `block in each'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:37:in `each'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:37:in `each'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/output_stream.rb:159:in `update_local_headers'
   /usr/local/bundle/gems/rubyzip-1.2.1/lib/zip/output_stream.rb:80:in `close_buffer'
   /usr/local/bundle/gems/xlsxtream-1.1.0/lib/xlsxtream/io/rubyzip.rb:21:in `close'
   /usr/local/bundle/gems/xlsxtream-1.1.0/lib/xlsxtream/workbook.rb:66:in `close'

File can be downloaded but it's corrupted. Is it possible to use this gem this way?

kewogc commented 7 years ago

May be..
xlsx = Xlsxtream::Workbook.new(response.stream)

felixbuenemann commented 7 years ago

While xlsxtream is streaming in the sense that it writes out data as it is generated and never keeps all content in memory, the underlying IO still needs to be seekable, because the XLSX file format uses ZIP, which requires seeking to write the header information.

It is technically possible to write non-standard ZIP files that work around this issues by omitting certain information in the headers, but it is likely that Excel and other tools won't read the files or say that they are corrupt and need to be repaired.

The exact problem with strictly forward seeking is that each file in the ZIP is preceded by a header that contains the size and checksum for the file.

Because we generate data on the fly, we do not know the size or checksum ahead of time, so this info needs to be written by seeking back in the IO after the file is finished.

For now the best way to work around the issue is to write to a Tempfile and then send that file to the client as soon as all the data has been written (xlsx.close takes care of writing the ZIP central directory).

felixbuenemann commented 7 years ago

Note that there's some discussion over at WeTransfer/zip_tricks#21 for implementing an alternate xlsxtream ZIP IO which could do true streaming by writing incomplete ZIP entry headers.

felixbuenemann commented 7 years ago

@alexey-yanchenko Please try out xlsxstream 2.0.0, it switches to zip_tricks fro ZIP compression and supports streaming to non-seekable IO-like objects.

You don't need to inherit from StringIO, the following MyIO should suffice:

class MyIO                                                                                                 
  def initialize(st)                                                                                                    
    @st = st                                                                                                                                                                                                                     
  end                                                                                                                   

  def <<(data)                                                                                                          
    @st.write(data)                                                                                                     
  end                                                                                                                   

  def close                                                                                                             
    @st.close                                                                                                           
  end                                                                                                                   
end 
evserykh commented 7 years ago

Looks great :fire:

alexey-yanchenko commented 7 years ago

@felixbuenemann I tested xlsxstream 2.0.0 and it works as expected. Downloaded file can be normaly opened in MS Excel and LibreOffice. Thank you very much for done work :+1:

felixbuenemann commented 7 years ago

Yes, zip_tricks uses a zip feature to write the crc and size after instead of in front of the file which I did not know about, so it should work fine.

alexanderadam commented 6 years ago

Just to be sure: is it really streaming in the mentioned example?

Wouldn't a

response.headers['Transfer-Encoding'] = 'chunked'

be required as well because it would be generated on the fly but delivered at once anyway?

bceppi commented 6 years ago

Hello! Thank you, this works great :)

How do I change the filename downloading it this way?

evserykh commented 6 years ago

@bceppi try to set the header at your controller like this:

response.headers['Content-Disposition'] = %(attachment; filename="some-file-name.csv")

KevinColemanInc commented 6 years ago

For those that don't want to define a whole new class to alias 1 method, you can do this:

    stream = response.stream
    stream.define_singleton_method(:<<) { |value| write(value) }
KevinColemanInc commented 6 years ago
Read error: #<Zlib::BufError: buffer error>

I am getting this error when I copy and paste OP's top comment. Any ideas what I am doing wrong?

Looooong commented 5 years ago

@KevinColemanInc

*** Exception Zlib::BufError in Rack response body object (buffer error)
[...]
Disconnecting client with error: error parsing app response chunked encoding: unexpected end-of-stream

This error happens when Rack::Deflater is present in the middleware. I can only think (without going deeper into the library) that streaming xlsx file conflicts with this middleware (probably because xlsx itself is a zip file). Maybe unexpected end-of-stream has something to do with this.

I suggest that you should remove the Rack::Deflater middleware and delegate the deflater to the webserver in production like Nginx or Apache.

felixbuenemann commented 5 years ago

@KevinColemanInc @Looooong Have you tried setting response.headers['Content-Encoding'] = 'identity' to keep Rack::Deflater from trying to compress the response?

See: https://github.com/rack/rack/blob/master/lib/rack/deflater.rb#L111

KevinColemanInc commented 5 years ago

@felixbuenemann I am not actively working on a project that uses this gem, so unfortunately I don't have the time to investigate this on my own :-/. This was from so long ago, I think the fix from Looooong worked for me.

felixbuenemann commented 5 years ago

I just checked myself, it works if you set:

response.headers['Cache-Control'] = 'no-transform'

The encoding check in Rack::Deflater is negated, so it checks if Content-Encoding is not identity, so we can't use that.

felixbuenemann commented 5 years ago

Btw. I debugged the problem and I think it's a bug in Rack::Deflater when a part of the response body enumerable returns an empty string:

gzip = ::Zlib::GzipWriter.new(StringIO.new)
=> #<Zlib::GzipWriter:0x00000000092bf670>
gzip.write ""
=> 0
gzip.flush
Zlib::BufError: buffer error

So they should make sure they write at least 1 byte before trying to flush.

The reason the body part might be empty is that Zlib controls the flushing of the compressed data, as it sees fit to get a good balance between compression and memory usage (the default Zlib window size is 32 KB).

felixbuenemann commented 5 years ago

There's an existing issue https://github.com/rack/rack/issues/959 for the problem and I've just proposed a fix in https://github.com/rack/rack/pull/1388.