felixbuenemann / xlsxtream

Streaming & Fast XLSX Spreadsheet Writer for Ruby
MIT License
216 stars 38 forks source link

Encode empty values #52

Closed sturmer closed 3 years ago

sturmer commented 3 years ago

The original library skips processing of empty cells, but this behavior is not desirable for us. There is an [issue reported][1] about it and a [PR fixing it][2], but the upstream author has not acted on it.

This is the reason we decided to fork the repository and patch it manually. Credit goes to the patch's author for his work.

JRVS-300

felixbuenemann commented 3 years ago

Hey @sturmer, can you describe why a cell with an empty string is preferred over an empty cell for your use case?

sturmer commented 3 years ago

Hi @felixbuenemann, I start suspecting a problem with my spec. Will get back to comment as soon as I have more elements.

sturmer commented 3 years ago

I've had a look at our test. We use Roo to verify our generated spreadsheet, and it expects an empty string rather than an empty cell. I'll try to explain what I understand.

Our test looks something like this:

context 'when in my context' do
    it 'has the data types in cells' do
      temp_file = 'gian-upstream.xlsx'
      header = %w[one two three]
      row1 = [nil, '', 'hello']
      row2 = ['first', '', 'third']
      row3 = ['first', nil, 'third']

      Xlsxtream::Workbook.open(temp_file) do |xlsx|
        xlsx.write_worksheet('Engagement Logs') do |sheet|
          sheet << header
          sheet << row1
          sheet << row2
          sheet << row3
        end
      end

      Roo::Spreadsheet.open(temp_file).sheet(0).each_row_streaming.with_index do |row, index|
        # this function relies on Roo
        verify_xlsx_document(
          row: row,
          index: index
        )
      end
    ensure
      temp_file.close!
    end
  end

The result of that file is a spreadsheet looking like this:

image

With or without the patch, the result doesn't change.

However when we check using Roo, the unpatched library gives e.g. for the second row

[#<Roo::Excelx::Cell::String:0x00007f97df95c8f0 @cell_value="hello", @coordinate=[2, 3], @style=0, @value="hello">]

while the patched version has:

[#<Roo::Excelx::Cell::Empty:0x00007fd1c692e5d8 @coordinate=[2, 1]>,
 #<Roo::Excelx::Cell::Empty:0x00007fd1c692e178 @coordinate=[2, 2]>,
 #<Roo::Excelx::Cell::String:0x00007fd1c692dcf0 @cell_value="hello", @coordinate=[2, 3], @style=0, @value="hello">]

This seems conceptually the same but our customers might be relying on this behavior and we don't feel like risking to break their code.

sturmer commented 3 years ago

FWIW, here's verify_xlsx_columns, really nothing special:

  def verify_expected_columns(row, fields)
    row.each.with_index do |column, column_index|
      column_name = fields[column_index]

      # <here we have some code for special cases>

      expect(column).to be_a(column_type(column_name, question_type_map)).or(eq(Roo::Excelx::Cell::Empty))
    end
  end

  def column_type(column_name, question_type_map)
    # FIELD_TYPE_MAP manually associates names and types
    case described_class::FIELD_TYPE_MAP.merge(question_type_map)[column_name]
    when :integer
      Roo::Excelx::Cell::Number
    when :boolean
      Roo::Excelx::Cell::Boolean
    else
      Roo::Excelx::Cell::String
    end
  end
felixbuenemann commented 3 years ago

So the reason of the change is to introduce overhead to make the test suite happy?

If you use the current version of Excel for Mac (16.48) to create a spreadsheet, it will also not output dummy cells with empty strings, so if your customers depend on that behaviour it would be very strange.

I would suggest to fix your tests to not expect empty dummy cells instead.

sturmer commented 3 years ago

I must say you have a point.