box / spout

Read and write spreadsheet files (CSV, XLSX and ODS), in a fast and scalable way
http://opensource.box.com/spout/
Apache License 2.0
4.23k stars 636 forks source link

Automatically calculate columns width / rows height #129

Open adrilo opened 9 years ago

adrilo commented 9 years ago

Instead of using the default width/height, Spout could calculate the optimal width/height for each cell, based on its content.

This involves using a graphics library, render the text content with the chosen font and get the width/height of the rendered text.

This calculation may be slow so it should be optional. It can also be used when using "wrap text" as wrapping does not work well without a custom row height in LibreOffice.

Spout should expose an option to automatically set the optimal width/height but should also allow developers to specify a max width / height. When the max width is reached, the content will automatically wrap and the cell height will increase. It will increase until it reaches the max height.

zendre4 commented 8 years ago

+1

linaspasv commented 8 years ago

+1

ttkulm commented 8 years ago

+1 Would be a really good feature!

arisdario commented 8 years ago

+10

jordanade commented 8 years ago

I don't need auto-fit, just a way to set widths for each column! Any way to workaround/hack this for now?

jordanade commented 8 years ago

Oh, figured it out--just added this line to Worksheet.php, startSheet(), between the header and the data: fwrite($this->sheetFilePointer, '<cols><col min="1" max="1" width="100" customWidth="1"/><col min="2" max="6" width="30" customWidth="1"/></cols>');

adrilo commented 8 years ago

@jordanade You can also look at #225 if you need to hack something that works. I'm not a big fan of this as it creates an API that is specific to XLSX. Also, if Spout could automatically calculate the width of each column, it would solve 95% of the problems. So I'd rather have this transparent solution in place, instead of having to manually specify the width of each column.

jordanade commented 8 years ago

Surely there is room for APIs that may not work for all possible file outputs? Best effort, etc...

adrilo commented 8 years ago

I'm actually trying to avoid this as much as possible. Although I'll definitely reconsider after implementing the automated method.

nimmneun commented 8 years ago

Hi there, have you been able to come up with a solution for the autosizing of column widths and row heights already?

If not, I've come with an intermediary solution, which - while less accurate than imagettfbbox - is IMHO a decent tradeoff between performance and accuracy. On average it increases the runtime by about 50%-100% and the widths are on average about 1 character off (usually too wide).

It's a bit hacky though (requires serious refinement! =) and involves a csv that holds base sizes for each font and character in the ASCII range, which is why I'm hesitating to fork->code->pull spout, since I dont know if it's a viable solution to integrate that into spout.

adrilo commented 8 years ago

@nimmneun Nobody started working on this enhancement. If you already built something, feel free to share it. Either share a link to your commit or create a pull request. It's always good to get inspiration from others :)

nimmneun commented 8 years ago

@adrilo ok cool =) Since I'm not a fan of modifying anything within the vendor folder I hacked a "proof" of concept" (outside of spout) over the past couple of weekends. I haven't decided on how to integrate it into spout yet. I think I'd like to introduce 2, maybe 3 new classes ... Font or Fontname with just the supported fonts as constants like so Font + at least a class SizeCalculator as property of the worksheet and maybe a SizeCollection to hold character widths ... allthough this one could probably be kept inside of the SizeCalculator class =)

Since the <cols> tag must be placed before <sheetdata> the only viable way to do this is IMHO to add whitespaces at the beginning of the sheet.xml. While not a perfect solution its probably better than writing all sheetdata into a temp file just to read and write it again into a final sheet.xml. Also 1mb of spaces (to allow up to ~16k columns) effectivly account for just a couple of bytes in the final xlsx.

Would the above be cool with you? =)

edit: ... just noticed github doesnt escape < > :D

ljudina commented 8 years ago

+1

wnasich commented 8 years ago

+1

adrilo commented 8 years ago

@nimmneun I'm cool with this approach :) My only concern would be with the SizeCollection data. I saw that you get the different character sizes from a CSV file. Do you know where this data is coming from? Also, it does not cover much of the non-english characters (which is an ok tradeoff before we can build a more robust system for SizeCalculator).

This is also an interesting choice:

Define minimum and maximum column widths to keep exceptionally large or small cell contents in check.

nimmneun commented 8 years ago

@adrilo ok cool =) Yeah - I'm not perfectly happy with the precalculated solution as well :-/ I calculated the sizes using imagettfbbox() combined with several switch cases *lol Unfortunately imagettfbbox is not as accurate as I was hoping and works only for a limited number of fonts. For others it was way off, especially with fonts that had character spacing.

I implemented the width calculation like this at first

    /**
     * @param mixed $value
     * @param Font  $font
     * @return float
     */
    public function getRenderedCellWidth($value, Font $font)
    {
        $box = imageftbbox($font->getSize(), 0, $this->getFontPath($font), $value);
        $pixels = abs($box[4] - $box[0]);
        $width = round(($pixels + 5) / 7, 3);

        return $width + 0.7;
    }

    /**
     * @param Font $font
     * @return string
     */
    public function getFontPath(Font $font)
    {
        return __DIR__ . '/fonts/' . $this->fontMap[$font->getName()] . '.ttf';
    }

But it was just painfully slow :sob: and runtimes for larger datasets would end up running 20-30 minutes instead of 5 minutes. The FontMap was necessary because the font file name often differs from the actual font name. Later on I wrote this FontMeta "extractor" from code snippets I found @stackoverflow: https://gist.github.com/nimmneun/2ba1a6e64cc2c075c155f65fee9422c0, which at least made the FontMap superfluous. But that didn't solve the performance issue.

I think I'll have some spare time this coming weekend and look forward to getting a few LOC written =)

nimmneun commented 7 years ago

pushed an intermediary commit to https://github.com/nimmneun/spout/tree/cell-autosizing

darrinsworlds commented 7 years ago

Are you still working on this?

germain-italic commented 7 years ago

Up. Any news on implementing this feature?

harrygulliford commented 7 years ago

Would love to see this implemented soon!

pedrofsantoscom commented 7 years ago

+1

gmmarc commented 7 years ago

+1

dmitryuk commented 7 years ago

+1

bugalot commented 7 years ago

+1

savdeep commented 7 years ago

+1

gabbanaesteban commented 7 years ago

I need this! :(

podorozhny commented 7 years ago

+1

NicolasN commented 7 years ago

+1

AzzaAzza69 commented 7 years ago

Can I suggest a fit-all: $writer->setColumnWidths(aFixedWidths|'autosize-char'|'autosize-font');

so the 'autosize-font' could be used for XLSX/ODS (but you could use the 'autosize-char' if you wanted quicker width calculating or the aFixedWidths for no speed reduction)

for the 'autosize-xxx' methods, you could:

As I needed column-widths and alignment, I had to go with https://github.com/mk-j/PHP_XLSXWriter and code my own writer for outputting delimited text ...

aocneanu commented 7 years ago

This would be a really nice feature.

BTW Great work with this lib!

Talgat777 commented 6 years ago

+1 Please integrate this feature

doonot commented 6 years ago

+1

ghost commented 6 years ago

+100500

megaxorg commented 6 years ago

+1

meow257 commented 6 years ago

+1

smehtaCAS commented 6 years ago

An option to allow to set the column width without auto-fit should be very useful until something for the auto-fit is released. From what it seems, this might a good value for the time.

mnico commented 6 years ago

+1

andrewtweber commented 6 years ago

PHPSpreadsheet has the automatic width capability.

I might take a stab at this someday, but if not then whoever does might find their code useful:

https://github.com/PHPOffice/PhpSpreadsheet/blob/bebfb1e41a770de8033dc37affac72c3d7ceda6d/src/PhpSpreadsheet/Worksheet/Worksheet.php#L712

https://github.com/PHPOffice/PhpSpreadsheet/blob/bebfb1e41a770de8033dc37affac72c3d7ceda6d/src/PhpSpreadsheet/Shared/Font.php#L226

Also worth noting that this is very fast, even with millions of rows

simon25608 commented 6 years ago

+1

Micsou commented 6 years ago

+1

feng99 commented 6 years ago

Is there a solution to this problem now?

mlshvdv commented 5 years ago

Any news?

sinri commented 5 years ago

Emmm, seems no more news on this topic, maybe some hacking work is needed

JavierSolis commented 5 years ago

Oh, figured it out--just added this line to Worksheet.php, startSheet(), between the header and the data: `fwrite($this->sheetFilePointer, '

For more reference on the format you can search this link. https://docs.microsoft.com/en-us/previous-versions/office/developer/office-2010/cc802410(v=office.14)?redirectedfrom=MSDN

The ones used are:

customWidth (Custom Width) Flag indicating that the column width for the affected column (s) is different from the default or has been manually set.

max (Maximum Column) Last column affected by this 'column info' record.

min (Minimum Column) First column affected by this 'column info' record.

width (Column Width) Column width, more details about its calculation in the link

Example:

Box\Spout\Writer\XLSX\Manager\WorksheetManager.php

$widths=[10,33,35,15,26,19,20,16,16,17,32];

$widthCols='<cols>';

foreach ($widths as $index=>$width){
            $colAffect = $index+1;
            $widthCols.='<col min="'.$colAffect.'" max="'.$colAffect.'" width="'.$width.'" customWidth="1"/>';
}

$widthCols.='</cols>';
fwrite($sheetFilePointer, $widthCols);
alexanderop commented 4 years ago

+1

jeanvcastro commented 4 years ago

+1

smartniazi commented 4 years ago

+1

gleydsonruan commented 4 years ago

+1

jadamec commented 3 years ago

+1

chuoke commented 3 years ago

sounds so sad