dexplo / dataframe_image

A python package for embedding pandas DataFrames as images into pdf and markdown documents
https://dexplo.org/dataframe_image
MIT License

Styled dataframe limited to 25 rows of output? #115

Closed justinclarkhome closed 3 weeks ago

justinclarkhome commented 2 months ago

Hello! This library is very useful for me, so I thank you for that!

One thing I've noticed recently - which I suspect is cropping up through an updated dependency as it didn't happen prior to the last few days - is that dfi.export() seems to limit output to about 25 rows of data.

I created a fresh pip venv with dataframe_image 0.2.4, pandas 2.2.2, numpy 2.1., jupyterlab 4.2.4, matplotlib 3.9.2.

Here's a simple example, which creates a styled dataframe containing 50 random floats in a single column:

import pandas as pd
import numpy as np
import io
import dataframe_image as dfi
import matplotlib.pyplot as plt

fig = plt.Figure()
random_data = pd.Series(data=np.random.normal(size=50)) \
    .to_frame('Hi').style.format('{:.2f}')
with open('test.png', 'wb') as write_obj:
    dfi.export(
        random_data,
        write_obj,
        max_rows=-1, # I tried None, -1 and 100
        dpi=None,
    )
    fig.savefig(write_obj)

For me, this gives the following output (only outputs 25 of the 50 rows):

[Image: test]

Any ideas what may be the cause here?

Thank you!

justinclarkhome commented 2 months ago

Did some more digging, and I think this may be related to an update of Chrome/Chromium. I have a machine at my office with an older Chrome install that ran the above snippet cleanly (incidentally, shortly after that it had a browser update pushed, after which the output became truncated).

If anyone else is having similar issues, I was able to get similar output (ish) after installing Playwright and using that for table conversion.

justinclarkhome commented 2 months ago

Here is an interesting test where I pass a dataframe of 50x10 random floats into dfi.export() using chrome and playwright as the table_conversion args.

import pandas as pd
import numpy as np
import io
import dataframe_image as dfi
from PIL import Image

random_data = pd.DataFrame(data=np.random.normal(size=(50, 10))).style.format('{:.2f}')

for table_conversion in ['chrome', 'playwright']:
    with io.BytesIO() as write_obj:
        dfi.export(
            random_data,
            write_obj,
            table_conversion=table_conversion,
        )
        image_w, image_h = Image.open(write_obj).size
        print(f'{table_conversion} w x h: {image_w} x {image_h}')

Notice that the resulting image height from the chrome export is about half the size of the playwright export:

[Image: Screenshot 2024-08-29 at 3 00 31 PM]

PaleNeutron commented 2 months ago

It seems chrome made some changes. I suggest moving to playwright for more stable support.

Maybe this bug will be gone in the next Chrome release.

justinclarkhome commented 1 month ago

A colleague of mine found this:

https://github.com/rstudio/chromote/issues/171

I was able to revert to the prior behavior of chrome by editing line 118 of chrome_converter.py (the args list) to contain "--headless=old" for the time being.

Sorry for no screenshots but am on mobile!
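
For readers who want to see what such an args list looks like, here is a minimal sketch. It is not the code in chrome_converter.py: the helper name `build_screenshot_args`, its parameters, and the default window size are assumptions made for illustration. Only the Chrome flags themselves (`--headless=old`, `--window-size`, `--screenshot`, `--hide-scrollbars`) are real command-line switches.

```python
from pathlib import Path


def build_screenshot_args(chrome_path, html_path, png_path,
                          headless_mode="old", width=1400, height=900):
    """Return an argument list for a headless Chrome screenshot.

    Hypothetical helper for illustration; the real args list lives in
    dataframe_image's chrome_converter.py and may differ.
    """
    return [
        chrome_path,
        f"--headless={headless_mode}",      # "old" is the workaround from this thread
        f"--window-size={width},{height}",  # viewport used for the capture
        f"--screenshot={png_path}",         # where Chrome writes the PNG
        "--hide-scrollbars",
        Path(html_path).resolve().as_uri(), # Chrome expects a URL, not a bare path
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no browser is required.
    print(" ".join(build_screenshot_args("chrome", "table.html", "table.png")))
```

Passing the returned list to `subprocess.run` would launch the browser, assuming `chrome_path` points at a Chrome build that still accepts `--headless=old`.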

PaleNeutron commented 1 month ago

@justinclarkhome there are two problems:

  1. At some point in the future, --headless=old will be removed and users will be expected to use a separate chrome-headless-shell binary, which is already available.

    That means --headless=old is deprecated and may cause failures in the future.

  2. See https://github.com/dexplo/dataframe_image/issues/78: we enforce --headless=new to make the --window-size parameter work.

It seems we need another solution. Maybe the new chrome-headless-shell binary is a good choice: https://developer.chrome.com/blog/chrome-headless-shell
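
As a rough sketch of what preferring that binary could look like, the snippet below searches PATH for a headless-capable executable and returns the first match. The function name `pick_headless_binary` and the candidate list are assumptions made for illustration, not how dataframe_image locates Chrome. Executable names vary by platform and install method.

```python
import shutil

# Candidate executable names, in order of preference. This list is an
# assumption for illustration; adjust it for your platform.
DEFAULT_CANDIDATES = (
    "chrome-headless-shell",
    "google-chrome",
    "chromium",
    "chrome",
)


def pick_headless_binary(candidates=DEFAULT_CANDIDATES):
    """Return (name, full_path) of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = pick_headless_binary()
    print(found if found else "no headless-capable browser found on PATH")
```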

justinclarkhome commented 1 month ago

Thank you @PaleNeutron. I’m just hacking that change in locally, hoping whatever happened with Chrome is fixed in a future update (allowing “new” headless to output the same as “old”). Unfortunately, I can’t download that binary in my work environment.

In the meantime I am converting my own tool to optionally use playwright, and that works, but leads to another (unrelated) issue for me that I’m trying to work out.

PaleNeutron commented 3 weeks ago

Fixed in https://github.com/dexplo/dataframe_image/releases/tag/v0.2.5
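
To check whether an environment already has the release with the fix, a small sketch along these lines may help. The helper names are made up for illustration, and the comparison is deliberately rough: it looks only at the first three numeric components, so a pre-release of 0.2.5 would also count as fixed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(text):
    """Turn a version string such as '0.2.5' into (0, 2, 5).

    Rough parse: keeps the first three runs of digits and ignores suffixes.
    """
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def has_fix(installed, fixed_in="0.2.5"):
    """True if the installed version is at least the release named in this thread."""
    return release_tuple(installed) >= release_tuple(fixed_in)


def installed_dataframe_image_version():
    """Return the installed dataframe_image version, or None if it is absent."""
    try:
        return version("dataframe_image")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_dataframe_image_version()
    if current is None:
        print("dataframe_image is not installed")
    elif has_fix(current):
        print(f"dataframe_image {current} includes the v0.2.5 fix")
    else:
        print(f"dataframe_image {current} predates v0.2.5; consider upgrading")
```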