dexplo / dataframe_image

A python package for embedding pandas DataFrames as images into pdf and markdown documents
https://dexplo.org/dataframe_image
MIT License

Problem with DataFrame_Image No such file or directory #79

Closed AleelA190 closed 1 year ago

AleelA190 commented 1 year ago

When I try to save a styled DataFrame as a PNG, I get this error: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmph0pszm_j/temp.png'".

What is very strange is that it worked correctly yesterday, and today every script that creates a PNG from a DataFrame fails with this error. The library creates a randomly named temporary folder that should contain an image called temp.png, but when it tries to open that file, the file is not there.

How can I solve it?

s95050937 commented 1 year ago

same issue

NokibMonsur commented 1 year ago

same Issue

raykevin2222 commented 1 year ago

Same issue. Maybe the size of the image causes the error. I tried reducing the number of rows in the DataFrame and the image was generated successfully, but that's not a real solution.

ATcn commented 1 year ago

same Issue, dataframe-image 0.1.5

==========found some problem============ dataframe_image._screenshot.py.Screenshot():

take_screenshot():
    ...
    return self.possibly_enlarge(img)    ------------------- A

possibly_enlarge():
    ...
    # must be all white for 30 pixels in a row to trigger stop
    if enlarge:
        return self.take_screenshot()    --------------- B

Problem: A and B call each other, so the two methods can loop without end.

========local fix============== dataframe_image._screenshot.py:146 [possibly_enlarge()]

    # must be all white for 30 pixels in a row to trigger stop
    if all_white_vert[-30:].sum() != 30:    --------------- line 146 old
        self.ss_width = int(self.ss_width * 1.5)
        enlarge = True

Fix: change 30 to 14 (the value that worked when I debugged my own case).

    # must be all white for 30 pixels in a row to trigger stop
    if all_white_vert[-14:].sum() != 14:    --------------- line 146 new
        self.ss_width = int(self.ss_width * 1.5)
        enlarge = True

After that, the export ran successfully!
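The loop described above has no upper bound: if the screenshot never gets larger, the two methods keep calling each other. A minimal sketch of bounding the retries is shown here. The names `capture_with_limit`, `ScreenshotTooSmallError`, `capture`, and `max_attempts` are hypothetical and not part of dataframe_image; `capture` stands in for take_screenshot() plus the white-margin check.

```python
class ScreenshotTooSmallError(RuntimeError):
    """Raised when enlarging the window never produces a usable capture."""


def capture_with_limit(capture, width=1400, height=900, max_attempts=5):
    """Call capture(width, height) until it reports a fit, growing 1.5x each time.

    `capture` is a hypothetical callable returning (image, fits); it stands in
    for take_screenshot() plus the white-margin check in possibly_enlarge().
    """
    for _ in range(max_attempts):
        image, fits = capture(width, height)
        if fits:
            return image
        # Same 1.5x growth factor the package applies to ss_width / ss_height.
        width, height = int(width * 1.5), int(height * 1.5)
    raise ScreenshotTooSmallError(
        f"no usable screenshot after {max_attempts} attempts"
    )
```

With a cap like this, a browser that ignores the requested window size produces a clear error instead of an endless loop.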

Gelato-B2B commented 1 year ago

Same issue

bernardt2018 commented 1 year ago

Same issue

calvin5walters commented 1 year ago

Same issue

lyp2008001 commented 1 year ago

Same issue

kaunasDTM commented 1 year ago

I have the same issue; it looks like it is related to a Google Chrome update. For now, a workaround is to pass table_conversion='matplotlib', since the default table conversion uses Chrome. For me, though, all the table formatting was lost with this workaround.

erickalfaro commented 1 year ago

Same issue.

dclarksixteen commented 1 year ago

Same issue

ZiyadMoraished commented 1 year ago

same issue

wlsdml1114 commented 1 year ago

same issue

gtianyi commented 1 year ago

Same issue. dfi.export(df, "table.png", table_conversion='matplotlib') works for the moment.

brunomorikuni commented 1 year ago

same issue

fredericomattos commented 1 year ago

same issue

benjaminmoritz commented 1 year ago

Hi. Here is the fix. I followed the idea from @Neo-Luo, which is documented here: https://developer.chrome.com/articles/new-headless/ Go to the file _screenshot.py and change "--headless" to "--headless=new". The documentation from Google says that if you do not provide "=new", it will automatically use "=old".
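As a rough illustration of the change, the sketch below builds the Chrome argument list with a switchable headless mode. The helper `build_chrome_args` and its parameters are hypothetical; the flags themselves are the ones that appear in the package's _screenshot.py and in the error output later in this thread. Reports here are mixed on whether "=new" helps, so treat it as something to try rather than a guaranteed fix.

```python
def build_chrome_args(html_path, png_path, headless_mode="new",
                      width=1400, height=900, scale=1):
    """Return the argument list for a headless Chrome screenshot.

    headless_mode: "new", "old", or None for the bare --headless flag.
    """
    if headless_mode is None:
        headless = "--headless"
    else:
        headless = f"--headless={headless_mode}"
    return [
        "--enable-logging",
        "--disable-gpu",
        headless,
        "--crash-dumps-dir=/tmp",
        f"--force-device-scale-factor={scale}",
        f"--window-size={width},{height}",
        "--hide-scrollbars",
        f"--screenshot={png_path}",
        str(html_path),
    ]
```

Calling `build_chrome_args("temp.html", "temp.png", headless_mode="old")` gives the same list with "--headless=old", which makes it easy to compare the two modes on one machine.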

erickalfaro commented 1 year ago

I tried both suggestions by @benjaminmoritz, @ATcn and neither worked for me.

@gtianyi - in my case matplotlib is not an option, as I need to keep the formatting intact.

kmsenator commented 1 year ago

Local fix in _screenshot.py, lines 157-158: comment out the recursive call

        # if enlarge:
        #     return self.take_screenshot()

and use fontsize and dpi in export.

Example: dfi.export(df, "table.png", fontsize=8, dpi=150)

910466892 commented 1 year ago

I tried both suggestions by @benjaminmoritz, @ATcn and neither worked for me.

@gtianyi - in my case matplotlib is not as I need to keep the formatting intact.

Same

kmsenator commented 1 year ago

Hi. Here is the fix. I followed the idea from @Neo-Luo, which is documented here: https://developer.chrome.com/articles/new-headless/ Go to the file _screenshot.py and change "--headless" to "--headless=new". The documentation from Google says that if you do not provide "=new", it will automatically use "=old".

I tried this, but unfortunately it doesn't work for me for some reason. Chrome accepts "--headless=old", but with "--headless=new" it doesn't take a screenshot (no file is created).

waterbear1996 commented 1 year ago

I modified _screenshot.py to use selenium.webdriver.firefox to convert temp.html to temp.png. I'm happy with it...

kmsenator commented 1 year ago

I noticed that on all devices, and in all browsers based on the Chromium engine, the screenshot is now taken at a resolution of 800x600.
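One way to check whether a given temp.png was capped at 800x600 is to read the dimensions straight from the PNG header. This is a small standard-library sketch; the helper name `png_size` is made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", data[16:24])
```

For a file on disk, pass its bytes, for example `png_size(open("temp.png", "rb").read())`.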

benjaminmoritz commented 1 year ago

That's strange. I use the newest version of chrome (111.0.5563.65) and don't forget to "restart" VSCode / python after editing the package. I have the packages in venv. I just call dfi.export(df, filename, dpi=300)

waterbear1996 commented 1 year ago

I noticed that on all devices, the screenshot began to be taken with a resolution of 800x600, and on all types of browsers with the chromium engine

The same thing happened to me, so I decided to switch to Selenium instead. I also noticed a command for the Chrome driver to take full-size screenshots, but I don't know how to use it in headless mode.

Lehao25 commented 1 year ago

I modify _screenshot.py with selenium.webdriver.firefox and then convert temp.html to temp.png. I'm happy with it...

Would you mind sharing your method, please?

waterbear1996 commented 1 year ago

I modify _screenshot.py with selenium.webdriver.firefox and then convert temp.html to temp.png. I'm happy with it...

Would you mind share your method pls?

[two images attached] I don't know whether this will work for you.

Neo-Luo commented 1 year ago

I think the easiest way to fix this problem is like this:

This problem is caused by the latest version of Chrome. New headless Chrome uses the platform window size and disregards the --window-size setting if it is larger than the active display work area. So when your DataFrame renders larger than 800*600, the code runs into an endless loop in possibly_enlarge and throws the "No such file or directory" error.

You can use a smaller fontsize and a greater dpi to shrink the rendered area of your DataFrame and so avoid the problem. For example, fontsize=10 and dpi=100 caused this error, but the following setting worked for my DataFrame:

dfi.export(df, file_name, fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None)

If your DataFrame is bigger, you can set fontsize smaller and dpi greater until it succeeds.
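The "smaller fontsize, greater dpi until it succeeds" advice can be sketched as a small loop. The helper `export_with_shrinking_font` is hypothetical, and the default pairs are illustrative apart from (3.8, 800), which is the setting reported in this thread. It takes the export function as an argument and assumes the failure surfaces as an OSError such as the FileNotFoundError reported above.

```python
def export_with_shrinking_font(export, df, filename,
                               settings=((10, 100), (6, 400), (3.8, 800))):
    """Try (fontsize, dpi) pairs in order until one export succeeds.

    `export` is any callable accepting fontsize and dpi keywords,
    as dfi.export does. Returns the pair that worked.
    """
    last_error = None
    for fontsize, dpi in settings:
        try:
            export(df, filename, fontsize=fontsize, dpi=dpi)
            return fontsize, dpi
        except OSError as exc:  # FileNotFoundError is a subclass of OSError
            last_error = exc
    raise RuntimeError("all fontsize/dpi settings failed") from last_error
```

Usage would look like `export_with_shrinking_font(dfi.export, df, "table.png")`, with `dfi` being the imported dataframe_image module.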

benjaminmoritz commented 1 year ago

@Neo-Luo I even tried fontsize=1, dpi=2000 and it did not work for me on a 4K monitor.

My solution is working fine for me. Maybe others need to set ss_width = 1400 and ss_height = 900 to other values?

I work with the 0.1.5 release. There has been a change in _screenshot.py which is not in the release, see here: https://github.com/dexplo/dataframe_image/commit/55ada23f4a51e52faa462943b45fce45c1c2df77 This could also cause a problem.


jorgapa commented 1 year ago

I modify _screenshot.py with selenium.webdriver.firefox and then convert temp.html to temp.png. I'm happy with it...

Would you mind share your method pls?

[two images attached] I don't know whether this will work for you.

yes man! that solved the issue for me

TatianePretto commented 1 year ago

Neo-Luo solution " fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None" worked for me.

paddypunch commented 1 year ago

fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None

Thanks @Neo-Luo . This helped overcome the issue.

tomnewg commented 1 year ago

I think the easiest way to fix this problem is like this:

This problem is caused by the latest version of Chrome. New headless Chrome uses the platform window size and disregards the --window-size setting if it is larger than the active display work area. So when your DataFrame renders larger than 800*600, the code runs into an endless loop in possibly_enlarge and throws the "No such file or directory" error.

You can use a smaller fontsize and a greater dpi to shrink the rendered area of your DataFrame and so avoid the problem. For example, fontsize=10 and dpi=100 caused this error, but the following setting worked for my DataFrame:

dfi.export(df, file_name, fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None)

If your DataFrame is bigger, you can set fontsize smaller and dpi greater until it succeeds.

this works for me. Thank you very much!

Zoe54445 commented 1 year ago

https://github.com/dexplo/dataframe_image/issues/79#issuecomment-1469844267

Thanks @Neo-Luo !!! This way works for me, too.

buckyster commented 1 year ago

Neo-Luo's workaround worked for me. I simplified it a little by removing table_conversion='chrome', chrome_path=None, since those are already the default values, and increased fontsize from 3.8 to 4 to get a round number.

dfi.export(df, file_name, fontsize=4, dpi=800)


Would it be helpful to modify the dataframe_image code to try fontsize=4, dpi=800 if the default fails? This might make dfi.export() work successfully for most cases without having all users change their code everywhere they call dfi.export()

Example (untested) pseudocode for https://github.com/dexplo/dataframe_image/blob/master/dataframe_image/_pandas_accessor.py

def export(...):
    try:
        _export(...)
    except Exception:
        # retry once with the settings that worked for users in this thread
        _export(..., fontsize=4, dpi=800)

ATcn commented 1 year ago

I tried both suggestions by @benjaminmoritz, @ATcn and neither worked for me.

@gtianyi - in my case matplotlib is not as I need to keep the formatting intact.

The underlying cause is a loop call between two functions of the package. I didn't spend more time digging into it; I just debugged the key code, then changed the condition value to get past it.

Mine is just one instance, not a general solution.

wosantos95 commented 1 year ago

Google Colab solution

Hey, I solved my problem by making small tweaks to the script shared by @waterbear1996.

If your case is similar, here is the solution.

1. Open _screenshot.py:

/usr/local/lib/python3.9/dist-packages/dataframe_image/_screenshot.py

2. Copy the code below and completely replace the contents of the file:



import base64
import io
import platform
import shutil
import subprocess
from pathlib import Path
from tempfile import TemporaryDirectory

import numpy as np
from matplotlib import image as mimage

from .pd_html import styler2html

#Add the following
import selenium.webdriver
import selenium.common
import os
options = selenium.webdriver.firefox.options.Options()
options.add_argument("--headless")
###

def get_system():
    system = platform.system().lower()
    if system in ["darwin", "linux", "windows"]:
        return system
    else:
        raise OSError(f"Unsupported OS - {system}")

def get_chrome_path(chrome_path=None):
    system = get_system()
    if chrome_path:
        return chrome_path

    if system == "darwin":
        paths = [
            "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
        ]
        for path in paths:
            if Path(path).exists():
                return path
        raise OSError("Chrome executable not able to be found on your machine")
    elif system == "linux":
        paths = [
            None,
            "/usr/local/sbin",
            "/usr/local/bin",
            "/usr/sbin",
            "/usr/bin",
            "/sbin",
            "/bin",
            "/opt/google/chrome",
        ]
        commands = [
            "google-chrome",
            "chrome",
            "chromium",
            "chromium-browser",
            "brave-browser",
        ]
        for path in paths:
            for cmd in commands:
                chrome_path = shutil.which(cmd, path=path)
                if chrome_path:
                    return chrome_path
        raise OSError("Chrome executable not able to be found on your machine")
    elif system == "windows":
        import winreg

        locs = [
            r"SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\chrome.exe",
            r"SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\brave.exe",
        ]
        for loc in locs:
            handle = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, loc)
            num_values = winreg.QueryInfoKey(handle)[1]
            if num_values > 0:
                return winreg.EnumValue(handle, 0)[1]
        raise OSError("Cannot find chrome.exe on your windows machine")

class Screenshot:
    def __init__(
        self,
        center_df=True,
        max_rows=None,
        max_cols=None,
        chrome_path=None,
        fontsize=18,
        encode_base64=True,
        limit_crop=True,
        device_scale_factor=1
    ):
        self.center_df = center_df
        self.max_rows = max_rows
        self.max_cols = max_cols
        self.ss_width = 1400
        self.ss_height = 900
        self.chrome_path = get_chrome_path(chrome_path)
        self.css = self.get_css(fontsize)
        self.encode_base64 = encode_base64
        self.limit_crop = limit_crop
        self.device_scale_factor = device_scale_factor

    def get_css(self, fontsize):
        mod_dir = Path(__file__).resolve().parent
        css_file = mod_dir / "static" / "style.css"
        with open(css_file) as f:
            css = "<style>" + f.read() + "</style>"
        justify = "center" if self.center_df else "left"
        css = css.format(fontsize=fontsize, justify=justify)
        return css

    def take_screenshot(self):
        temp_dir = TemporaryDirectory()
        temp_html = Path(temp_dir.name) / "temp.html"
        temp_img = Path(temp_dir.name) / "temp.png"
        with open(temp_html, "w", encoding="utf-8") as f:
            f.write(self.html)

        args = [
            "--enable-logging",
            "--disable-gpu",
            "--headless",
            "--no-sandbox",
            "--crash-dumps-dir=/tmp",
            f"--force-device-scale-factor={self.device_scale_factor}",
        ]

        # if self.ss_width and self.ss_height:
        #     args.append(f"--window-size={self.ss_width},{self.ss_height}")

        args += [
            "--hide-scrollbars",
            f"--screenshot={str(temp_img)}",
            str(temp_html),
        ]

        with selenium.webdriver.Firefox(options=options) as driver:
            # copy temp.html to /tmp so Firefox can load it from a fixed path
            shutil.copy(temp_html, "/tmp")
            driver.get("file:///tmp/temp.html")

            # resize the window to the full page size so nothing is cut off
            required_width = driver.execute_script("return document.body.parentNode.scrollWidth")
            required_height = driver.execute_script("return document.body.parentNode.scrollHeight")
            driver.set_window_size(required_width + 150, required_height + 90)
            driver.save_screenshot(str(temp_img))

        # subprocess.run(executable=self.chrome_path, args=args)

        with open(temp_img, "rb") as f:
            img_bytes = f.read()

        buffer = io.BytesIO(img_bytes)
        img = mimage.imread(buffer)
        return self.possibly_enlarge(img)

    def possibly_enlarge(self, img):
        enlarge = False
        img2d = img.mean(axis=2) == 1

        all_white_vert = img2d.all(axis=0)
        # must be all white for 5 pixels in a row to trigger stop
        if all_white_vert[-5:].sum() != 5:
            self.ss_width = int(self.ss_width * 1.5)
            enlarge = True

        all_white_horiz = img2d.all(axis=1)
        if all_white_horiz[-5:].sum() != 5:
            self.ss_height = int(self.ss_height * 1.5)
            enlarge = True

        if enlarge:
            return self.take_screenshot()

        return self.crop(img, all_white_vert, all_white_horiz)

    def crop(self, img, all_white_vert, all_white_horiz):
        diff_vert = np.diff(all_white_vert)
        left = diff_vert.argmax()
        right = -diff_vert[::-1].argmax()
        if self.limit_crop:
            max_crop = int(img.shape[1] * 0.15)
            left = min(left, max_crop)
            right = max(right, -max_crop)

        diff_horiz = np.diff(all_white_horiz)
        top = diff_horiz.argmax()
        bottom = -diff_horiz[::-1].argmax()
        new_img = img[top:bottom, left:right]
        return new_img

    def finalize_image(self, img):
        buffer = io.BytesIO()
        mimage.imsave(buffer, img)
        img_str = buffer.getvalue()
        if self.encode_base64:
            img_str = base64.b64encode(img_str).decode()
        return img_str

    def run(self, html):
        self.html = self.css + html
        img = self.take_screenshot()
        img_str = self.finalize_image(img)
        return img_str

    def repr_png_wrapper(self):
        from pandas.io.formats.style import Styler

        ss = self

        def _repr_png_(self):
            if isinstance(self, Styler):
                html = styler2html(self)
            else:
                html = self.to_html(
                    max_rows=ss.max_rows, max_cols=ss.max_cols, notebook=True
                )
            return ss.run(html)

        return _repr_png_

def make_repr_png(center_df=True, max_rows=30, max_cols=10, chrome_path=None):
    """
    Used to create a _repr_png_ method for DataFrames and Styler objects
    so that nbconvert can use it to create images directly when
    executing the notebook before conversion to pdf/markdown.

    Parameters
    ----------
    center_df : bool, default True
        Choose whether to center the DataFrames or not in the image. By
        default, this is True, though in Jupyter Notebooks, they are
        left-aligned. Use False to make left-aligned.

    max_rows : int, default 30
        Maximum number of rows to output from DataFrame. This is forwarded to
        the `to_html` DataFrame method.

    max_cols : int, default 10
        Maximum number of columns to output from DataFrame. This is forwarded
        to the `to_html` DataFrame method.

    chrome_path : str, default None
        Path to your machine's chrome executable. When `None`, it is
        automatically found. Use this when chrome is not automatically found.
    """
    ss = Screenshot(center_df, max_rows, max_cols, chrome_path)
    return ss.repr_png_wrapper()

Note that this code requires the selenium package to be installed.

After completing these steps, restart the environment without installing the libraries again.
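For anyone who would rather not patch the installed package, the same Firefox approach can be sketched as a standalone helper. This assumes selenium, Firefox, and geckodriver are available; the function names `padded_window_size` and `html_to_png` are hypothetical, and the 150/90 pixel margins are simply the ones used in the patch above.

```python
from pathlib import Path


def padded_window_size(scroll_width, scroll_height, pad_w=150, pad_h=90):
    """Window size that fits the page plus the margins used in the patch above."""
    return scroll_width + pad_w, scroll_height + pad_h


def html_to_png(html_path, png_path):
    """Render a local HTML file to PNG with headless Firefox.

    selenium is imported lazily so that padded_window_size() stays usable
    on machines where selenium is not installed.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")
    with webdriver.Firefox(options=options) as driver:
        # as_uri() needs an absolute path, hence resolve()
        driver.get(Path(html_path).resolve().as_uri())
        width = driver.execute_script("return document.body.parentNode.scrollWidth")
        height = driver.execute_script("return document.body.parentNode.scrollHeight")
        driver.set_window_size(*padded_window_size(width, height))
        driver.save_screenshot(str(png_path))
```

The HTML can come from `df.to_html()` or a Styler, written to a file first; unlike the patch, this sketch does not crop the white margins afterwards.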

PaleNeutron commented 1 year ago

Hello, everyone. I have published a new release which should resolve current problem. Can you update to latest version and check if it works?

BTW, it supports Google colab now.

rcym1505 commented 1 year ago

Hello, everyone. I have published a new release which should resolve current problem. Can you update to latest version and check if it works?

BTW, it supports Google colab now.

Hi @PaleNeutron , confirm that it works on my end. Thanks for the quick fix!!

hdliu1997 commented 1 year ago

@PaleNeutron Thanks for your contribution. I tried the latest version, but it doesn't work for me. It converts only the first several rows of the DataFrame; the rest of the DataFrame is not exported correctly.

PaleNeutron commented 1 year ago

@hdliu1997 I think your problem is not related to this issue. Please open a new issue and provide full example code to reproduce the problem.

KevinMyDing commented 1 year ago

@PaleNeutron I try the latest version, it has a new error:

{CalledProcessError}Command '['--enable-logging', '--disable-gpu', '--headless=new', '--crash-dumps-dir=/tmp', '--force-device-scale-factor=1', '--window-size=1400,900', '--hide-scrollbars', '--screenshot=C:\\Users\\dingm\\AppData\\Local\\Temp\\tmp_2r5igrl\\temp.png', 'C:\\Users\\dingm\\AppData\\Local\\Temp\\tmp_2r5igrl\\temp.html']' returned non-zero exit status 21.

I run it on Windows 11.

PaleNeutron commented 1 year ago

@KevinMyDing, please use the latest version from PyPI, not GitHub. The GitHub version is not stable.

pip install dataframe_image==0.1.7
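
To confirm which release is actually installed in the current environment, a standard-library check along these lines may help. The helper names are made up for illustration, and `version_tuple` assumes a purely numeric dotted version such as 0.1.7.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.1.7' into (0, 1, 7); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))
```

For example, `installed_version("dataframe_image")` returns the version string if the package is installed, and comparing `version_tuple(...) >= (0, 1, 7)` tells you whether the fixed release is in use.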

yun881201 commented 1 year ago

@hdliu1997 same issue. I tried the latest version, 0.1.7, but it did not work for me either. The warning disappeared, but I got a wrong image: it is just a white line. The code dfi.export(df, name, fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None) works well, but it gives me an ugly table with thick lines, which does not match the DataFrame shown in the cell. @PaleNeutron

hdliu1997 commented 1 year ago

@hdliu1997 same issue. I tried the latest version 0.1.7, but it did not work for me too. Actually, the warning disappeared, but I obtained a wrong dataframe image. The image is just a white line. The code dfi.export(df, name, fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None) works well. But this gives me an ugly dataframe with fat table line, which is not the same with the dataframe in the cell. @PaleNeutron .

Ur advice works well for me too. It's a great alternative before this issue is fixed lol

waterbear1996 commented 1 year ago

@hdliu1997 Was your HTML file generated correctly? There may be an issue with geckodriver...

KevinMyDing commented 1 year ago

@hdliu1997 same issue. I tried the latest version 0.1.7, but it did not work for me too. Actually, the warning disappeared, but I obtained a wrong dataframe image. The image is just a white line. The code dfi.export(df, name, fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None) works well. But this gives me an ugly dataframe with fat table line, which is not the same with the dataframe in the cell. @PaleNeutron .

same issue. the tmp.html and tmp.png is working.

fredericomattos commented 1 year ago

Hi @PaleNeutron, for me the error no longer appears, but now the problem shows up in a different way:

Before the problem, when I used:

dfi.export(df, name_file + ".png")

I was able to print 70 lines without any problem. After the problem, I had to adjust to keep the result at least similar:

dfi.export(df, name_file + ".png", fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None)

Result: image

Now (0.1.7 version) going back to the original, I tried to run it and got this result (notice it's cut off with missing lines):

dfi.export(df, name_file + ".png")

image

calvin5walters commented 1 year ago

I am having the same issue. I no longer get the error, but my tables are getting cut off early.

PaleNeutron commented 1 year ago

@fredericomattos @calvin5walters, this is a known issue. The latest version of Chrome does not accept the window-size CLI argument, so you cannot get the full image with Chrome.

Two solutions:

  1. dfi.export(df, name_file + ".png", fontsize=3.8, dpi=800, table_conversion='chrome', chrome_path=None) will still work
  2. use selenium with Firefox:

    !apt install firefox firefox-geckodriver
    !pip install dataframe_image selenium

    df.dfi.export('df.png', table_conversion="selenium", max_rows=-1)