dexplo / dataframe_image

A Python package for embedding pandas DataFrames as images into PDF and Markdown documents
https://dexplo.org/dataframe_image
MIT License

SyntaxError: not a PNG file #68

Closed: tylercap17 closed this issue 1 year ago

tylercap17 commented 1 year ago

Hi, I've been using the dataframe-image package for some time and as of this week it stopped working for me. The relevant code is simply:

dfi.export(cal_sorted, "Earnings Calendar " + str(today.date()) + " .png")

which gives me the error:

SyntaxError: not a PNG file

I am not sure why this issue suddenly arose. I am using the latest version. Exporting with table_conversion="matplotlib" works, but the formatting is not as good.

Thanks!
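
Until the Chrome-based conversion works again, one way to keep a script running is to fall back to matplotlib only when the error occurs. A minimal sketch, assuming (per the traceback later in this thread) that the failure surfaces as a plain SyntaxError; export_with_fallback is a hypothetical helper, not part of dataframe_image, and the export argument stands for dfi.export:

```python
from typing import Any, Callable


def export_with_fallback(
    export: Callable[..., Any], obj: Any, filename: str, **kwargs: Any
) -> str:
    """Export with the default converter; retry with matplotlib on failure.

    Hypothetical helper. `export` is expected to behave like dfi.export,
    which accepts a table_conversion keyword argument.
    Returns the name of the converter that succeeded.
    """
    try:
        export(obj, filename, **kwargs)
        return "default"
    except SyntaxError:
        # Pillow reports an empty or invalid screenshot as "not a PNG file"
        retry_kwargs = {**kwargs, "table_conversion": "matplotlib"}
        export(obj, filename, **retry_kwargs)
        return "matplotlib"
```

For the code above that would be export_with_fallback(dfi.export, cal_sorted, "Earnings Calendar.png"). The matplotlib output still has the formatting limitations mentioned, so this only keeps things running.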

ogsamoda commented 1 year ago

I am getting the same error. Any help is appreciated.

joshuabrownenz commented 1 year ago

I have the same issue

joshuabrownenz commented 1 year ago

I haven't been able to find a solution yet, but it does seem to be an issue with Chrome, similar to https://github.com/dexplo/dataframe_image/issues/14. I've tried reinstalling Chrome and changing my Chrome path, to no avail. Maybe it's caused by a Chrome update?

Mumbo-Jumbo-3 commented 1 year ago

Same issue.

iraaz4321 commented 1 year ago

I got it working by not having the temporary image file open while the Chrome subprocess runs. Below is a monkey patch that works for me, but I don't know why the file was opened in the first place. If a Chrome update was the reason it broke, maybe older versions didn't work without it.

import dataframe_image as dfi
from dataframe_image._screenshot import Screenshot

import io
import subprocess
from pathlib import Path
from tempfile import TemporaryDirectory

from matplotlib import image as mimage


def take_screenshot_override(self):
    # Work in a throwaway directory that is deleted when the with-block exits
    with TemporaryDirectory() as temp_dir:
        temp_html = Path(temp_dir) / "temp.html"
        temp_img = Path(temp_dir) / "temp.png"
        with open(temp_html, "w", encoding="utf-8") as f:
            f.write(self.html)

        args = [
            "--enable-logging",
            "--disable-gpu",
            "--headless",
            "--no-sandbox",
            "--crash-dumps-dir=/tmp",
            f"--force-device-scale-factor={self.device_scale_factor}",
        ]

        if self.ss_width and self.ss_height:
            args.append(f"--window-size={self.ss_width},{self.ss_height}")

        args += [
            "--hide-scrollbars",
            f"--screenshot={temp_img}",
            str(temp_html),
        ]

        # Unlike the library's version, temp_img is NOT held open while Chrome
        # runs; Chrome creates it on its own. The Chrome path goes first so that
        # "--enable-logging" is not consumed as the program name (argv[0]).
        subprocess.run([self.chrome_path, *args])

        # Only read the screenshot after Chrome has exited
        with open(temp_img, "rb") as f:
            img_bytes = f.read()

    buffer = io.BytesIO(img_bytes)
    img = mimage.imread(buffer)
    return self.possibly_enlarge(img)


# Replace the library's method; this must run before dfi.export is called
Screenshot.take_screenshot = take_screenshot_override

buckyster commented 1 year ago

Same issue for me. I thought it was because I was using a virtual machine, but maybe it's because the Chrome version on my VM is newer.

I was able to work around it by replacing line 115 of _screenshot.py:

with open(temp_img, "wb") as f:

with this line:

with open(temp_html, "rb") as f:

Or delete line 115 altogether and de-indent the block below it. It doesn't make sense to me to open temp_img for writing while it is being written by a subprocess (Chrome). It seems that doing so now blocks Chrome from writing to temp_img.
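
The ordering this workaround relies on can be shown without Chrome: start the writer process, wait for it to exit, and only then open its output for reading. A minimal sketch, with the Python interpreter standing in for Chrome and a hypothetical run_then_read helper:

```python
import subprocess
import sys
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Sequence


def run_then_read(cmd: Sequence[str], out_path: Path) -> bytes:
    """Run `cmd`, which is expected to create `out_path`, then return its bytes.

    This process does not open `out_path` at all until the child has exited,
    so it cannot get in the way of how the child creates or replaces the file.
    """
    subprocess.run(list(cmd), check=True)  # blocks until the child exits
    return out_path.read_bytes()


with TemporaryDirectory() as tmp:
    out = Path(tmp) / "temp.png"
    # Stand-in writer: a child Python process that writes a few bytes to argv[1]
    writer = [
        sys.executable,
        "-c",
        "import sys; open(sys.argv[1], 'wb').write(b'fake image bytes')",
        str(out),
    ]
    data = run_then_read(writer, out)

print(data)
```

check=True is an addition here: a failed child then raises CalledProcessError instead of leaving an empty or missing file to be read.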

Details of bug:

Windows 10 VM on VirtualBox 7.0.4, running on a bare-metal Windows 10 host. Chrome 109.0.5414.75, Python 3.9.6, dataframe-image 0.1.3.

>>> import pandas as pd
>>> import dataframe_image as dfi
>>> df = pd.DataFrame({'x':[1,2,3]})
>>> dfi.export(df, 'out.png')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python39\lib\site-packages\dataframe_image\_pandas_accessor.py", line 48, in export
    return _export(
  File "C:\Python39\lib\site-packages\dataframe_image\_pandas_accessor.py", line 117, in _export
    img_str = converter(html)
  File "C:\Python39\lib\site-packages\dataframe_image\_screenshot.py", line 194, in run
    img = self.take_screenshot()
  File "C:\Python39\lib\site-packages\dataframe_image\_screenshot.py", line 146, in take_screenshot
    img = mimage.imread(buffer)
  File "C:\Python39\lib\site-packages\matplotlib\image.py", line 1541, in imread
    with img_open(fname) as image:
  File "C:\Python39\lib\site-packages\PIL\ImageFile.py", line 117, in __init__
    self._open()
  File "C:\Python39\lib\site-packages\PIL\PngImagePlugin.py", line 732, in _open
    raise SyntaxError(msg)
SyntaxError: not a PNG file
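
The error itself comes from Pillow's PNG loader, which raises it when the first eight bytes of the data are not the PNG signature. An empty file, which is what you would get if Chrome never wrote the screenshot, fails that check too. A small sketch for checking a screenshot file by hand (describe_png is a hypothetical helper):

```python
from pathlib import Path

# Every PNG file starts with these eight bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_png(path: Path) -> str:
    """Return a short verdict on whether `path` holds a PNG image."""
    if not path.exists():
        return "missing"
    data = path.read_bytes()
    if not data:
        return "empty"  # Pillow would report "not a PNG file" here
    if data[:8] != PNG_SIGNATURE:
        return "not a PNG"  # ...and here
    return "PNG"
```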

tylercap17 commented 1 year ago

Replying to @iraaz4321's monkey patch above:

This worked for me too.

dorellanaff commented 1 year ago

Replying to @buckyster's workaround above:

Great solution, I finally got it working.

Mumbo-Jumbo-3 commented 1 year ago

So it seems I have to edit the dataframe-image package itself to use these solutions? Please excuse my ignorance.

Replying to @buckyster's workaround above:

For this solution, I would have to edit the dataframe-image package files themselves, correct? If so, do you recommend editing them directly or forking the repo? Please excuse my ignorance.

joshuabrownenz commented 1 year ago

Replying to @tylercap17's comment above, which quotes @iraaz4321's monkey patch:

This worked for me as well. However, I did have to comment out the f"--force-device-scale-factor={self.device_scale_factor}", line.
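
If a flag causes trouble on a particular Chrome build, as --force-device-scale-factor did here, it can be made optional rather than commented out. A sketch of assembling the same flags the patch uses, with a hypothetical build_chrome_command helper; passing scale_factor=None leaves the flag out. The Chrome path goes first in the list: with subprocess.run(executable=...) and only the flags as args, the first flag would be handed to Chrome as its program name (argv[0]) rather than as an option.

```python
from pathlib import Path
from typing import List, Optional


def build_chrome_command(
    chrome_path: str,
    html_path: Path,
    png_path: Path,
    scale_factor: Optional[float] = 1,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> List[str]:
    """Build a headless-Chrome screenshot command (hypothetical helper)."""
    cmd = [
        chrome_path,  # program first, so every flag below reaches Chrome
        "--enable-logging",
        "--disable-gpu",
        "--headless",
        "--no-sandbox",
        "--crash-dumps-dir=/tmp",
    ]
    if scale_factor is not None:  # None means "don't pass the flag at all"
        cmd.append(f"--force-device-scale-factor={scale_factor}")
    if width and height:
        cmd.append(f"--window-size={width},{height}")
    cmd += ["--hide-scrollbars", f"--screenshot={png_path}", str(html_path)]
    return cmd
```

Inside the patch this would be used as subprocess.run(build_chrome_command(self.chrome_path, temp_html, temp_img, scale_factor=None)). The default of 1 for scale_factor is an assumption of this sketch, not the library's default.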

joshuabrownenz commented 1 year ago

Replying to @Mumbo-Jumbo-3's question above:

For this solution you do need to edit _screenshot.py, which is part of the dataframe_image package. If you want to avoid that (I did), you can use the monkey patch posted by @iraaz4321, which just needs to run before you call dfi.export. That means it can live in a .py file of your own.
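
Why the monkey patch works from a separate file: assigning a function to a class attribute replaces that method for every instance, including instances created before the assignment, so the patch only has to run before the method is called. A toy illustration with a hypothetical stand-in class (not the real Screenshot):

```python
class Screenshot:
    """Hypothetical stand-in for dataframe_image._screenshot.Screenshot."""

    def take_screenshot(self):
        return "original"


def take_screenshot_override(self):
    # `self` is bound automatically, exactly as for a method defined in the class
    return "patched"


before = Screenshot()  # created before the patch is applied
Screenshot.take_screenshot = take_screenshot_override  # the monkey patch
after = Screenshot()

print(before.take_screenshot(), after.take_screenshot())  # patched patched
```

In practice this means putting the patch in a module of your own (for example, a hypothetical dfi_patch.py) and importing it before calling dfi.export.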

Replying to @buckyster's workaround above:

@buckyster, do you think this is worth opening a PR for?

Zoe54445 commented 1 year ago

Replying to @buckyster's workaround above:

Thanks! This worked well.

ItamarShalev commented 1 year ago

I had the same problem; I found the root cause and fixed it. @PaleNeutron, can you merge my PR?

Fixed in https://github.com/dexplo/dataframe_image/pull/70

Edit: I now see that @Zoe54445 already understood that. Anyway, could you create a new release with that fix? It would help a lot. Thanks, @PaleNeutron!

buckyster commented 1 year ago

Replying to @Mumbo-Jumbo-3's question above about editing the package files directly versus forking the repo:

@Mumbo-Jumbo-3, either way works, whichever you prefer. If you're only developing on a single machine, you can edit the installed dataframe-image package file directly. If you're deploying to multiple machines, you probably want a fork that you can reference.
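
If you do decide to edit the installed package directly, the file to change can be located from Python instead of hunting through site-packages. A sketch using importlib.util.find_spec (source_file is a hypothetical helper); note that looking up a submodule imports its parent package first:

```python
import importlib.util
from typing import Optional


def source_file(module_name: str) -> Optional[str]:
    """Return the path of the file that `import module_name` would load, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. dataframe_image) is not installed
        return None
    return spec.origin if spec is not None else None
```

For this issue the call would be source_file("dataframe_image._screenshot"). Keep in mind that a pip upgrade or reinstall will overwrite a hand-edited file.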

ItamarShalev has also submitted a pull request, so hopefully that can be merged and released soon.

PaleNeutron commented 1 year ago

Thanks for all the effort everyone put into tracking down this bug.

I have created a new release; enjoy it.

Madanraj-Delta commented 1 year ago

I get this error when the DataFrame has more than 8 columns.

Reproduced with a simple DataFrame with 10 columns.

@PaleNeutron

PaleNeutron commented 1 year ago

@Madanraj-Delta, this should be fixed in 0.2.0.
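
To confirm that an environment already has 0.2.0 or later, the installed version can be read with importlib.metadata. A sketch with hypothetical helpers; it assumes the distribution is named dataframe-image and that versions are plain dotted numbers (packaging.version.Version is the more robust choice for pre-releases and similar strings):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_simple_version(text: str) -> Tuple[int, ...]:
    """Turn '0.2.0' into (0, 2, 0); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def has_release(dist_name: str = "dataframe-image", minimum: str = "0.2.0") -> bool:
    """True if `dist_name` is installed at version `minimum` or newer."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return parse_simple_version(installed) >= parse_simple_version(minimum)
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" would sort before "0.2.0".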