dexplo / dataframe_image

A Python package for embedding pandas DataFrames as images into PDF and Markdown documents
https://dexplo.org/dataframe_image
MIT License

Quality of Pandas DataFrame Image Export #45

Closed: neal-bamford closed this issue 2 years ago

neal-bamford commented 2 years ago

I'm currently in the project stage of my Master's dissertation, and I need a way to turn a pandas DataFrame table into an image to include in a Word document.

I am using the API to export the DataFrame, but the resolution of the converted image is quite low. Is there a way to change this, please? Apologies if it's a silly question or an easy one to answer, but I can't for the life of me figure it out.

I've included a couple of images as examples.

Thanks in advance

[Example images: XX_crime_top_crimes_display_table_London_Islington_ and XX_earnings_ranking_display_table_London_Croydon_]

hns258 commented 2 years ago

@PaleNeutron I'm attempting to handle this one and a few other similar issues. It looks like matplotlib has a dpi parameter, but Chrome only has a force_device_scale_factor switch. Do you happen to know of any other way to pass a "dpi" value to Chrome, or do you think having two optional parameters is OK? (I added a note to the docs that each one applies only to its corresponding table_conversion method, similar to the chrome_path parameter.)
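For context on the Chrome side, here is a minimal sketch of a headless Chrome screenshot command that applies a scale factor. It assumes Chrome's command-line switches --headless, --hide-scrollbars, --window-size, --screenshot and --force-device-scale-factor (the last being the switch referred to above). The helper name chrome_screenshot_args, the paths and the window size are hypothetical, and this is not necessarily how dataframe_image builds its own Chrome call.

```python
import shlex


def chrome_screenshot_args(chrome_path, html_path, png_path,
                           width=1400, height=900, device_scale_factor=1.0):
    """Build a headless Chrome command that screenshots a local HTML file.

    Hypothetical helper. --force-device-scale-factor sets how many device
    pixels are drawn per CSS pixel, so 2 should double both image dimensions.
    """
    return [
        chrome_path,
        "--headless",
        "--disable-gpu",
        "--hide-scrollbars",
        f"--force-device-scale-factor={device_scale_factor}",
        f"--window-size={width},{height}",  # layout size in CSS pixels
        f"--screenshot={png_path}",
        f"file://{html_path}",  # expects an absolute path
    ]


# Hypothetical paths; the command is only printed, not executed.
args = chrome_screenshot_args("google-chrome", "/tmp/table.html",
                              "/tmp/table.png", device_scale_factor=4)
print(shlex.join(args))
```

With a 1400 x 900 window and a scale factor of 4, the screenshot should come out at 5600 x 3600 pixels, which is why a single dpi-style value can be mapped onto this switch.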

PaleNeutron commented 2 years ago

@hns258, thanks for your contribution! Nice work!

How about merging device_scale_factor and savefig_dpi into one parameter in the top-level dfi.export API, for example, enlarge? It would simply mean "export a larger picture", and users wouldn't need to care how the function achieves it.

hns258 commented 2 years ago

@PaleNeutron Thanks for getting back to me, and happy to help! Got it, I will try to consolidate them into one parameter. However, I think we should allow some granularity in how much larger or "higher quality" the export should be. This matters especially because it affects file size, and some users may be constrained there and need to balance file size against image quality. After looking deeper, I also realized that the scale factor can be derived from a dpi value in the typical range (e.g., if a user passes a dpi of 400 with table_conversion=chrome, we can just pass 4 as the device_scale_factor). I'm a bit new to all this, though, so feel free to correct me if I'm missing something.

PaleNeutron commented 2 years ago

@hns258 The major problem is that the actual picture size is measured in pixels, but matplotlib uses inches and dpi to calculate pixels. So I am not really clear, with equivalent settings (matplotlib dpi: 100.0 vs. chrome device_scale_factor: 1), which picture would be larger.
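To make the unit mismatch concrete, here is a minimal sketch of how each backend arrives at a pixel size, assuming matplotlib sizes an image as inches times dpi and Chrome sizes a screenshot as CSS pixels times the device scale factor. The helper names are hypothetical, and the input sizes are illustrative, not measured from dataframe_image.

```python
def matplotlib_pixels(width_in, height_in, dpi=100.0):
    """matplotlib: pixel size = figure size in inches * dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def chrome_pixels(width_css_px, height_css_px, device_scale_factor=1.0):
    """Chrome: pixel size = layout size in CSS pixels * device scale factor."""
    return (round(width_css_px * device_scale_factor),
            round(height_css_px * device_scale_factor))


# A 6.4 x 4.8 inch figure (matplotlib's default figsize) at dpi=100 ...
print(matplotlib_pixels(6.4, 4.8, dpi=100))  # (640, 480)
# ... matches a 640 x 480 CSS-pixel layout at scale factor 1.
print(chrome_pixels(640, 480, device_scale_factor=1))  # (640, 480)
```

The two only line up here because the base sizes were chosen to match; in practice each backend lays the table out at its own base size, so the same "setting" need not give the same pixel count.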

I found device_scale_factor=(1 if dpi == None else dpi/100.0) in the current version, which means dpi = device_scale_factor * 100.

You could test it and pick a built-in factor.
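The single-parameter idea discussed above can be sketched as one user-facing dpi value that is translated into whichever setting the chosen backend understands. This is a hypothetical helper, not dataframe_image's actual code: the keys savefig_dpi and device_scale_factor reuse the names mentioned in this thread, and the Chrome branch uses the dpi / 100 expression quoted above (written with the idiomatic "is None").

```python
def backend_scaling(table_conversion, dpi=None):
    """Translate one user-facing dpi into a backend-specific setting.

    Hypothetical helper; None means "use the backend's default size".
    """
    if table_conversion == "matplotlib":
        # matplotlib consumes dpi directly when saving the figure.
        return {"savefig_dpi": dpi}
    if table_conversion == "chrome":
        # Same mapping as quoted in the thread: dpi = device_scale_factor * 100.
        return {"device_scale_factor": 1 if dpi is None else dpi / 100.0}
    raise ValueError(f"unknown table_conversion: {table_conversion!r}")


for dpi in (None, 50, 400):
    print(dpi, backend_scaling("chrome", dpi), backend_scaling("matplotlib", dpi))
```

Keeping the conversion in one place means a user who asks for dpi=400 gets a 4x Chrome scale factor and a 400 dpi matplotlib export without knowing which mechanism is used underneath.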

hns258 commented 2 years ago

@PaleNeutron Yes, having not worked with images before and being new to this project, I was concerned that the different settings would produce different scalings. I'm not sure whether I'm still misunderstanding something, but when I tested a few different DPI and DSF (device scale factor) values with this library (prior to consolidation), the factor of 100 seemed to work, hence the addition of that line of code.

Unfortunately, as of now, exporting the same input with no extra arguments produces differently sized images depending on the table conversion, so I couldn't do a true "apples to apples" comparison between the two conversion types. However, when I changed only the DPI for the matplotlib exports, the dimensions changed by a factor of dpi/100, so it looks like we can use 100 to convert between the two values for Chrome. In other words, passing 400 to the matplotlib conversion produces an image whose dimensions are 4 times larger (similarly, 600 makes them 6 times larger, and 50 halves them). So passing 400 for Chrome and converting it to a device_scale_factor of 4 should behave the same way. If you have time, please feel free to look at the Excel spreadsheet I checked in showing what I found, and let me know if I'm overlooking anything. Thanks for your time and help!
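The dpi/100 check described above can be sketched with the standard library alone: read each exported PNG's dimensions from its header and compare them with the size predicted from a dpi=100 baseline. This assumes the exports are PNG files (width and height sit at bytes 16 to 24, inside the IHDR chunk); the helper names, file names and the 500 x 300 baseline are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file.

    The 8-byte signature is followed by the IHDR chunk (4-byte length,
    4-byte type), then width and height as big-endian unsigned 32-bit ints.
    """
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", header[16:24])


def expected_size(base_size, dpi, base_dpi=100):
    """Predicted dimensions if an export scales linearly by dpi / base_dpi."""
    factor = dpi / base_dpi
    return tuple(round(n * factor) for n in base_size)


def scale_ratio(base_size, scaled_size):
    """Observed per-axis growth between two exports of the same table."""
    return tuple(s / b for b, s in zip(base_size, scaled_size))


# Hypothetical usage with real files:
# with open("table_dpi100.png", "rb") as f:
#     base = png_size(f.read(24))
base = (500, 300)  # hypothetical size of a dpi=100 export
for dpi in (50, 400, 600):
    print(dpi, expected_size(base, dpi))
print(scale_ratio(base, (2000, 1200)))  # (4.0, 4.0)
```

A ratio that matches dpi/100 on both axes is what the comment above reports for the matplotlib exports; applying the same check to Chrome exports at the mapped device_scale_factor would show whether the two backends really scale the same way.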

PaleNeutron commented 2 years ago

@hns258, I think the current version is good enough!

Since matplotlib and pandas render tables completely differently, we cannot make the results "apples to apples", but "almost the same size" is good enough.

Thanks again for your awesome work!

hns258 commented 2 years ago

No problem at all, thanks so much for checking!