astanin / python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
https://pypi.org/project/tabulate/
MIT License

Int64 numbers from Pandas DataFrames interpreted as float and incorrectly printed due to overflow #213

Closed jbencina closed 1 year ago

jbencina commented 1 year ago

Issue

When Tabulate prints a Pandas DataFrame with an int64 field, the resulting value is shown incorrectly. It appears that Tabulate is interpreting Pandas int64 fields as float and then performing a format() call, which loses precision even in plain Python, because a 64-bit float cannot represent every integer above 2**53:

format(503498111827123021, '.0f')
'503498111827123008'
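
A minimal sketch of why this happens, using only built-in Python and no Tabulate code: formatting an int with a float presentation type such as '.0f' first converts it to a float, and a 64-bit float has only 53 bits of mantissa. The variable names below are illustrative.

```python
# The value from this issue, larger than 2**53.
n = 503498111827123021

# A float presentation type converts the int to float before formatting,
# so the low digits are rounded to the nearest representable float.
as_float = format(n, ".0f")

# An integer presentation type keeps every digit.
as_int = format(n, "d")

print(as_float)  # 503498111827123008
print(as_int)    # 503498111827123021

# The smallest case: 2**53 + 1 has no exact float representation.
print(float(2**53 + 1) == float(2**53))  # True
```

So the digits are lost in the int-to-float conversion itself, before any table layout is involved.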

Expected Behavior

This error does not happen when a 64-bit int is passed to Tabulate directly in a list, because the value is then treated as an int. I believe the fix is for Tabulate to recognize that a DataFrame int64 field should also be treated as an integer, and not attempt a floating-point format operation.
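
As a rough illustration of the behavior proposed here, and not Tabulate's actual code or the fix that landed in master, a formatter could check whether a value is integer-like before falling back to float formatting. Everything in this sketch is hypothetical: `FakeInt64` is a stand-in for a NumPy-style integer scalar so the example runs without NumPy, and `is_integer_like` and `format_cell` are illustrative helper names.

```python
import operator


class FakeInt64:
    """Hypothetical stand-in for a NumPy-style integer scalar (not a real NumPy type)."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Integer-like objects expose a lossless conversion to int.
        return self._value

    def __float__(self):
        return float(self._value)


def is_integer_like(value):
    """Return True for ints and objects that convert to int without loss."""
    if isinstance(value, (bool, float, str, bytes)):
        return False
    try:
        operator.index(value)
    except TypeError:
        return False
    return True


def format_cell(value, floatfmt="g"):
    """Format integer-like values as integers; everything else as float."""
    if is_integer_like(value):
        return format(operator.index(value), "d")
    return format(float(value), floatfmt)


print(format_cell(FakeInt64(503498111827123021), floatfmt=".0f"))  # 503498111827123021
print(format_cell(503498111827123021.0, floatfmt=".0f"))           # 503498111827123008
```

The design choice in this sketch is to test for integer behavior (`__index__`) instead of the exact type `int`, so that integer scalars from other libraries would not be routed through the float path.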

Reproduction

Test 64bit int with Pandas DataFrame head() -> Correct

import pandas as pd

df = pd.DataFrame({'colA': [503498111827123021]})
df.dtypes

colA    int64
dtype: object

df.head()
                 colA
0  503498111827123021

Test 64bit int with tabulate() on list data -> Correct

from tabulate import tabulate

table = [[503498111827123021]]
print(tabulate(table))
------------------
503498111827123021
------------------

print(tabulate(table, floatfmt='.0f'))
------------------
503498111827123021
------------------

Test 64 bit float with tabulate() on float data -> Incorrect

from tabulate import tabulate

table = [[503498111827123021.0]]
print(tabulate(table, floatfmt='.0f'))
------------------
503498111827123008
------------------

Test 64 bit int DataFrame field with various combinations -> Incorrect

from tabulate import tabulate
import pandas as pd

df = pd.DataFrame({'colA': [503498111827123021]})
print(tabulate(df, floatfmt='.0f'))
-  ------------------
0  503498111827123008
-  ------------------

print(tabulate(df))
# Without arguments this is being seen as float
-  -----------
0  5.03498e+17
-  -----------

print(df.to_markdown(floatfmt='.0f'))
|    |               colA |
|---:|-------------------:|
|  0 | 503498111827123008 |

print(df.to_markdown())
|    |        colA |
|---:|------------:|
|  0 | 5.03498e+17 |

jbencina commented 1 year ago

Ah, I just ran this example using the latest source code and it doesn't appear to happen. There must have been a change since the 0.9.0 release which addressed this.

astanin commented 1 year ago

@jbencina Can you please specify what OS, Python and Pandas version you're using? Also tabulate.__version__. I'm surprised it doesn't happen anymore in 0.9.0.

I suppose this is a duplicate of #18. The issue is fixed in master, but not in v0.9.0. Did you install the library from source or via PyPI?
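
For reference, the environment details requested above can be gathered with a short standard-library script. This is only a sketch; the `package_version` helper and the `report` dictionary are illustrative names, not part of Tabulate or Pandas.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


report = {
    "os": platform.platform(),
    "python": sys.version.split()[0],
    "pandas": package_version("pandas"),
    "tabulate": package_version("tabulate"),
}

for key, value in report.items():
    print(f"{key}: {value}")
```

Note that this reports the version recorded in the installed package metadata, so a source checkout and a PyPI install may show the same version string even though their code differs.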

jbencina commented 1 year ago

@astanin I think we're saying the same thing. I did have the issue in 0.9.0 but not in the latest source from GitHub. I downloaded the latter after filing the issue because I was going to debug it and realized it was already fixed 🙂

astanin commented 1 year ago

> I was going to debug it and realized it was already fixed 🙂

Good 😃

jbencina commented 1 year ago

@astanin Do you have an approximate date for the next release? I will be bumping Pandas to use the new one when it is ready.

astanin commented 1 year ago

@jbencina No specific date for the next release yet. There are quite a few bugs to fix: https://github.com/astanin/python-tabulate/milestones/v0.9.1 I'll try to remember to update this thread when it's done.