astanin / python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
https://pypi.org/project/tabulate/
MIT License

Discussion: data types of the columns #321

Open airvzxf opened 4 months ago

airvzxf commented 4 months ago

Discussion

My question is whether “tabulate” should support mixed-type columns instead of always deciding on a single data type for the whole column. With a mixed-type column, “tabulate” would identify the data type of each cell and format the cell accordingly, rather than applying the overall type of the column.

Quick comment

On GitHub, you could enable the Discussions feature: https://github.com/features/discussions. With this feature, your community can open a discussion, and if a specific discussion turns out to be relevant, it can be moved to the issues section.

Not every community uses the Discussions feature, even in repositories or projects that have it enabled. Still, the feature appears to be useful in terms of administration.

Context

I noticed that “tabulate” inspects every row of each column and automatically assigns a single type to the column. That is fabulous, but I am concerned about what happens when the values in a column have mixed types.

In the mixed cases, I discovered that the order is as follows:

Evidence

All the results were obtained by adding debug lines to the function _format(). The lines print the value, its actual type, and the valtype assigned to the column, so that the two types can be compared.

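def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
    # Debug lines added at the top of _format(): they show the actual type
    # of each cell next to the column-wide valtype chosen by tabulate.
    print(f'    val: {val}')
    print(f'   type: {type(val)}')
    print(f'valtype: {valtype}')
    print()
    # The original body of _format() follows, unchanged.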

For this instruction: tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain") the valtype is <class 'str'>.

The result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'str'>

    val: abcd
   type: <class 'str'>
valtype: <class 'str'>

    val: 92165
   type: <class 'int'>
valtype: <class 'str'>

For this instruction: tabulate([[12013], [210], [15.24], [92165]], tablefmt="plain") the valtype is <class 'float'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'float'>

    val: 210
   type: <class 'int'>
valtype: <class 'float'>

    val: 15.24
   type: <class 'float'>
valtype: <class 'float'>

    val: 92165
   type: <class 'int'>
valtype: <class 'float'>

For this instruction: tabulate([[12013], [210], [92165]], tablefmt="plain") the valtype is <class 'int'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'int'>

    val: 210
   type: <class 'int'>
valtype: <class 'int'>

    val: 92165
   type: <class 'int'>
valtype: <class 'int'>
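
The three runs above are consistent with a "most general type wins" rule: str over float over int. Below is a minimal sketch of that rule. It is a simplified model of the observed behaviour only, not tabulate's actual implementation; the name infer_column_type is hypothetical, and details such as numeric-looking strings, None, or bytes are deliberately ignored.

```python
def infer_column_type(values):
    """Return the most general type among the cells: str > float > int.

    Simplified model of the behaviour seen in the debug output;
    this is NOT tabulate's real code.
    """
    rank = {int: 0, float: 1, str: 2}
    widest = int
    for val in values:
        # Anything that is not exactly int or float is treated as text here.
        cell_type = type(val) if type(val) in rank else str
        if rank[cell_type] > rank[widest]:
            widest = cell_type
    return widest


print(infer_column_type([82000.38, "abcd", 92165]))   # <class 'str'>
print(infer_column_type([12013, 210, 15.24, 92165]))  # <class 'float'>
print(infer_column_type([12013, 210, 92165]))         # <class 'int'>
```

Under this model, a single text cell is enough to make every numeric cell in the column be formatted as text, which is the case this discussion is about.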

Expectation

Based on this discussion, I expected this output for the _format() function.

Solution 1

For this instruction: tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain") the valtype should be Mixed or something like this.

The expected result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'Mixed'>

    val: abcd
   type: <class 'str'>
valtype: <class 'Mixed'>

    val: 92165
   type: <class 'int'>
valtype: <class 'Mixed'>

Then, in the logic of the _format() function, we can check whether the valtype is mixed and, if so, use the actual type of val to decide how to format that cell.
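
To make the idea concrete, here is a rough sketch of per-cell formatting for a mixed column. The helper format_cell is hypothetical and does not exist in tabulate; it only borrows the floatfmt and intfmt parameter names from _format(), and it assumes they hold ordinary Python format specs.

```python
def format_cell(val, floatfmt="g", intfmt=""):
    """Format one cell by its own type instead of a column-wide valtype.

    Hypothetical helper for illustration; not part of tabulate.
    """
    if isinstance(val, bool):
        # bool is a subclass of int, so handle it before the int branch.
        return str(val)
    if isinstance(val, int):
        return format(val, intfmt)
    if isinstance(val, float):
        return format(val, floatfmt)
    return str(val)


column = [82000.38, "abcd", 92165]
print([format_cell(v, floatfmt=".1f", intfmt=",") for v in column])
# ['82000.4', 'abcd', '92,165']
```

With this approach the text cell no longer prevents the numeric cells from receiving their number formats.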

Solution 2

Always ignore the valtype and use the type of val, unless the user passes a parameter that specifies the type of the column. Something like this: tabulate([[82000.38], ["abcd"], [92165]], coltypes=(int,), tablefmt="plain"), which would treat all the cells in the column as integers. Note that coltypes is a proposed parameter and does not exist today; the trailing comma is needed because (int) is just int, not a one-element tuple.
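
A possible shape for this rule, as a sketch only: the function convert_column and its coltype argument are hypothetical and are not part of tabulate. What to do with cells that cannot be converted to the requested type is an open design question; this sketch simply leaves them untouched.

```python
def convert_column(values, coltype=None):
    """Convert cells to an explicit column type, or keep each cell's own type.

    coltype is a hypothetical per-column override; None means
    "use the type of each value as it is".
    """
    if coltype is None:
        return list(values)
    converted = []
    for val in values:
        try:
            converted.append(coltype(val))
        except (TypeError, ValueError):
            # Assumption: cells that cannot be converted are kept unchanged.
            converted.append(val)
    return converted


print(convert_column([82000.38, "abcd", 92165]))               # [82000.38, 'abcd', 92165]
print(convert_column([82000.38, "abcd", 92165], coltype=int))  # [82000, 'abcd', 92165]
```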

Final note

I arrived at this package because I was using the Pandas package, specifically the function “to_markdown”. It might be a good idea to invite the Pandas people to this discussion to get additional feedback.

By the way, Pandas wraps a limited version of tabulate for the function to_markdown. Outside this discussion, it would be nice if Pandas exposed the full set of tabulate parameters and functionality.