astanin / python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
https://pypi.org/project/tabulate/
MIT License

Add custom object convertor (DataModel or API) #178

Open baterflyrity opened 2 years ago

baterflyrity commented 2 years ago

Hello!

Subject

I like the idea behind tabulate, but one thing bothers me: why should I have to convert (format) my objects into one of the supported data structures myself every time? I therefore propose adding an object converter based on a magic method, __table__(self), similar to __str__ and __repr__.

Explanation

To make this clear, I have prepared two examples. The first shows how I use tabulate now. The second uses a monkey patch to bring the proposed changes to life.

Regular usage

This example shows regular tabulate usage for custom data.

from dataclasses import dataclass
from tabulate import tabulate

@dataclass
class User:
    id: int
    name: str

users = [User(0, 'Bob'), User(1, 'Ben')]
print(tabulate([[u.id, u.name] for u in users], ['#', 'Client'], tablefmt='fancy_grid')) 

[image]

Proposed usage

Next I want to implement my changes like this:

from dataclasses import dataclass
from tabulate import tabulate

@dataclass
class User:
    id: int
    name: str
    def __table__(self):
        return [self.id, self.name]

class ClientBase:
    def __init__(self, *users: User):
        self._users = users
    def __table__(self):
        return [u.__table__() for u in self._users], ['#', 'Client']

users = ClientBase(User(0, 'Bob'), User(1, 'Ben'))
print(monkey_patched_tabulate(users, tablefmt='fancy_grid'))

This code results in the same table: [image]

Implementation

I have successfully monkey-patched the stable tabulate version, and I am suggesting these changes to the community. My current implementation just adds several lines of code to the _normalize_tabular_data(tabular_data, headers, showindex="default") function. Here it is:

def monkey_patched_tabulate(tabular_data, *args, **kwargs):
    # ...
    # somewhere in type checks
    if hasattr(tabular_data, '__table__'):
        # Object with custom __table__ convertor
        rows, headers = tabular_data.__table__()
        if headers is not None:
            kwargs.update(headers=headers)
        return monkey_patched_tabulate(rows, *args, **kwargs)
    # other checks
    else:  # it's a usual iterable of iterables, or a NumPy array
        # convert rows with a custom __table__ converter
        rows = [row.__table__() if hasattr(row, '__table__') else row for row in tabular_data]
        # ... rest of the code
        return tabulate(rows, *args, **kwargs)  # the recursive call of _normalize_tabular_data() is replaced with my patched function
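
A self-contained variant of the same idea, written as a wrapper rather than as a patch inside _normalize_tabular_data, might look like the sketch below. The helper name resolve_table is hypothetical and __table__ is the proposed hook, not an existing tabulate feature; the helper only prepares rows and headers, so it assumes nothing about tabulate internals.

```python
from dataclasses import dataclass


def resolve_table(data, headers=()):
    """Return (rows, headers) after applying the proposed __table__ hook.

    Hypothetical helper: __table__ is not part of tabulate.
    """
    if hasattr(data, "__table__"):
        # Collection-level hook: expected to return (rows, headers-or-None).
        rows, own_headers = data.__table__()
        if own_headers is not None:
            headers = own_headers
        return resolve_table(rows, headers)
    # Row-level hook: convert each row that knows how to present itself.
    rows = [row.__table__() if hasattr(row, "__table__") else row for row in data]
    return rows, headers


@dataclass
class User:
    id: int
    name: str

    def __table__(self):
        return [self.id, self.name]


class ClientBase:
    def __init__(self, *users):
        self._users = users

    def __table__(self):
        # Rows are left as User objects; resolve_table converts them.
        return list(self._users), ["#", "Client"]


rows, headers = resolve_table(ClientBase(User(0, "Bob"), User(1, "Ben")))
```

The result can then be passed on unchanged, for example as tabulate(rows, headers=headers, tablefmt='fancy_grid').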

Benefits

Closing thoughts

All in all, I hope my idea will help someone. Unfortunately, I do not have enough time right now to create a PR, so anyone is welcome to do it :)

astanin commented 1 year ago

That's an interesting idea but I'm not convinced it is useful to many users. Let's see if there is enough interest in this issue.

I would also like to know what real problems it is trying to solve. In your example ClientBase could have implemented an iterable with pretty much the same results.
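
A minimal sketch of that alternative, assuming the same User and ClientBase classes as in the proposal: the container implements __iter__ and yields one plain row per user, so no new hook is needed. The HEADERS attribute name is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


class ClientBase:
    HEADERS = ["#", "Client"]  # illustrative name, not a tabulate convention

    def __init__(self, *users: User):
        self._users = users

    def __iter__(self):
        # Yield one plain row per user, i.e. an iterable of iterables.
        for u in self._users:
            yield [u.id, u.name]


users = ClientBase(User(0, "Bob"), User(1, "Ben"))
rows = list(users)
```

Since tabulate accepts an iterable of iterables, tabulate(users, ClientBase.HEADERS, tablefmt='fancy_grid') should then consume the object directly.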

I would almost always prefer to keep user-specific data transformation outside of the library and very explicit. A list comprehension can be done by the end users, and can extract just the data (columns) which need to be displayed. The semantics of the __table__ attribute would have to be defined. I don't like that the "record" and the "set of records" expose the same interface.

baterflyrity commented 1 year ago

@astanin, sure, that was not clear... The main idea is to separate what user data is rendered from how it is rendered, something like a model-view approach.

For now there are only two ways to render data: provide one of the types hard-coded in tabulate or an iterable of iterables. I propose a way of defining a middleware model→view adapter. Usually this is done via magic methods.

Yes, you are right about the interface similarity. I guess it should be something like __table__, __header__ and __row__.
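
One possible reading of such a split interface, sketched with typing.Protocol: a record exposes __row__, while a collection exposes __header__ and iterates over its records. The names RowLike, TableLike and to_rows are assumptions made for illustration, and none of this exists in tabulate.

```python
from dataclasses import dataclass
from typing import Any, List, Protocol, Sequence, Tuple, runtime_checkable


@runtime_checkable
class RowLike(Protocol):
    def __row__(self) -> Sequence[Any]: ...


@runtime_checkable
class TableLike(Protocol):
    def __header__(self) -> Sequence[str]: ...
    def __iter__(self): ...


def to_rows(table: TableLike) -> Tuple[List[Sequence[Any]], Sequence[str]]:
    """Split a TableLike object into (rows, headers)."""
    rows = [r.__row__() if isinstance(r, RowLike) else r for r in table]
    return rows, table.__header__()


@dataclass
class User:
    id: int
    name: str

    def __row__(self):
        return [self.id, self.name]


class ClientBase:
    def __init__(self, *users: User):
        self._users = users

    def __header__(self):
        return ["#", "Client"]

    def __iter__(self):
        return iter(self._users)


rows, headers = to_rows(ClientBase(User(0, "Bob"), User(1, "Ben")))
```

With this split, the "record" and the "set of records" no longer share one method name, which is the concern raised earlier in the thread.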

A list comprehension can be done by the end users, and can extract just the data (columns) which need to be displayed.

Agreed, as long as tabulate is used only by end users. Imagine I publish my awesome data library grizzlies, whose datasets support pretty printing and array conversion. How can I distinguish casting a dataset to tabulate data from casting it to numpy data? I would need to use conditional imports, __str__ overloading, etc.

astanin commented 1 year ago

@baterflyrity

Imagine I publish my awesome data library grizzlies, whose datasets support pretty printing and array conversion. How can I distinguish casting a dataset to tabulate data from casting it to numpy data?

So you are the provider of a data source and have control over how it is defined (you can change its class, methods, etc.). You want it to be printable through tabulate and, let's say, some other similar data transformation tool.

What's the value of a hidden special method __table__ over using an explicit adapter? Something like this:

users = ClientBase(...)
print(tabulate(users.as_table(), headers="keys", tablefmt="..."))

It requires exactly the same effort on your part, the method is more discoverable by the end users, and it uses a well-tested code path of the library rather than a rarely used option (I expect fewer bugs).
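
The body of as_table() is not spelled out in the thread. A minimal sketch, assuming it returns a list of dicts so that headers="keys" can pick up the column names:

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


class ClientBase:
    def __init__(self, *users: User):
        self._users = users

    def as_table(self):
        # One dict per record; with headers="keys" the dict keys
        # are used as the column headers.
        return [{"#": u.id, "Client": u.name} for u in self._users]


users = ClientBase(User(0, "Bob"), User(1, "Ben"))
table = users.as_table()
```

The adapter returns plain built-in types, so the data source itself does not need to import tabulate.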

BTW, I just noticed that your original request was written before dataclasses support was merged (pull request #125). In 0.9.0 you can now use it like this:

>>> users = [User(0, 'Bob'), User(1, 'Ben')]
>>> print(tabulate(users, tablefmt="grid"))
+---+-----+
| 0 | Bob |
+---+-----+
| 1 | Ben |
+---+-----+

I hope it already saves some typing.

astanin commented 1 year ago

Guess it should be something like __table__, __header__ and __row__.

Thinking about it, tabulate already supports this kind of magic API that data should implement. If you consider that an iterable of dicts is printable, this is what needs to be done:

  • make the data object iterable
  • define keys and values attributes on its values (records)
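
As a hedged illustration of the dict-like route: subclassing collections.abc.Mapping is one way to give a record keys(), values() and get() without inheriting from dict. The class name UserRecord and the column labels are assumptions.

```python
from collections.abc import Mapping


class UserRecord(Mapping):
    """Dict-like record: Mapping supplies keys(), values() and get()."""

    def __init__(self, id, name):
        self._data = {"#": id, "Client": name}

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class ClientBase:
    def __init__(self, *records):
        self._records = records

    def __iter__(self):
        # Makes the container itself an iterable of dict-like records.
        return iter(self._records)


users = ClientBase(UserRecord(0, "Bob"), UserRecord(1, "Ben"))
first = next(iter(users))
```

Passing such an object as tabulate(users, headers="keys") should then treat each record like a dict.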

baterflyrity commented 1 year ago

@astanin ,

What's the value of a hidden special method __table__ over using an explicit adapter?

The same value as the existence of the _normalize_tabular_data function. Since you already ship a built-in normalizer for several popular types, you could just as well allow users to provide their own custom normalizer.

If you consider that an iterable of dicts is printable, this is what needs to be done:

  • make the data object iterable
  • define keys and values attributes on its values (records)

Or just provide a convenient interface like __table__ that returns an instance of a tabulate.DataModel class.

In 0.9.0 you can now use it like this...

Very nice, did not notice.

astanin commented 1 year ago

provide a convenient interface like __table__ that returns tabulate.DataModel class instance.

I like the idea of having an abstract tabulate.DataModel.
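
No such class exists in tabulate at the time of this discussion. Purely as a sketch of what an abstract DataModel could look like, with the method names headers() and rows() being assumptions:

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Sequence


class DataModel(ABC):
    """Hypothetical abstract base; tabulate does not ship such a class."""

    @abstractmethod
    def headers(self) -> Sequence[str]:
        """Column headers, in display order."""

    @abstractmethod
    def rows(self) -> Iterable[Sequence[Any]]:
        """One sequence of cell values per record."""


class UserModel(DataModel):
    def __init__(self, users):
        self._users = list(users)

    def headers(self):
        return ["#", "Client"]

    def rows(self):
        return [[uid, name] for uid, name in self._users]


model = UserModel([(0, "Bob"), (1, "Ben")])
```

Until something like this is supported natively, the same model can be rendered explicitly with tabulate(model.rows(), headers=model.headers()).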