astanin / python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
https://pypi.org/project/tabulate/
MIT License

SEPARATING_LINE doesn't work with tablefmt='orgtbl'. #250

Open gety9 opened 1 year ago

gety9 commented 1 year ago

Works:

from tabulate import tabulate, SEPARATING_LINE
data = [ ['Jack', 'Word 1', 'Word 2'], ['Ronaldinho', 'Word 3', 'Word 4'], SEPARATING_LINE, ['Becky', 8, 9], ['William', 1, 2] ]
print(tabulate(data))

----------  ------  ------
Jack        Word 1  Word 2
Ronaldinho  Word 3  Word 4
----------  ------  ------
Becky       8       9
William     1       2
----------  ------  ------

Does not work:

from tabulate import tabulate, SEPARATING_LINE
data = [ ['Jack', 'Word 1', 'Word 2'], ['Ronaldinho', 'Word 3', 'Word 4'], SEPARATING_LINE, ['Becky', 8, 9], ['William', 1, 2] ]
print(tabulate(data, tablefmt='orgtbl'))

| Jack       | Word 1 | Word 2 |
| Ronaldinho | Word 3 | Word 4 |
|  |
| Becky      | 8      | 9      |
| William    | 1      | 2      |

kkm000 commented 1 year ago

SEPARATING_LINE is borked at least with 'html', 'simple_outline' and 'latex' formats. It is printed literally, as a single-character string '\001', exactly as it's defined in source:

https://github.com/astanin/python-tabulate/blob/83fd4fb98926c8a6fdf45caa1b91ee8913b64dcb/tabulate/__init__.py#L48-L50

The HTML format is messed up even worse: the generated table/tr/td tree structure is malformed.

My general impression is that many formats "more complex" than the plain-character ones are broken, although I do not know how their internal handling differs, nor did I test many of them.
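One way to survey the formats rather than testing them one by one is sketched below. It is only a rough check under stated assumptions: the helper name `formats_leaking_sentinel` is hypothetical, it assumes `SEPARATING_LINE` is a plain string (as in 0.9.0), and it uses the `tabulate_formats` list exported by the package. It flags only formats that print the sentinel literally, so breakage like the malformed HTML structure would not be caught by it.

```python
def formats_leaking_sentinel(render, formats, rows, sentinel):
    """Return the formats whose rendered output still contains the raw sentinel.

    `render` is any callable with the shape render(rows, tablefmt=...),
    e.g. tabulate.tabulate. The helper name is hypothetical.
    """
    leaky = []
    for fmt in formats:
        out = render(rows, tablefmt=fmt)
        # Assumption: the sentinel is a str, so a substring test is meaningful.
        if sentinel in str(out):
            leaky.append(fmt)
    return leaky


if __name__ == "__main__":
    try:
        from tabulate import tabulate, tabulate_formats, SEPARATING_LINE
    except ImportError:
        print("tabulate is not installed; nothing to check")
    else:
        data = [("m1", 0.9), ("m2", 0.999), SEPARATING_LINE,
                ("acc", 92.24), ("loss", 0.2121)]
        print(formats_leaking_sentinel(tabulate, tabulate_formats,
                                       data, SEPARATING_LINE))
```

The renderer is passed in as an argument so the check can be exercised without the library, and so it keeps working if the sentinel value changes.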

Interactive from IPython console in Jupyter:

# Jupyter setting to print all non-`None` evaluation results, not only the last one in a cell:
%config ZMQInteractiveShell.ast_node_interactivity='all'

from tabulate import tabulate, SEPARATING_LINE, __version__ as tabver
('tabulate version:', tabver); del tabver
data = [("m1", 0.9), ("m2", 0.999), SEPARATING_LINE, ("acc", 92.24), ("loss", 0.2121)]
print(tabulate(data))  # The default format works.
('tabulate version:', '0.9.0')

----  -------
m1     0.9
m2     0.999
----  -------
acc   92.24
loss   0.2121
----  -------

tablefmt='simple_outline' just outputs "\001" for the SEPARATING_LINE (@gety9, this is a repro of your report):

print(tabulate(data, tablefmt='simple_outline'))
tabulate(data, tablefmt='simple_outline')
┌──────┬─────────┐
│ m1   │  0.9    │
│ m2   │  0.999  │
│  │
│ acc  │ 92.24   │
│ loss │  0.2121 │
└──────┴─────────┘

'┌──────┬─────────┐\n│ m1   │  0.9    │\n│ m2   │  0.999  │\n│ \x01 │\n│ acc  │ 92.24   │\n│ loss │  0.2121 │\n└──────┴─────────┘'
                                                               ^^^^ Oops!

As does tablefmt='latex':

print(tabulate(data, tablefmt='latex'))
tabulate(data, tablefmt='latex')

(I've compressed whitespace runs to a single space each in the second output, the repr form; it's on the longish side)

\begin{tabular}{lr}
\hline
 m1   &  0.9    \\
 m2   &  0.999  \\
  \\
^^^^^^--OOPS! Should be just \hline
 acc  & 92.24   \\
 loss &  0.2121 \\
\hline
\end{tabular}

"'\\\\begin{tabular}{lr}\\n\\\\hline\\n m1 & 0.9 \\\\\\\\\\n m2 & 0.999 \\\\\\\\\\n \\x01 \\\\\\\\\\n acc & 92.24 \\\\\\\\\\n loss & 0.2121 \\\\\\\\\\n\\\\hline\\n\\\\end{tabular}'"
                                                                              OOPS!-^^^^^

tablefmt='html' closes the table at SEPARATING_LINE, but then continues with a <tr>, <td> etc. on the next row:

tabulate(data, tablefmt='html')
repr(tabulate(data, tablefmt='html'))

I'm splitting the repr(...) output string below into multiple lines for readability. Actual output is a ≈700-char-long, unbroken line:

'<table>\n<tbody>\n
<tr><td>m1  </td><td style="text-align: right;"> 0.9   </td></tr>\n
<tr><td>m2  </td><td style="text-align: right;"> 0.999 </td></tr>\n
</tbody>\n</table>\n<tr><td>acc </td><td style="text-align: right;">92.24  </td></tr>\n
^^^^^^^^^^^^^^^^^^^^^^^^--OOPS!
<tr><td>loss</td><td style="text-align: right;"> 0.2121</td></tr>\n</tbody>\n</table>'
                                           An extra <table> coda --^^^^^^^^^^^^^^^^^^ 

Obligatory system and pertinent software versions:

$ uname -srvmo
Linux 6.0.0-0.deb11.6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.12-1~bpo11+1 (2022-12-19) x86_64 GNU/Linux
$ python --version
Python 3.7.12
$ jupyter --version
IPython          : 7.34.0
ipykernel        : 6.16.2
ipywidgets       : 8.0.3
jupyter_client   : 7.4.8
jupyter_core     : 4.12.0
jupyter_server   : 1.23.3
jupyterlab       : 3.4.8
[... irrelevant components snipped ...]
traitlets        : 5.7.1

@astanin, tangentially: there is a clear pattern, widely used in Python, Java, .NET and other assign-by-reference languages alike, that I can't help suggesting: give the token SEPARATING_LINE a unique identity, thus avoiding the reliance on a specific string being “very unlikely to be used” in a table. The main idea is that an instance of Python's object type has no extrinsic "value" that can be sensibly compared to that of another object instance. The only property that an instance of object has is being an instance of object; its only equatable property is its intrinsic identity. Two object instances are always compared by reference: one can only tell whether two references point to the same instance or to two different ones. IMO, an implementation not relying on the table's content might have been simply

SEPARATING_LINE = object()

Since Python assigns variables by reference, assigning SEPARATING_LINE to a variable, binding it to a function argument, or using it in a data structure, as it's supposed to be used, will always refer to the same instance, and there is nothing comparable in this type except the instance's identity. This is easier to show with an example than to tell:

o1, o2 = object(), object()
o1 == o2
o1ref = o1
(o1ref == o1, o1ref == o2)
[ id(o1), id(o1ref), id(o2) ]

(I'm trimming the middle digits of the unique instance ids, as they are 15-digit decimal integers 8-O):

False

(True, False)

[13…648, 13…648, 13…664]
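To make the suggestion concrete, here is a minimal sketch of how a renderer could recognise such a sentinel by identity. This is not tabulate's code: `SEPARATOR` and `render_rows` are hypothetical names used for illustration only.

```python
# Hypothetical sentinel: a bare object() has no value, only an identity.
SEPARATOR = object()


def render_rows(rows, width=10):
    """Render rows as text lines, drawing a rule wherever SEPARATOR appears."""
    lines = []
    for row in rows:
        if row is SEPARATOR:  # identity check, independent of table content
            lines.append("-" * width)
        else:
            lines.append(" ".join(str(cell) for cell in row))
    return lines


# A row whose only cell is the string "\x01" is ordinary data here,
# because it is not the SEPARATOR instance.
for line in render_rows([("m1", 0.9), SEPARATOR, ("\x01",)], width=8):
    print(repr(line))
```

Because the check uses `is`, no string that a user puts in a table can ever be mistaken for the separator.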

Python, unlike .NET or Java, doesn't distinguish between objects and values (value types are for things that cannot live in the garbage-collected heap, but can live e.g. on the stack, or as part of other values/objects, like class data fields); everything in Python is an object. This is one of the reasons why it has such dismal performance and fiercely resists all attempts at [pre-]compilation or optimization. If I weren't aware of its history, I'd assume that one of its main design goals had been to make it as inefficient as possible, however hard an implementer might try otherwise. (The secondary goal would be to make it a write-only language, as unreadable as possible; yet another, to create a language even more boring to write than Fortran; all three have been achieved with flying colors.) In fact, as we all know, there were simply no overarching design goals at all. :-) Still, the following surprised me when I decided to verify that I'm not making a wrong suggestion, memory use included (for a singleton instance it is of no importance at all, but I was in check-it-all mode, to avoid unintentionally confusing you even in the slightest):

import sys
from IPython.display import Pretty  # Pretty() displays a string as plain text in Jupyter
n = 42
Pretty(tabulate([
  ('type(¤)',          type(n),          type(SEPARATING_LINE),          type(o1)),
  ('¤.__sizeof__()',   n.__sizeof__(),   SEPARATING_LINE.__sizeof__(),   o1.__sizeof__()),
  ('sys.getsizeof(¤)', sys.getsizeof(n), sys.getsizeof(SEPARATING_LINE), sys.getsizeof(o1)),
  ], headers=['', 'an integer', 'a string', 'an object()']))
                  an integer     a string       an object()
----------------  -------------  -------------  ----------------
type(¤)           <class 'int'>  <class 'str'>  <class 'object'>
¤.__sizeof__()    28             50             16
sys.getsizeof(¤)  28             50             16

Generally, sys.getsizeof(¤) >= ¤.__sizeof__(), but they always seem to be equal in this Python. But why the type int is 12 bytes larger than its parent class object, I have not even a remote idea. Coming from C++ land, I'm hardly used to seeing 12-byte integers... Python is an enigmatic language indeed. :-)
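A possible way to look into the 12-byte difference is to ask the types themselves for their layout. The sketch below is CPython-specific, and the interpretation in the comments is an assumption about the usual 64-bit layout rather than something established in this thread: an `int` carries the common object header plus a fixed bookkeeping field, followed by one or more fixed-size internal "digits". The helper name `int_size_breakdown` is hypothetical.

```python
import sys


def int_size_breakdown():
    """Return (object header size, fixed int size, bytes per int digit)."""
    header = object.__basicsize__  # assumed: reference count + type pointer
    fixed = int.__basicsize__      # assumed: header + bookkeeping field
    per_digit = int.__itemsize__   # bytes used by each internal digit
    return header, fixed, per_digit


header, fixed, per_digit = int_size_breakdown()
print("object header:", header)
print("fixed part of an int:", fixed)
print("bytes per digit:", per_digit)

# Larger magnitudes need more digits, so the reported size grows with them.
for value in (1, 2**30, 2**60, 2**100):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")
```

If the fixed part plus one digit adds up to the 28 bytes reported above, that would account for the gap relative to a bare `object()`.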

martinhansdk commented 1 year ago

The github format is broken as well.