astanin / python-tabulate

Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
https://pypi.org/project/tabulate/
MIT License

[bug] Handling NoneTypes while using the maxcolwidths feature #271

Open f0sh opened 1 year ago

f0sh commented 1 year ago

Using a dataset that contains a field with a None value, together with a maxcolwidths entry for that field, like the following:

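from tabulate import tabulate
headers = ['id', 'name', 'description']
width = [None, None, 5]
data = [[123456, 'Test 1', 'Testdescription'], [654321, 'Test 2', None]]
print(tabulate(data, headers=headers, tablefmt="simple_grid", maxcolwidths=width))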

leads to TypeError: NoneType takes no arguments.

Traceback

Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\project\package\script.py", line 21, in <module>
    main()
  File "C:\Users\user\project\package\script.py", line 18, in main
    cli()
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\user\project\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\user\project\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Users\user\project\package\commands_model.py", line 181, in list_elements
    click.echo(tabulate(data, headers=headers, tablefmt="simple_grid", maxcolwidths=width))
  File "C:\Users\user\project\lib\site-packages\tabulate\__init__.py", line 2061, in tabulate
    list_of_lists = _wrap_text_to_colwidths(
  File "C:\Users\user\project\lib\site-packages\tabulate\__init__.py", line 1516, in _wrap_text_to_colwidths
    str(cell) if _isnumber(cell) else _type(cell, numparse)(cell)
TypeError: NoneType takes no arguments
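The last frame points at the cause. The message is exactly what CPython raises when NoneType is called with an argument, so presumably tabulate's private _type helper infers type(None) for a None cell and the cast then evaluates type(None)(None). The snippet below reproduces only that final step in plain Python; the claim about what _type returns is an assumption drawn from the error message.

```python
# Reproduce only the failing cast from the last traceback frame, without tabulate.
# Assumption: tabulate's private _type(cell, numparse) returns type(None) for a None cell.
cell = None
inferred_type = type(None)  # stand-in for _type(cell, numparse)

message = ""
try:
    inferred_type(cell)  # mirrors `_type(cell, numparse)(cell)`
except TypeError as exc:
    message = str(exc)

print(message)  # NoneType takes no arguments

# str() never fails on None, but it renders the literal text "None",
# not an empty cell.
print(str(cell))  # None
```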

Temporary Solution

You have to check the dataset manually and replace None values with empty strings before passing it to tabulate. It would be better if tabulate handled None values by itself.
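A minimal sketch of that manual workaround, using a hypothetical helper replace_none (not part of tabulate) that copies the rows and swaps every None cell for a fill string. Only the standard library is needed; the commented-out call assumes tabulate is installed and reuses the arguments from the report.

```python
from typing import Any, Iterable, List, Sequence


def replace_none(rows: Iterable[Sequence[Any]], fill: str = "") -> List[List[Any]]:
    """Hypothetical helper: copy `rows`, replacing every None cell with `fill`."""
    return [[fill if cell is None else cell for cell in row] for row in rows]


headers = ["id", "name", "description"]
width = [None, None, 5]
data = [[123456, "Test 1", "Testdescription"], [654321, "Test 2", None]]

clean = replace_none(data)
print(clean)  # [[123456, 'Test 1', 'Testdescription'], [654321, 'Test 2', '']]

# With tabulate installed, the cleaned rows can be rendered as in the report:
# from tabulate import tabulate
# print(tabulate(clean, headers=headers, tablefmt="simple_grid", maxcolwidths=width))
```

Using the same string you would pass as tabulate's missingval (default "") should make the substituted cells look like ordinary missing values.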

AlecVivian commented 1 year ago

A solution that works for my edge case is to give _wrap_text_to_colwidths an optional missingval keyword argument and use that value whenever the type-inference helper _type can't be applied to a cell's value (i.e. when the cell is None): https://github.com/astanin/python-tabulate/blob/master/tabulate/__init__.py#L1532

def _wrap_text_to_colwidths(list_of_lists, colwidths, missingval='__ERR_VAL__', numparses=True): # add `missingval` as optional param
    numparses = _expand_iterable(numparses, len(list_of_lists[0]), True)

    result = []

    for row in list_of_lists:
        new_row = []
        for cell, width, numparse in zip(row, colwidths, numparses):
            if _isnumber(cell) and numparse:
                new_row.append(cell)
                continue

            if width is not None:
                wrapper = _CustomTextWrap(width=width)
                # Cast based on our internal type handling
                # Any future custom formatting of types (such as datetimes)
                # may need to be more explicit than just `str` of the object
                if cell is None:
                    # circumvent calling _type(...)(cell) on NoneType
                    casted_cell = missingval
                elif _isnumber(cell):
                    casted_cell = str(cell)
                else:
                    casted_cell = _type(cell, numparse)(cell)
                wrapped = wrapper.wrap(casted_cell)
                new_row.append("\n".join(wrapped))
            else:
                new_row.append(cell)
        result.append(new_row)

    return result
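The key point of the proposal is the order of checks for a wrapped cell: None is handled first, so it never reaches the type-based cast. Below is a standalone sketch of that order. It is not tabulate's code: plain str stands in for the proposal's str(cell) / _type(cell, numparse)(cell) cast, textwrap.wrap stands in for the private _CustomTextWrap, and cast_cell / wrap_cell are hypothetical names.

```python
import textwrap
from typing import Any


def cast_cell(cell: Any, missingval: str = "") -> str:
    """Turn a cell into text, checking for None before any type-based cast."""
    if cell is None:
        return missingval  # None never reaches the cast below
    # Stand-in for the proposal's str(cell) / _type(cell, numparse)(cell) cast;
    # this sketch simply uses str.
    return str(cell)


def wrap_cell(cell: Any, width: int, missingval: str = "") -> str:
    """Cast, then wrap to `width` characters, joining the lines with newlines."""
    return "\n".join(textwrap.wrap(cast_cell(cell, missingval), width=width))


print(repr(wrap_cell("Testdescription", 5)))  # 'Testd\nescri\nption'
print(repr(wrap_cell(None, 5)))               # ''
```

In this sketch an empty missingval wraps to an empty cell, while a long sentinel such as '__ERR_VAL__' in a width-5 column would itself be split across lines, which is one argument for a short or empty default.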

Then, in the main tabulate function (https://github.com/astanin/python-tabulate/blob/master/tabulate/__init__.py#L2093-L2120), pass missingval through:

def tabulate(...):
    ...
    if maxcolwidths is not None:
        if len(list_of_lists):
            num_cols = len(list_of_lists[0])
        else:
            num_cols = 0
        if isinstance(maxcolwidths, int):  # Expand scalar for all columns
            maxcolwidths = _expand_iterable(maxcolwidths, num_cols, maxcolwidths)
        else:  # Ignore col width for any 'trailing' columns
            maxcolwidths = _expand_iterable(maxcolwidths, num_cols, None)

        numparses = _expand_numparse(disable_numparse, num_cols)
        list_of_lists = _wrap_text_to_colwidths(
            list_of_lists, maxcolwidths, missingval, numparses=numparses # Utilize missingval as substitute for None values.
        )

    if maxheadercolwidths is not None:
        num_cols = len(list_of_lists[0])
        if isinstance(maxheadercolwidths, int):  # Expand scalar for all columns
            maxheadercolwidths = _expand_iterable(
                maxheadercolwidths, num_cols, maxheadercolwidths
            )
        else:  # Ignore col width for any 'trailing' columns
            maxheadercolwidths = _expand_iterable(maxheadercolwidths, num_cols, None)

        numparses = _expand_numparse(disable_numparse, num_cols)
        headers = _wrap_text_to_colwidths(
            [headers], maxheadercolwidths, numparses=numparses # Assume default value specified in private function sig.
        )[0]
    ...
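The maxcolwidths handling in the excerpt expands a single int to one width per column and pads a shorter list with None (no wrapping) for any trailing columns. A stdlib-only sketch of that rule follows; expand_widths is a hypothetical name, and the behaviour is a reading of the excerpt's comments rather than a copy of tabulate's private _expand_iterable.

```python
from typing import List, Optional, Sequence, Union

Width = Optional[int]


def expand_widths(maxcolwidths: Union[int, Sequence[Width]], num_cols: int) -> List[Width]:
    """Hypothetical helper: one max width per column, None meaning 'no wrapping'."""
    if isinstance(maxcolwidths, int):
        # A scalar applies to every column.
        return [maxcolwidths] * num_cols
    widths = list(maxcolwidths)
    # Trailing columns without an explicit width are left unwrapped.
    return widths + [None] * (num_cols - len(widths))


print(expand_widths(5, 3))           # [5, 5, 5]
print(expand_widths([None, 10], 3))  # [None, 10, None]
print(expand_widths(20, 0))          # []
```

The last call mirrors the num_cols = 0 guard in the excerpt: with no rows there is nothing to expand.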

Here is a working monkeypatch I'm using with version 0.9.0 (I know it's not pretty and a bit questionable, but it unblocks a request I have):

import tabulate
def mp_wrap_text_to_colwidths(list_of_lists, colwidths, missingval='', numparses=True):
    numparses = tabulate._expand_iterable(numparses, len(list_of_lists[0]), True)

    result = []

    for row in list_of_lists:
        new_row = []
        for cell, width, numparse in zip(row, colwidths, numparses):
            if tabulate._isnumber(cell) and numparse:
                new_row.append(cell)
                continue

            if width is not None:
                wrapper = tabulate._CustomTextWrap(width=width)
                # Cast based on our internal type handling
                # Any future custom formatting of types (such as datetimes)
                # may need to be more explicit than just `str` of the object
                if cell is None:
                    # never call tabulate._type(...)(cell) on NoneType
                    casted_cell = missingval
                elif tabulate._isnumber(cell):
                    casted_cell = str(cell)
                else:
                    casted_cell = tabulate._type(cell, numparse)(cell)
                wrapped = wrapper.wrap(casted_cell)
                new_row.append("\n".join(wrapped))
            else:
                new_row.append(cell)
        result.append(new_row)

    return result
tabulate._wrap_text_to_colwidths = mp_wrap_text_to_colwidths

def mp_tabulate(
    tabular_data,
    headers=(),
    tablefmt="simple",
    floatfmt=tabulate._DEFAULT_FLOATFMT,
    intfmt=tabulate._DEFAULT_INTFMT,
    numalign=tabulate._DEFAULT_ALIGN,
    stralign=tabulate._DEFAULT_ALIGN,
    missingval=tabulate._DEFAULT_MISSINGVAL,
    showindex="default",
    disable_numparse=False,
    colalign=None,
    maxcolwidths=None,
    rowalign=None,
    maxheadercolwidths=None,
):
    """Format a fixed width table for pretty printing.
    """

    if tabular_data is None:
        tabular_data = []

    list_of_lists, headers = tabulate._normalize_tabular_data(
        tabular_data, headers, showindex=showindex
    )
    list_of_lists, separating_lines = tabulate._remove_separating_lines(list_of_lists)

    if maxcolwidths is not None:
        num_cols = len(list_of_lists[0])
        if isinstance(maxcolwidths, int):  # Expand scalar for all columns
            maxcolwidths = tabulate._expand_iterable(maxcolwidths, num_cols, maxcolwidths)
        else:  # Ignore col width for any 'trailing' columns
            maxcolwidths = tabulate._expand_iterable(maxcolwidths, num_cols, None)

        numparses = tabulate._expand_numparse(disable_numparse, num_cols)
        list_of_lists = tabulate._wrap_text_to_colwidths(
            list_of_lists, maxcolwidths, missingval, numparses=numparses
        )

    if maxheadercolwidths is not None:
        num_cols = len(list_of_lists[0])
        if isinstance(maxheadercolwidths, int):  # Expand scalar for all columns
            maxheadercolwidths = tabulate._expand_iterable(
                maxheadercolwidths, num_cols, maxheadercolwidths
            )
        else:  # Ignore col width for any 'trailing' columns
            maxheadercolwidths = tabulate._expand_iterable(maxheadercolwidths, num_cols, None)

        numparses = tabulate._expand_numparse(disable_numparse, num_cols)
        headers = tabulate._wrap_text_to_colwidths(
            [headers], maxheadercolwidths, numparses=numparses
        )[0]

    # empty values in the first column of RST tables should be escaped (issue #82)
    # "" should be escaped as "\\ " or ".."
    if tablefmt == "rst":
        list_of_lists, headers = tabulate._rst_escape_first_column(list_of_lists, headers)

    # PrettyTable formatting does not use any extra padding.
    # Numbers are not parsed and are treated the same as strings for alignment.
    # Check if pretty is the format being used and override the defaults so it
    # does not impact other formats.
    min_padding = tabulate.MIN_PADDING
    if tablefmt == "pretty":
        min_padding = 0
        disable_numparse = True
        numalign = "center" if numalign == tabulate._DEFAULT_ALIGN else numalign
        stralign = "center" if stralign == tabulate._DEFAULT_ALIGN else stralign
    else:
        numalign = "decimal" if numalign == tabulate._DEFAULT_ALIGN else numalign
        stralign = "left" if stralign == tabulate._DEFAULT_ALIGN else stralign

    # optimization: look for ANSI control codes once,
    # enable smart width functions only if a control code is found
    #
    # convert the headers and rows into a single, tab-delimited string ensuring
    # that any bytestrings are decoded safely (i.e. errors ignored)
    plain_text = "\t".join(
        tabulate.chain(
            # headers
            map(tabulate._to_str, headers),
            # rows: chain the rows together into a single iterable after mapping
            # the bytestring conversion to each cell value
            tabulate.chain.from_iterable(map(tabulate._to_str, row) for row in list_of_lists),
        )
    )

    has_invisible = tabulate._ansi_codes.search(plain_text) is not None

    enable_widechars = tabulate.wcwidth is not None and tabulate.WIDE_CHARS_MODE
    if (
        not isinstance(tablefmt, tabulate.TableFormat)
        and tablefmt in tabulate.multiline_formats
        and tabulate._is_multiline(plain_text)
    ):
        tablefmt = tabulate.multiline_formats.get(tablefmt, tablefmt)
        is_multiline = True
    else:
        is_multiline = False
    width_fn = tabulate._choose_width_fn(has_invisible, enable_widechars, is_multiline)

    # format rows and columns, convert numeric values to strings
    cols = list(tabulate.izip_longest(*list_of_lists))
    numparses = tabulate._expand_numparse(disable_numparse, len(cols))
    coltypes = [tabulate._column_type(col, numparse=np) for col, np in zip(cols, numparses)]
    if isinstance(floatfmt, str):  # old version
        float_formats = len(cols) * [
            floatfmt
        ]  # just duplicate the string to use in each column
    else:  # if floatfmt is list, tuple etc we have one per column
        float_formats = list(floatfmt)
        if len(float_formats) < len(cols):
            float_formats.extend((len(cols) - len(float_formats)) * [tabulate._DEFAULT_FLOATFMT])
    if isinstance(intfmt, str):  # old version
        int_formats = len(cols) * [
            intfmt
        ]  # just duplicate the string to use in each column
    else:  # if intfmt is list, tuple etc we have one per column
        int_formats = list(intfmt)
        if len(int_formats) < len(cols):
            int_formats.extend((len(cols) - len(int_formats)) * [tabulate._DEFAULT_INTFMT])
    if isinstance(missingval, str):
        missing_vals = len(cols) * [missingval]
    else:
        missing_vals = list(missingval)
        if len(missing_vals) < len(cols):
            missing_vals.extend((len(cols) - len(missing_vals)) * [tabulate._DEFAULT_MISSINGVAL])
    cols = [
        [tabulate._format(v, ct, fl_fmt, int_fmt, miss_v, has_invisible) for v in c]
        for c, ct, fl_fmt, int_fmt, miss_v in zip(
            cols, coltypes, float_formats, int_formats, missing_vals
        )
    ]

    # align columns
    aligns = [numalign if ct in [int, float] else stralign for ct in coltypes]
    if colalign is not None:
        assert isinstance(colalign, tabulate.Iterable)
        for idx, align in enumerate(colalign):
            aligns[idx] = align
    minwidths = (
        [width_fn(h) + min_padding for h in headers] if headers else [0] * len(cols)
    )
    cols = [
        tabulate._align_column(c, a, minw, has_invisible, enable_widechars, is_multiline)
        for c, a, minw in zip(cols, aligns, minwidths)
    ]

    if headers:
        # align headers and add headers
        t_cols = cols or [[""]] * len(headers)
        t_aligns = aligns or [stralign] * len(headers)
        minwidths = [
            max(minw, max(width_fn(cl) for cl in c))
            for minw, c in zip(minwidths, t_cols)
        ]
        headers = [
            tabulate._align_header(h, a, minw, width_fn(h), is_multiline, width_fn)
            for h, a, minw in zip(headers, t_aligns, minwidths)
        ]
        rows = list(zip(*cols))
    else:
        minwidths = [max(width_fn(cl) for cl in c) for c in cols]
        rows = list(zip(*cols))

    if not isinstance(tablefmt, tabulate.TableFormat):
        tablefmt = tabulate._table_formats.get(tablefmt, tabulate._table_formats["simple"])

    ra_default = rowalign if isinstance(rowalign, str) else None
    rowaligns = tabulate._expand_iterable(rowalign, len(rows), ra_default)
    tabulate._reinsert_separating_lines(rows, separating_lines)

    return tabulate._format_table(
        tablefmt, headers, rows, minwidths, aligns, is_multiline, rowaligns=rowaligns
    )

# Overwrite the `tabulate` function in the library:
tabulate.tabulate = mp_tabulate
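Two Python behaviours decide whether this monkeypatch takes effect, and both can be checked without tabulate. Replacing the module attribute _wrap_text_to_colwidths reaches the library's own internal call because functions look up module-level names at call time. Replacing tabulate.tabulate, however, does not affect code that already ran `from tabulate import tabulate`, since that name keeps its own reference to the original function; the patch should run before such imports (or callers should use tabulate.tabulate). The stand-in module `demo` and its functions below are hypothetical.

```python
import types

# A throwaway module standing in for the tabulate package (hypothetical names).
demo = types.ModuleType("demo")
exec(
    "def helper():\n"
    "    return 'original helper'\n"
    "\n"
    "def tabulate():\n"
    "    return 'original tabulate uses ' + helper()\n",
    demo.__dict__,
)

# Like `from tabulate import tabulate` executed BEFORE the patch.
early_tabulate = demo.tabulate

# 1) Patch the helper: the module's own function sees it on its next call,
#    because `helper` is looked up in the module globals at call time.
demo.helper = lambda: "patched helper"
print(demo.tabulate())  # original tabulate uses patched helper

# 2) Patch the public function: only lookups made after this line see it.
demo.tabulate = lambda: "patched tabulate"
print(early_tabulate())  # original tabulate uses patched helper
print(demo.tabulate())   # patched tabulate
```

A side effect of point 1: the unpatched library tabulate() presumably calls the helper without a missingval argument, so with only the first half of the patch applied, None cells would be rendered with the helper's default ''.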

wabiloo commented 1 year ago

I'm having a similar issue. In my case the input data is dicts rather than lists; it doesn't contain explicit None values, but not every record has all the keys. I guess the missing keys are treated as implicit None values as a result. I will try the monkeypatch, but a fix in the core library would be great, thanks!
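If the missing keys are indeed what turns into None, a similar pre-processing step should work for dict rows. A minimal sketch with a hypothetical fill_missing_keys helper that collects every key in first-seen order and fills absent or None values with a string; whether tabulate orders the columns the same way is not verified here.

```python
from typing import Any, Dict, List


def fill_missing_keys(records: List[Dict[str, Any]], fill: str = "") -> List[Dict[str, Any]]:
    """Hypothetical helper: give every record every key, using `fill` for gaps and None."""
    keys: List[str] = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)  # keep first-seen order
    return [
        {key: fill if record.get(key) is None else record[key] for key in keys}
        for record in records
    ]


rows = [{"id": 1, "name": "a", "note": "long text"}, {"id": 2, "name": "b"}]
print(fill_missing_keys(rows))
# [{'id': 1, 'name': 'a', 'note': 'long text'}, {'id': 2, 'name': 'b', 'note': ''}]
```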

mp-gh commented 7 months ago

I'm running into this as well:

from tabulate import tabulate
config_table_data = list()
config_table_data.append(("a","value"))
config_table_data.append(("b",None))
print(tabulate(config_table_data))

-  -----
a  value
b
-  -----

print(tabulate(config_table_data,maxcolwidths=[20, 30]))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mp/.cache/virtualenvs/hga6bo9e-py3.10/lib/python3.10/site-packages/tabulate/__init__.py", line 2061, in tabulate
    list_of_lists = _wrap_text_to_colwidths(
  File "/home/mp/.cache/virtualenvs/hga6bo9e-py3.10/lib/python3.10/site-packages/tabulate/__init__.py", line 1516, in _wrap_text_to_colwidths
    str(cell) if _isnumber(cell) else _type(cell, numparse)(cell)
TypeError: NoneType takes no arguments