PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/
18 stars 26 forks source link

Obscure bug in Features table formatter #273

Closed drvdputt closed 4 months ago

drvdputt commented 8 months ago

In Features, the TableFormatter class is overridden by a new class definition: BoundedParTableFormatter from features_format.py. It overrides the _pformat_table function, to pretty-print the 3-tuples representing (value, min, max) in each BoundedMaskedColumn of the table.

It does this by setting col.info.format to a custom formatting function, which assumes that it will receive a 3-tuple. This happens in this code block


    colsh = [(col, col.shape) for col in table.columns.values()]
            BoundedMaskedColumn._omit_shape = True
            for col, sh in colsh:
                if len(sh) == 2 and sh[1] == 3:
                    bpcols.append((col, col.info.format))
                    col.info.format = fmt_func(col.info.format or "g")

after this, the super()._pformat_table is invoked to apply these formatters.

I'm still not sure how to reproduce this, but printing the features table throws an exception sometimes. For some reason the formatting function _fmt receives a scalar value instead of a 3-tuple. This happens during the assignment col.info.format = .... Astropy Column uses a custom assigment function for this property, that tries to format a test value. I think there is some sort of automatic unpacking going on in that test function, that results in an element of the tuple being passed, instead of the intended actual 3-tuple in the column. I have no solution for this, except guarding for this in our _fmt function.

Here's the series of exceptions

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:90, in get_auto_format_func.<locals>._auto_format_func(format_, val)
     89 try:
---> 90     out = format_func(format_, val)
     91     if not isinstance(out, str):

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:88, in get_auto_format_func.<locals>._auto_format_func.<locals>.<lambda>(format_, val)
     87 if callable(format_):
---> 88     format_func = lambda format_, val: format_(val)
     89     try:

File ~/Repositories/pahfit/pahfit/features/features_format.py:16, in fmt_func.<locals>._fmt(v)
     15 print(f"problem in _fmt: v = {v}, fmt = {fmt}, type(fmt) = {type(fmt)}")
---> 16 raise e

File ~/Repositories/pahfit/pahfit/features/features_format.py:9, in fmt_func.<locals>._fmt(v)
      8 try:
----> 9     if ma.is_masked(v[0]):
     10         return "  masked  "

IndexError: invalid index to scalar variable.

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:535, in TableFormatter._pformat_col_iter(self, col, max_lines, show_name, show_unit, outs, show_dtype, show_length)
    534 try:
--> 535     yield format_col_str(idx)
    536 except ValueError:

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:523, in TableFormatter._pformat_col_iter.<locals>.format_col_str(idx)
    522 else:
--> 523     left = format_func(col_format, col[(idx,) + multidim0])
    524     right = format_func(col_format, col[(idx,) + multidim1])

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:103, in get_auto_format_func.<locals>._auto_format_func(format_, val)
    101         return str(val)
--> 103     raise ValueError(f"Format function for value {val} failed.") from err
    104 # If the user-supplied function handles formatting masked elements, use
    105 # it directly.  Otherwise, wrap it in a function that traps them.

ValueError: Format function for value 5.27 failed.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/column.py:782, in BaseColumn.format(self, format_string)
    780 try:
    781     # test whether it formats without error exemplarily
--> 782     self.pformat(max_lines=1)
    783 except Exception as err:
    784     # revert to restore previous format if there was one

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/column.py:886, in BaseColumn.pformat(self, max_lines, show_name, show_unit, show_dtype, html)
    885 _pformat_col = self._formatter._pformat_col
--> 886 lines, outs = _pformat_col(
    887     self,
    888     max_lines,
    889     show_name=show_name,
    890     show_unit=show_unit,
    891     show_dtype=show_dtype,
    892     html=html,
    893 )
    894 return lines

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:294, in TableFormatter._pformat_col(self, col, max_lines, show_name, show_unit, show_dtype, show_length, html, align)
    292 # Replace tab and newline with text representations so they display nicely.
    293 # Newline in particular is a problem in a multicolumn table.
--> 294 col_strs = [
    295     val.replace("\t", "\\t").replace("\n", "\\n") for val in col_strs_iter
    296 ]
    297 if len(col_strs) > 0:

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:294, in <listcomp>(.0)
    292 # Replace tab and newline with text representations so they display nicely.
    293 # Newline in particular is a problem in a multicolumn table.
--> 294 col_strs = [
    295     val.replace("\t", "\\t").replace("\n", "\\n") for val in col_strs_iter
    296 ]
    297 if len(col_strs) > 0:

File ~/Projects/horsehead+ncg7023/analyze-spectra/git/venv/lib/python3.11/site-packages/astropy/table/pprint.py:537, in TableFormatter._pformat_col_iter(self, col, max_lines, show_name, show_unit, outs, show_dtype, show_length)
    536         except ValueError:
--> 537             raise ValueError(
    538                 'Unable to parse format string "{}" for entry "{}" '
    539                 'in column "{}"'.format(col_format, col[idx], col.info.name)
    540             )
    542 outs["show_length"] = show_length

ValueError: Unable to parse format string "<function fmt_func.<locals>._fmt at 0x1699d79c0>" for entry "[5.27 -- --]" in column "wavelength"
drvdputt commented 8 months ago

I realize now that this is probably the same bug as #266, but in a different context

jdtsmith commented 5 months ago

One thing I had to do to get the format function here to work correctly was to lie (temporarily) about the underlying shape of the column data. That is because a table format function is used on the "smallest unit", which for multi-dim and structured array data in a column, is the individual value. So you see I turn on a class variable _omit_shape while table formatting, which omits the final dimension of the shape (e.g. 100,3 will just have shape 100). Then it turns _omit_shape off. It would appear that this dirty hack is not universally successful, so that individual values are getting through. Your tracebacks show the value [5.27, masked, masked] originally, but the formatter is just getting 5.27.

jdtsmith commented 5 months ago

How about this radical idea: we give up entirely on using masking for bounds as our underlying representation, and instead, use structured arrays for each variable (val, min, max), and use np.nan to represent "no bound" (just as in the YAML!), with the interpretation of . We can still then use masking to mask out the entire entry (needed for when it's "inapplicable"). This eliminates pretty much all this printing hackery, and should "just work" since you can access structured arrays by index or name. The representation on disk will be a bit different, but no matter.

Our pretty printing is then just a simple column format function, assigned to all bounded param columns, or we can use the same trick we did before to override format and substitute this function for the real format (so users can throw '%8.2g' or whatever in there).

def _fmt(x):
    ret = f"{x['val']:6.3f}"
    if np.isnan(x['min']) and np.isnan(x['max']):
        return ret + " (fixed)"
    else:
        mn = "-∞" if np.isnan(x['min']) else f"{x['min']:6.3f}"
        mx = "∞" if np.isnan(x['max']) else f"{x['max']:6.3f}"
        return f"{ret} ({mn}, {mx})"

So:

t=Table()
PAR_DTYPE = np.dtype([("val", "f"), ("min", "f"), ("max", "f")])
a = np.ma.array([(np.pi, 2, 4.5), (np.pi / 2, 1, np.nan), (4, np.nan, np.nan)], dtype=PAR_DTYPE, mask=[1, 0, 0])
t['power'] = a
t['power'].info.format = _fmt
print(t)

produces

power [val, min, max]
---------------------
                   --
    1.571 ( 1.000, ∞)
        4.000 (fixed)

Then we must remember to temporarily strip these functions from info.format(unless we go the override route).

jdtsmith commented 5 months ago

See #283 for the idea (untested). We only need to provide the dtype on construction of the table, and can simplify all our hackery. See if that fixes your issue?

jdtsmith commented 4 months ago

283 merged, closing.