dgorissen / pycel

A library for compiling excel spreadsheets to python code & visualizing them as a graph
GNU General Public License v3.0
565 stars 147 forks source link

Order of evaluation matters when a cell contains a nonimplemented function #128

Closed igheorghita closed 2 years ago

igheorghita commented 3 years ago

What actually happened

I have a spreadsheet which contains a SWITCH function (not implemented) in certain cells (say A1) and in some lower row there's a cell containing a string (say A3). If I initiate an ExcelCompiler class, then immediately evaluate the string cell, everything works properly. But if I try to evaluate the SWITCH cell first, then try to evaluate the string cell, there's an error. At this point the string cell is not in the cell_map but range_todos is nonempty, so the chain of functions that's called is: _evaluate_non_iterative --> _gen_graph --> _process_gen_graph --> _evaluate_range, which ultimately fails because of the SWITCH related ranges being in range_todos. Funnily, the second time you try to evaluate the string cell after this failure, it works, because at that point the cell is already in the cell_map and _gen_graph never gets called.

What was expected to happen

I would expect that the order in which you evaluate cells shouldn't matter. I wonder if range_todos should be cleared if there's an error in evaluation?

Problem description

If range_todos isn't cleared after there was an error in evaluating a cell, then it seems every time you try to evaluate a cell not in cell_map from that point onwards, you'll encounter an error.

Code Sample

Here's the example Excel file:

switch_test.xlsx

from pycel import ExcelCompiler

# 1. This works properly
filename = 'switch_test.xlsx'
model = ExcelCompiler(filename)
model.evaluate('A3') 

# outputs: 'string'

# 2. This doesn't work as expected
model = ExcelCompiler(filename)
model.evaluate('A1')   # gives an error as expected

# then immediately after try
model.evaluate('A3')   # this also gives an error now

# If you now try to evaluate A3 again, it works, because it's already in model.cell_map
model.evaluate('A3')

# outputs: 'string'

Here's the Traceback from the second example for the A3 cell:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py in eval_func(excel_formula, cse_array_address)
    922                     ret_val = in_array_formula_context.fit_to_range(
--> 923                         excel_formula.compiled_lambda())
    924 

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py in <lambda>()
    943 
--> 944         ### Traceback will show this line if not loaded from a text file
    945 

NameError: name 'switch' is not defined

During handling of the above exception, another exception occurred:

UnknownFunction                           Traceback (most recent call last)
<ipython-input-153-9d8b23c5ae54> in <module>
----> 1 model.evaluate('A3')

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelcompiler.py in _evaluate_non_iterative(self, address)
    852 
    853             if address.address not in self.cell_map:
--> 854                 self._gen_graph(address)
    855 
    856         result = self._evaluate(str(address))

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelcompiler.py in _gen_graph(self, seed, recursed)
    921         if not recursed:
    922             # if not entered to process one cell / cellrange process other work
--> 923             self._process_gen_graph()
    924 
    925     def _process_gen_graph(self):

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelcompiler.py in _process_gen_graph(self)
    940         # calc the values for ranges
    941         for range_todo in reversed(self.range_todos):
--> 942             self._evaluate_range(range_todo)
    943         self.range_todos = []
    944 

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelcompiler.py in _evaluate_range(self, address)
    774             else:
    775                 # CSE Array Formula
--> 776                 data = self.eval(cell_range, cell_range.address)
    777             self.log.info(f"Range {cell_range.address} evaluated to '{data}'")
    778 

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelcompiler.py in _eval(cell, cse_array_address)
    166             else:
    167                 def _eval(cell, cse_array_address=None):
--> 168                     return eval_ctx(
    169                         cell.formula, cse_array_address=cse_array_address)
    170 

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py in eval_func(excel_formula, cse_array_address)
    924 
    925             except NameError:
--> 926                 error_logger('error', excel_formula.python_code,
    927                              msg=excel_formula.msg, exc=UnknownFunction)
    928 

~/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py in error_logger(level, python_code, msg, exc)
    875             getattr(logger, level)(error_msg)
    876             if exc is not None:
--> 877                 raise exc(error_msg)
    878             return error_msg
    879 

UnknownFunction: Traceback (most recent call last):
  File "/Users/iuliagheorghita/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py", line 923, in eval_func
    excel_formula.compiled_lambda())
  File "/Users/iuliagheorghita/git/cityci/venv/lib/python3.8/site-packages/pycel/excelformula.py", line 944, in <lambda>
    ### Traceback will show this line if not loaded from a text file
NameError: name 'switch' is not defined
Eval: switch(_R_("Sheet1!A2:C2"), False, 0, True, 1)
Function SWITCH is not implemented. SWITCH is in the "Logical" group, and was introduced in Excel 2016

Environment

Pycel version: 1.0b27 Python version: 3.8.11 OS: macOS Big Sur (version 11.5.1)

stephenrauch commented 3 years ago

Thanks for the report. I have never used Pycel interactively, so error state conditions that will cause problems in an interactive mode have, to my knowledge, never been addressed.

It has been my experience, that mechanisms to allow continued robust operation after error conditions are highly non-trivial to get right. I would be nervous about trying to "fix" the above with manipulations inside the internal state.

However, a mechanism to note the previous error and to issue a warning on subsequent evaluation attempts, should be easy enough to implement in a robust manner. Would you me interested in submitting a PR?

igheorghita commented 3 years ago

Thanks for the quick response! You probably understood what I wrote, but just to clarify - would you consider the following interactive?

model = ExcelCompiler(filename)
cell_names = ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3']
for cell_name in cell_names:
    try:
        model.evaluate(cell_name)
    except:
        print('error evaluating cell', cell_name)

# This prints:
# error evaluating cell A1
# error evaluating cell B1
# error evaluating cell C1
# error evaluating cell A3

I imagine it's common to have functions in spreadsheets that aren't implemented and it could be confusing behavior to have certain cells fail after an error happens. I understand your point though. I can also look into it some more and if the warning is the best way to go, I can try to implement that.

As a side note, couple questions: 1) Would you recommend a way of evaluating that doesn't have this order dependence issue? 2) I'm particularly interested in recomputing all cells immediately and I saw there's a recalculate method for the ExcelCompiler class. But this only works if the cells are already in the cell_map. Is there a best way to get the cells in the cell_map besides just evaluating or using some of the internal methods?

Thanks again!

stephenrauch commented 3 years ago

By interactive I meant using Jupyter, or another shell, to work interactively with objects. This is as opposed to restarting a program for each run against the workbook in question. The use case for missing functions that cause errors during evaluation, in which afterwards, continued calculation was expected to always behave reasonably, was, as far as I know, never considered as a design goal.

Missing functions are certainly a thing when bringing in new workbooks to use, but in my workflows, that is immediately followed by implementing the missing function, and then running the tests again. Which is to say, any results that follow an error are always suspect, and never expected to be useful beyond finding that first error.

You can implement a function, and tell pycel where to find it with the plugin parameter, and thus remove the unknown function error, and allow continued calcs with a less unstable calculator state.

igheorghita commented 3 years ago

Got it, thanks for clarifying!