aidhound / hotxlfp

A Python Excel Formula Parser similar to the javascript handsontable formulaparser
MIT License

Parsing and evaluating #1

Open prescod opened 5 years ago

prescod commented 5 years ago

Thank you for making a lightweight formula parsing library.

Unfortunately, it looks like the API makes it inappropriate for my project.

This library merges parsing and evaluating into a single step, which makes it impossible to cache the result of the parsing step for performance.

Here is how Jinja2 does it:

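from jinja2 import Template

# Parse (compile) the template once...
x = Template("{{foo*bar}}")
# ...then render it as often as needed with different values.
rc = x.render({"foo": 3, "bar": 5})
print(rc)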

Python itself has a distinction between parsing and evaluation:

https://docs.python.org/3/library/parser.html#example-emulation-of-compile
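
To make the parse/evaluate split concrete, here is a minimal sketch using only Python's built-in compile() and eval(). It is not hotxlfp's API; the formula text and the variable names foo and bar are just illustrative, and a Python expression stands in for an Excel formula.

```python
# Parse once: compile() turns the source text into a reusable code object.
code = compile("foo * bar", "<formula>", "eval")

# Evaluate many times with different variables, without re-parsing.
# Builtins are emptied so the expression only sees the names we pass in.
rows = [{"foo": 3, "bar": 5}, {"foo": 2, "bar": 10}]
results = [eval(code, {"__builtins__": {}}, row) for row in rows]
print(results)
```

The code object plays the role of the cached parse result: it is built once and reused for every row.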

I also like that jinja2 allows me to pass a dict or dict sub-class for the variables and values. Copying them from my dict to yours (through set_variable) is another efficiency loss, especially if only a subset of them might be used by the formula.

leonelcamara commented 5 years ago

Thanks for this feedback. I do want to keep the set_function and set_variable API because that is how formulaparser.js, which inspired this library, does it. However, parse could certainly take a dict as an argument so you would not need the extra set_variable calls.

As for separating parsing and evaluating, right now we don't have any intermediate representation. I'm not sure it's worth it either, as we're generally talking about relatively small formulas (compared to, say, a Python source file or an HTML template).

You've given me some food for thought. I'll keep this open, while I think about it, to remind me, and so it can inspire possible pull requests in the direction you're asking.

prescod commented 5 years ago

If you do allow parse to accept a dict, I'd suggest you allow it to accept any dict-like object so that the caller can implement lazy lookups. For example, maybe config.foo would load (and cache) config.json under the covers and then look for "foo" in it.
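
A rough sketch of what such a lazy dict-like could look like. Everything here is hypothetical: LazyConfig and load_config are made-up names, the JSON string stands in for reading config.json from disk, and Python's built-in eval() stands in for a formula evaluator that accepts a mapping.

```python
import json
from collections.abc import Mapping


class LazyConfig(Mapping):
    """Hypothetical dict-like that loads its data on first lookup only."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self.load_count = 0  # how many times the loader actually ran

    def _ensure_loaded(self):
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data

    def __getitem__(self, key):
        return self._ensure_loaded()[key]

    def __iter__(self):
        return iter(self._ensure_loaded())

    def __len__(self):
        return len(self._ensure_loaded())


def load_config():
    # Stand-in for opening and parsing config.json.
    return json.loads('{"foo": 3, "bar": 5, "unused": 99}')


config = LazyConfig(load_config)
loads_before = config.load_count  # nothing has been loaded yet

# eval() accepts any mapping as its locals, so names are looked up on demand.
code = compile("foo * bar", "<formula>", "eval")
value = eval(code, {"__builtins__": {}}, config)
print(value, config.load_count)
```

Nothing is copied into the evaluator up front: the loader runs once, on the first name lookup, and only the names the formula mentions are ever requested.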

With respect to parsing and evaluation: I admit I haven't benchmarked your library, because the dict thing was a problem for me, but with another library I did find that re-parsing the same formulas over and over had a pretty big cost. I'm generating millions of rows at a time, so I'm trying to optimize everything. Yes, the formulas are small, but my program consists of nothing except running formulas and spitting out the results.

Still, maybe it's a niche use case and not worth special support. I will note that it is a very common pattern. You can store compiled Python regexps, Jinja templates, etc.
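
For illustration, this is the pattern with the standard library alone: a regex compiled once and stored, plus a memoized parse step. compile_formula is a hypothetical helper, and Python's compile() again stands in for a formula parser; none of this is hotxlfp's API.

```python
import re
from functools import lru_cache

# Compiled once at import time, then reused for every call.
CELL_REF = re.compile(r"[A-Z]+[0-9]+")


@lru_cache(maxsize=1024)
def compile_formula(source):
    # Hypothetical parse step: runs only the first time a given text is seen.
    return compile(source, "<formula>", "eval")


formulas = ["foo * bar", "foo + bar", "foo * bar"]
compiled = [compile_formula(f) for f in formulas]

refs = CELL_REF.findall("A1 + B22 * 3")
info = compile_formula.cache_info()
print(refs, info.hits, info.misses)
```

With millions of rows and only a handful of distinct formulas, nearly every call becomes a cache hit.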