dgorissen / pycel

A library for compiling excel spreadsheets to python code & visualizing them as a graph
GNU General Public License v3.0
565 stars 147 forks source link

Minor issue regarding whitespace parsing #136

Open igheorghita opened 2 years ago

igheorghita commented 2 years ago

What happened and code sample

This doesn't seem like a big deal but wanted to document it. In a situation where we write a formula having an unnecessary space between the sheet name and range like =SUM('Sheet2'! A1:A5), Excel evaluates it, but pycel gives #NAME?

Checked other variations while I was at it:

from openpyxl import load_workbook
wb = load_workbook()
ws = wb['Sheet1']
ws['A1'] = '=SUM(\'Sheet2\'!A1:A5)'    # no spaces
ws['A2'] = '=SUM(\'Sheet2\'! A1:A5)'  # Excel can compute this, but pycel can't
ws['A3'] = '=SUM (\'Sheet2\'!A1:A5)'  # this is the only one that Excel can't compute
ws['A4'] = '=SUM( \'Sheet2\'!A1:A5)'
ws['A5'] = '=SUM(\'Sheet2\'!A1: A5)'
ws['A6'] = '=SUM(\'Sheet2\'!A1 :A5)'

Here's the wb saved: test_tokenizer.xlsx

I'm not really familiar with the tokenizer, but something like this fixes it

    def _items(self):
        ...
        next_consumed = False
        for prev_token, token, next_token in zip(t, t[1:], t[2:]):
            if next_consumed:
                next_consumed = False
                continue
            if token.type != Token.WSPACE or not prev_token or not next_token:
               ...
            # testing: case when there's a whitespace like 'Sheet'! A1:A5
            elif token.type == Token.WSPACE and (
                prev_token.matches(type_=Token.OPERAND, subtype=Token.RANGE) and
                next_token.matches(type_=Token.OPERAND, subtype=Token.RANGE)):

                tokens.pop()
                tokens.append(Token(prev_token.value + next_token.value, Token.OPERAND, Token.RANGE))
                next_consumed = True
            elif 
               ...
        return tokens

This is probably not the best way and might have unintended consequences though.

stephenrauch commented 2 years ago

Thanks for reporting this. Ideally these things would be addressed in the tokenizer which is in openpyxl. I don't remember the context for the HACK comment I put in there referencing the openpyxl pull request. openpyxl had to move from bitbucket so that documentation is no longer there.

But as far as hacks go, I see nothing immediately wrong with what you are proposing. Do you fancy doing another PR?