Excel-DNA / IntelliSense

Add in-sheet IntelliSense for Excel UDFs
MIT License
170 stars 52 forks source link

Improve formula parsing #12

Closed govert closed 7 years ago

govert commented 8 years ago

At the moment the formula parsing is rudimentary, and will display the wrong IntelliSense tip for any kind of nested function call, or composite formula.

The formula parsing needs to be improved to determine the correct current function and current argument position.

govert commented 8 years ago

Issue #15 concerns this too.

govert commented 8 years ago

What is required is a better implementation here: https://github.com/Excel-DNA/IntelliSense/blob/master/Source/ExcelDna.IntelliSense/UIMonitor/FormulaParser.cs

        // TODO: This needs a proper implementation, considering subformulae
        // Works out the current function name and argument position for the given formula prefix.
        // E.g.    =MyFunc(1, 
        //             should set the functionName to “MyFunc” and the currentArgIndex to 1 (0-based)
        // Returns true if there is an open function in the formula
        internal static bool TryGetFormulaInfo(string formulaPrefix, out string functionName, out int currentArgIndex)
        {
            var match = Regex.Match(formulaPrefix, @"^=(?<functionName>[\w|.]*)\(");
            if (match.Success)
            {
                functionName = match.Groups["functionName"].Value;
                currentArgIndex = formulaPrefix.Count(c => c == ',');
                return true;
            }
            functionName = null;
            currentArgIndex = -1;
            return false;
        }
Ron-Ldn commented 8 years ago

Please note that the argument separator is not always the comma, it depends on the culture. For instance, with a French version of Excel it is the semicolon which is used. Ex: "=SUM(1.1,2.2)" in English culture will be "=SOMME(1,1;2,2)" in French culture. Do you know a way to get the separator used for the current culture?

Ron-Ldn commented 8 years ago

Apparently we might get the correct separator with "application.International[XlApplicationInternational.xlListSeparator]".

govert commented 8 years ago

The C API under GET.WORKSPACE gives us Option 37, item 3 : the decimal separator, item 5 : the list separator Option 16 : Alternate array item separator to use if the current array separator is the same as the decimal separator.

We'd need to make the calls to the CAPI or COM to retrieve this setting at the right time during initialization - certainly not when parsing the formula.

Apparently we might get the correct separator with "application.International[XlApplicationInternational.xlListSeparator]"

Ron-Ldn commented 8 years ago

I don't know the C api. It would probably be faster than my implementation. I have added on my local version (not pushed to git) the following function into the FormulaParser class which is called by IntelliSenseHelper constructor.

private static char _listDelimiter; public static void FindDelimiter() { dynamic app = ExcelDnaUtil.Application; _listDelimiter = app.International[5][0]; }

At least it works fine for an English culture, but I don't have a French one at my disposal. "app.International[5]" returns a string. I assumed it cannot be more than 1 char.

Ron-Ldn commented 8 years ago

I looked into the formula parsing. It's a bit of a headache since there are so many use-cases. I came with a solution which I find simple:

  1. Replace all the text between double-quotes with empty string.
  2. Replace all the calls between parentheses with empty string 3.a. If there is no opening parenthesis left, return false. 3.b. Split the formula into 2 parts, separated by the last opening parenthesis. Use the regex on the left part to find the actual UDF. Count the number of separators on the right part.

I would prefer a solution without all the string replacements in order to save performance. But when I run it, I don't feel any latency.

Let me know what you think of it.

    internal static bool TryGetFormulaInfo(string formulaPrefix, out string functionName, out int currentArgIndex)
    {
        formulaPrefix = Regex.Replace(formulaPrefix, "(\"[^\"]*\")|(\\([^\\(\\)]*\\))| ", string.Empty);

        while (Regex.IsMatch(formulaPrefix, "\\([^\\(\\)]*\\)"))
        {
            formulaPrefix = Regex.Replace(formulaPrefix, "\\([^\\(\\)]*\\)", string.Empty);
        }

        int lastOpeningParenthesis = formulaPrefix.LastIndexOf("(", formulaPrefix.Length - 1, StringComparison.Ordinal);

        if (lastOpeningParenthesis > -1)
        {
            var match = Regex.Match(formulaPrefix.Substring(0, lastOpeningParenthesis), @"[^\w](?<functionName>\w*)$");
            if (match.Success)
            {
                functionName = match.Groups["functionName"].Value;
                currentArgIndex = formulaPrefix.Substring(lastOpeningParenthesis, formulaPrefix.Length - lastOpeningParenthesis).Count(c => c == _listDelimiter);
                return true;
            }
        }

        functionName = null;
        currentArgIndex = -1;
        return false;
    }
govert commented 8 years ago

I've pushed something for the list separator - it was also needed for the text in the tooltip.

Thank you for looking at the formula parsing - it's not something I'm planning to look at myself, and anything is likely an improvement on what we have now.

Ron-Ldn commented 8 years ago

Do you want me to push so we can test all the new fixes together?

govert commented 8 years ago

Yes please.

govert commented 8 years ago

I agree the Regex-based plan is probably not the right long-term plan. But at least it doesn't run on the main thread. We might have to test with some long formulas.

Ron-Ldn commented 8 years ago

I have updated the code in order to fix issue #16

mbasset commented 8 years ago

We are also running into this problem when we use a list of strings in excel. Something like:

=OURFUNCTION({"str1","str2"},"2nd spot")

will treat str2 as the 2nd parameter and the 2nd spot as the third parameter. This can be very confusing for our users.

Ron-Ldn commented 8 years ago

@mbasset Thank you for pointing this use-case. I have fixed the FormulaParser class and it should work now. Please feel free to share any corner cases you might think about.

bdkoepke commented 8 years ago

https://github.com/spreadsheetlab/XLParser

govert commented 7 years ago

This seems OK for now.