RedNeath / ExcelFormulaCalculationEngine

A pure C library that works as an interface to compute Excel formulas, given a data context. It is a recreation of Excel's formula calculation engine, with an extra level of abstraction that separates it from the sheet context and allows lightweight integration within smaller systems.
MIT License

Formula parser #2

Closed by RedNeath 2 months ago

RedNeath commented 4 months ago

The formula parser is the piece of code that splits a given formula into its tokens and builds the corresponding calculation tree.

Its job is divided into two parts, detailed below.

1 - Token recognition

Based on the lists of all operators and all functions, and on the input context (specifically its identifiers), the formula is split into tokens using pattern recognition.

If token recognition fails at this stage, the given formula is invalid, and an error must be reported to the user.
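As a rough illustration of this step, here is a minimal tokenizer sketch. Everything in it is an assumption made for the example rather than the library's actual API: the `Token` struct, the `tokenize` function, and the short `FUNCTIONS` and `OPERATORS` lists are all hypothetical, and strings, booleans and dates are left out. It returns -1 as soon as a piece of the formula cannot be recognized, which is the point where the error would be reported.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_NUMBER, TOK_OPERATOR, TOK_FUNCTION, TOK_VARIABLE, TOK_PAREN, TOK_COMMA } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* points into the formula; not owned */
    size_t len;
} Token;

/* Hypothetical lists; the real ones would come from the library and the context. */
static const char *const FUNCTIONS[] = { "ABS", "AVG", "SUM" };
/* Two-character operators come first so that "<=" is not read as "<" then "=". */
static const char *const OPERATORS[] = { "<=", ">=", "<>", "+", "-", "*", "/", "^", "&", "%", "=", "<", ">" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if the len-character text at s is exactly one of the list entries. */
static int in_list(const char *s, size_t len, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strlen(list[i]) == len && strncmp(s, list[i], len) == 0) return 1;
    return 0;
}

/* Splits formula into tokens. Returns the token count, or -1 if recognition fails. */
int tokenize(const char *formula, const char *const *vars, size_t nvars, Token *out, size_t max) {
    assert(formula != NULL && out != NULL);
    const char *p = formula;
    size_t n = 0;
    if (*p++ != '=') return -1;                 /* a formula must start with '=' */
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (n == max) return -1;                /* output array is full */
        Token t = { TOK_NUMBER, p, 0 };
        if (isdigit((unsigned char)*p)) {       /* NUMBER: digits, optional fraction */
            while (isdigit((unsigned char)*p)) p++;
            if (*p == '.' && isdigit((unsigned char)p[1])) {
                p++;
                while (isdigit((unsigned char)*p)) p++;
            }
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t len = (size_t)(p - t.start);
            if (in_list(t.start, len, FUNCTIONS, COUNT(FUNCTIONS))) t.kind = TOK_FUNCTION;
            else if (in_list(t.start, len, vars, nvars)) t.kind = TOK_VARIABLE;
            else return -1;                     /* unknown identifier */
        } else if (*p == '(' || *p == ')') {
            t.kind = TOK_PAREN; p++;
        } else if (*p == ',') {
            t.kind = TOK_COMMA; p++;
        } else {
            size_t i;
            for (i = 0; i < COUNT(OPERATORS); i++) {
                size_t l = strlen(OPERATORS[i]);
                if (strncmp(p, OPERATORS[i], l) == 0) { p += l; break; }
            }
            if (i == COUNT(OPERATORS)) return -1;   /* unknown character */
            t.kind = TOK_OPERATOR;
        }
        t.len = (size_t)(p - t.start);
        out[n++] = t;
    }
    return (int)n;
}
```

With a context holding the identifiers `A1` and `B1`, calling `tokenize("=ABS(A1)+2.5", vars, 2, tokens, 16)` yields six tokens, while `=FOO+1` is rejected because `FOO` is neither a known function nor a context identifier.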

2 - Priority definition

Then, using the priority level of each operator and function, the calculation tree is built in memory, with each node corresponding to one of the previously parsed tokens.

NOTE:
The calculation tree's root should be the operator with the lowest priority: the processor will traverse the tree with a DFS algorithm, computing the children of a node before the node itself, and will therefore reach the root last.

This part should never fail, as it does not depend on the user input.
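To make the note about the root concrete, here is a small sketch of what such a tree and its traversal could look like. The `Node` layout and the `evaluate` and `evaluate_example` functions are hypothetical, limited to the four basic arithmetic operators, and the tree for `=1+2*3` is built by hand rather than by a parser. The `order` array only records the sequence in which operators are computed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    char op;                    /* '+', '-', '*', '/', or 0 for a leaf value */
    double value;               /* only meaningful when op == 0 */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Post-order DFS: compute both children first, then the node itself.
 * Each operator is appended to order at the moment it is computed,
 * so the caller must provide room for one char per operator node. */
double evaluate(const Node *n, char *order, size_t *count) {
    assert(n != NULL);
    if (n->op == 0) return n->value;
    double l = evaluate(n->left, order, count);
    double r = evaluate(n->right, order, count);
    order[(*count)++] = n->op;
    switch (n->op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;
    }
}

/* Hand-built tree for =1+2*3. The lowest-priority operator ('+') is the
 * root, and the higher-priority '*' sits below it. */
double evaluate_example(char *order, size_t *count) {
    const Node one   = { 0, 1.0, NULL, NULL };
    const Node two   = { 0, 2.0, NULL, NULL };
    const Node three = { 0, 3.0, NULL, NULL };
    const Node mul   = { '*', 0.0, &two, &three };
    const Node add   = { '+', 0.0, &one, &mul };
    return evaluate(&add, order, count);
}
```

Calling `evaluate_example` returns 7 and records the operators in the order `*` then `+`: the multiplication deep in the tree is computed first, and the root addition comes last.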

RedNeath commented 3 months ago

1 - Token recognition

Here is a language definition that should allow validation and token splitting of a formula:

formula       -> '=' expression
expression    -> ({operator} operand {operator} | spechar operand spechar) {expression}
operand       -> expression | variable | function | value
function      -> function_name '(' function_args ')'
function_args -> expression {',' function_args}
function_name -> 'SUM' | 'RANGE' | 'ABS' | 'AVG' ...
operator      -> '+' | '-' | '=' | '%' | ':' | '&' ...
spechar       -> '(' | ')' | ''' | '[' | ']' ...
value         -> NUMBER | STRING | DATE ...
variable      -> STRING (defined in the context)

Recursive descent parsing should be used to handle this language.
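For illustration, a recursive descent recognizer could look like the sketch below. It does not implement the definition above literally: it assumes a simplified rewriting without left recursion (`expression -> operand {operator operand}`), accepts any name as a variable or function without checking it against the context, and leaves out strings and the postfix `%`. All names (`Cursor`, `eat`, `is_valid_formula`, ...) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Cursor over the formula text; every helper advances it on success. */
typedef struct { const char *p; } Cursor;

static int parse_expression(Cursor *c);

static void skip_ws(Cursor *c) { while (isspace((unsigned char)*c->p)) c->p++; }

/* Consumes ch (after optional whitespace) and returns 1, or returns 0. */
static int eat(Cursor *c, char ch) {
    skip_ws(c);
    if (*c->p != ch) return 0;
    c->p++;
    return 1;
}

/* operand -> '-' operand | '(' expression ')' | NUMBER | name ['(' function_args ')'] */
static int parse_operand(Cursor *c) {
    if (eat(c, '-')) return parse_operand(c);
    if (eat(c, '(')) return parse_expression(c) && eat(c, ')');
    if (isdigit((unsigned char)*c->p)) {
        while (isdigit((unsigned char)*c->p)) c->p++;
        if (*c->p == '.') {                      /* a fraction needs at least one digit */
            c->p++;
            if (!isdigit((unsigned char)*c->p)) return 0;
            while (isdigit((unsigned char)*c->p)) c->p++;
        }
        return 1;
    }
    if (isalpha((unsigned char)*c->p)) {
        while (isalnum((unsigned char)*c->p) || *c->p == '_') c->p++;
        if (!eat(c, '(')) return 1;              /* plain variable */
        /* function_args -> expression {',' expression} */
        do {
            if (!parse_expression(c)) return 0;
        } while (eat(c, ','));
        return eat(c, ')');
    }
    return 0;
}

/* expression -> operand {operator operand} */
static int parse_expression(Cursor *c) {
    if (!parse_operand(c)) return 0;
    for (;;) {
        skip_ws(c);
        if (*c->p == '\0' || strchr("+-*/^&=<>", *c->p) == NULL) return 1;
        char op = *c->p++;
        /* two-character comparisons: <=, <>, >= */
        if ((op == '<' && (*c->p == '=' || *c->p == '>')) || (op == '>' && *c->p == '='))
            c->p++;
        if (!parse_operand(c)) return 0;
    }
}

/* formula -> '=' expression, with nothing left over afterwards. */
int is_valid_formula(const char *text) {
    assert(text != NULL);
    Cursor c = { text };
    if (!eat(&c, '=')) return 0;
    if (!parse_expression(&c)) return 0;
    skip_ws(&c);
    return *c.p == '\0';
}
```

With this sketch, `is_valid_formula("=ABS(A1)+2*(B1-3)")` returns 1, while `=1+`, `=SUM(1,2` and a formula without the leading `=` all return 0. Each grammar rule maps to one function, which is what makes this approach convenient for later attaching tree construction to each rule.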

RedNeath commented 3 months ago

The language should follow this grammar:

grammar ExcelFormulaTest;

formula: '=' expression;

expression:
    variable
    | value
    | function
    | '(' expression ')'
    | '-' expression
    | expression '%'
    | expression '^' expression
    | expression ('*' | '/') expression
    | expression ('+' | '-') expression
    | expression '&' expression
    | expression comparison expression
    ;

comparison:
    '='
    | '<'
    | '>'
    | '<='
    | '>='
    | '<>'
    ;

function:
    function_name '(' expression (',' expression)* ')'
    ;

variable:
    'A1'
    | 'B1'
    | 'C1'
    | 'A2'
    | 'A3'
    ;

value:
    NUMBER
    | BOOLEAN
    | STRING
    ;

function_name:
    'ABS'
    | 'AVG'
    | 'IF'
    ;

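// When two lexer rules match the same text, ANTLR picks the one defined first,
// so BOOLEAN and NUMBER must come before NAME or NAME would shadow them.
BOOLEAN: 'TRUE' | 'FALSE';
NUMBER: [0-9]+ ('.' [0-9]+)?;
// A doubled quote ("") inside a string is an escaped quote.
STRING: '"' ('""' | ~'"')* '"';
NAME: [a-zA-Z0-9_]+;

// WS already covers '\r' and '\n', so no separate NEWLINE rule is needed.
WS: [ \t\r\n]+ -> skip;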
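In this grammar, the order of the `expression` alternatives encodes the operator priorities: alternatives listed first bind tighter. As a sketch of how those priorities could be honoured in C, the code below uses precedence climbing to turn a formula into postfix notation, which is also the order in which a post-order traversal of the calculation tree would visit the tokens. It is an illustration under assumptions, not the library's parser: it handles numbers, parentheses, unary minus, `%`, `^`, `*`, `/`, `+`, `-` and `&` only, it writes unary minus as `~` in the output to tell it apart from subtraction, and all names (`Parser`, `to_postfix`, ...) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *p;      /* cursor in the formula */
    char *out;          /* postfix output buffer */
    size_t cap, len;
    int ok;             /* cleared on the first error */
} Parser;

static void parse_binary(Parser *ps, int min_prec);

static void skip_spaces(Parser *ps) { while (isspace((unsigned char)*ps->p)) ps->p++; }

/* Appends one token followed by a space; an overflow is flagged as an error. */
static void emit(Parser *ps, const char *s, size_t n) {
    if (ps->len + n + 2 > ps->cap) { ps->ok = 0; return; }
    memcpy(ps->out + ps->len, s, n);
    ps->len += n;
    ps->out[ps->len++] = ' ';
    ps->out[ps->len] = '\0';
}

/* Binary priorities, following the order of the grammar's alternatives. */
static int precedence(char op) {
    switch (op) {
        case '^': return 4;
        case '*': case '/': return 3;
        case '+': case '-': return 2;
        case '&': return 1;
        default: return 0;      /* not a binary operator handled here */
    }
}

/* unary -> '-' unary | '(' expression ')' | NUMBER */
static void parse_unary(Parser *ps) {
    skip_spaces(ps);
    const char *start = ps->p;
    if (*ps->p == '-') {
        ps->p++;
        parse_unary(ps);
        emit(ps, "~", 1);
    } else if (*ps->p == '(') {
        ps->p++;
        parse_binary(ps, 1);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++; else ps->ok = 0;
    } else if (isdigit((unsigned char)*ps->p)) {
        while (isdigit((unsigned char)*ps->p)) ps->p++;
        if (*ps->p == '.' && isdigit((unsigned char)ps->p[1])) {
            ps->p++;
            while (isdigit((unsigned char)*ps->p)) ps->p++;
        }
        emit(ps, start, (size_t)(ps->p - start));
    } else {
        ps->ok = 0;
    }
}

/* postfix -> unary {'%'}: '%' binds looser than unary minus, tighter than '^'. */
static void parse_postfix(Parser *ps) {
    parse_unary(ps);
    skip_spaces(ps);
    while (ps->ok && *ps->p == '%') { ps->p++; emit(ps, "%", 1); skip_spaces(ps); }
}

/* Precedence climbing; prec + 1 makes every binary operator left-associative. */
static void parse_binary(Parser *ps, int min_prec) {
    parse_postfix(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        int prec = precedence(op);
        if (!ps->ok || prec == 0 || prec < min_prec) return;
        ps->p++;
        parse_binary(ps, prec + 1);
        emit(ps, &op, 1);
    }
}

/* Writes the postfix form of formula into out. Returns 1 on success, 0 on error. */
int to_postfix(const char *formula, char *out, size_t cap) {
    assert(formula != NULL && out != NULL);
    Parser ps = { formula, out, cap, 0, 1 };
    if (cap == 0) return 0;
    out[0] = '\0';
    skip_spaces(&ps);
    if (*ps.p != '=') return 0;
    ps.p++;
    parse_binary(&ps, 1);
    skip_spaces(&ps);
    if (!ps.ok || *ps.p != '\0') return 0;
    if (ps.len > 0) out[ps.len - 1] = '\0';     /* drop the trailing space */
    return 1;
}
```

For example, `=1+2*3` becomes `1 2 3 * +`, and `=-2^2` becomes `2 ~ 2 ^`, since the grammar lists unary minus before `^` and the negation therefore applies to `2` alone. In both cases the lowest-priority operator comes out last, which matches the idea of placing it at the root of the calculation tree.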