One-Language / One

One (onelang) is an open-source system programming language that makes it easy to build reliable, efficient and performant software. (release as soon) 1️⃣ 🕐 🩱
https://onelang.org
GNU General Public License v3.0
284 stars 60 forks source link

🦸 [Need Help] Operator-precedence parser 🦸 #71

Open BaseMax opened 3 years ago

BaseMax commented 3 years ago

Hi, I took the initial phase of the parser forward and it works well and detects tokens.

Now I need help arranging operations and operators. Does anyone want to help? (Operator-precedence parser) for e.g:

expression ::= equality-expression
equality-expression ::= additive-expression ( ( '==' | '!=' ) additive-expression ) *
additive-expression ::= multiplicative-expression ( ( '+' | '-' ) multiplicative-expression ) *
multiplicative-expression ::= primary ( ( '*' | '/' ) primary ) *
primary ::= '(' expression ')' | NUMBER | VARIABLE | '-' primary
ghost commented 3 years ago

here is cpp's operator precedence https://en.cppreference.com/w/cpp/language/operator_precedence im pretty sure its common sense using bidmas i can help if need be.

BaseMax commented 3 years ago

LexAndYacc.pdf OperatorPrecedenceParsing.pdf unknown

BaseMax commented 3 years ago

Thank you for your message.

Okay, honestly I not sure about some special operators. such as >> << and etc. I aware Operator-precedence is different in every language. So I not sure how we can go up. I think we can figure out how PHP works by thinking about its Operator-precedence parser.

BaseMax commented 3 years ago

I checked its parser. They did not divide the phrase into several groups. And he manages these using BISON. https://github.com/php/php-src/blob/master/Zend/zend_language_parser.y#L1041 Since we are writing Parser by hand, we need to group and know more details.

BaseMax commented 3 years ago

Dear @snapogod; https://github.com/One-Language/One/issues/71#issuecomment-854995262

Can you help us complete the BNF grammar? https://github.com/One-Language/One/blob/master/grammar.BNF We need to know grammar of all operators.

BaseMax commented 3 years ago

Another nice implementation of the expression parser. But it does not help us much. https://github.com/maxicombina/c-calculator/blob/master/calculator.c#L170

int get_op_precedence(char op)
{
    int res = 0;
    if (op == '+' || op == '-')
    {
        res = 9;
    }
    else if (op == '*' || op == '/' || op == '%')
    {
        res = 10;
    }
    else if (op == '^')
    {
        res = 11;
    }
    return res;
}

static int is_higher_or_equal_precedence(char op1, char op2)
{
    return get_op_precedence(op1) &gt;= get_op_precedence(op2);
}

static int is_higher_precedence(char op1, char op2)
{
    return get_op_precedence(op1) &gt; get_op_precedence(op2);
}

static int is_left_assoc_operator(char op)
{
    return (op == '+' || op == '-' || op == '*' || op == '/' || op == '%');
}

static int is_right_assoc_operator(char op)
{
    return (op == '^');
}
BaseMax commented 3 years ago

Some useful references I find yesterday: https://depositonce.tu-berlin.de/bitstream/11303/2630/1/Dokument_46.pdf https://cdn2.hubspot.net/hubfs/3881487/ExprParser_User_Guide.pdf?t=1517563829446

BaseMax commented 3 years ago

Another nice implementation of the expression parser: https://github.com/programoworm/Expression_Recogniser/blob/master/expression.c

This is something I extracted from the above code but it is still incomplete and has a problem:

parseFactor = "+" parseFactor # need Fix
                    = "-" parseFactor # need Fix
                    = "(" INT ")"
                    = INT

parseTerm = parseFactor
                  = parseFactor "*" parseFactor
                  = parseFactor "/" parseFactor

parseExpressions = parseTerm
                              = parseTerm "+" parseTerm
                              = parseTerm "-" parseTerm
BaseMax commented 3 years ago

I found some online tools that can test grammar. This is very interesting and it is better to test the grammars in the same way.

More about this subject:

But I'm looking for something that can draw a picture and show it with a figure ... Do you know anything?

BaseMax commented 3 years ago

Another nice example from PEG.JS:

// Simple Arithmetics Grammar
// ==========================
//
// Accepts expressions like "2 * (3 + 4)" and computes their value.

Expression
  = head:Term tail:(_ ("+" / "-") _ Term)* {
      return tail.reduce(function(result, element) {
        if (element[1] === "+") { return result + element[3]; }
        if (element[1] === "-") { return result - element[3]; }
      }, head);
    }

Term
  = head:Factor tail:(_ ("*" / "/") _ Factor)* {
      return tail.reduce(function(result, element) {
        if (element[1] === "*") { return result * element[3]; }
        if (element[1] === "/") { return result / element[3]; }
      }, head);
    }

Factor
  = "(" _ expr:Expression _ ")" { return expr; }
  / Integer

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }

_ "whitespace"
  = [ \t\n\r]*

https://pegjs.org/online

BaseMax commented 3 years ago

The above codes are summarized as follows:

Expression   = Term ( ("+" / "-")  Term)*
                         | Term

Term  = Factor ( ("*" / "/") Factor)*

Factor   = "(" Expression ")"
               | [0-9]+

The problem is that this grammar is not complete and does not include dozens of operators!

BaseMax commented 3 years ago

A tiny YACC/Bison example:

exp: exp OPERATOR_PLUS exp
   | exp OPERATOR_MINUS exp
   | exp OPERATOR_MULT exp
   | exp OPERATOR_DIV exp
   | LPAREN exp RPAREN
   | NUMBER

https://www.cs.uic.edu/~spopuri/ibison.html

BaseMax commented 3 years ago

Okay, so again we need "Precedence of operators in BNF grammar" with all operators not only + - * /.

BaseMax commented 3 years ago

And finally, I find BNF of c language: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

And also another nice documentation for BNF: https://www.radford.edu/~nokie/classes/380/syntax.html

BaseMax commented 3 years ago
=============== INPUT ===============
package main
i8 test1 {
    _ 4
    _ 5+1
}
=============== TREE ===============
Package: main

[FUNC] test1
   [STMT] PRINT
      [EXPRESSIONS] 1 expression(s)
         [EXPR] VALUE 4
   [STMT] PRINT
      [EXPRESSIONS] 1 expression(s)
         [EXPR] + 
            Left:
               [EXPR] VALUE 5
            Right:
               [EXPR] VALUE 1
=============== VM ===============
Package: main
[FUNC] test1
[STMT] PRINT
[STMT] PRINT

Currently + - operators supported. But we need to add many more operators.

A question, The input is _ 5+1-2+6 generate this TREE:

   [STMT] PRINT
      [EXPRESSIONS] 1 expression(s)
         [EXPR] + 
            Left:
               [EXPR] VALUE 5
            Right:
               [EXPR] - 
                  Left:
                     [EXPR] VALUE 1
                  Right:
                     [EXPR] + 
                        Left:
                           [EXPR] VALUE 2
                        Right:
                           [EXPR] VALUE 6

It's correct? can someone check this?

BaseMax commented 3 years ago

A message from Dear snappyagggodypingable:

a = b
a += b
a -= b
a *= b
a /= b
a %= b
a &= b
a |= b
a ^= b
a <<= b
a >>= b

++a
--a
a++
a--

+a
-a
a + b
a - b
a * b
a / b
a % b
~a
a & b
a | b
a ^ b
a << b
a >> b

!a
a && b
a || b

a == b
a != b
a < b
a > b
a <= b
a >= b
a <=> b

a[b]
a
&a
a->b
a.b
a->b
a.*b

all those right?

BaseMax commented 3 years ago

ruby:

<=> == === != =~ !~
= %= { /= -= += |= &= >>= <<= *= &&= ||= **=

And again message from Dear snappyagggodypingable:

<expression>   ::= <term> "+" <expression>
                 | <term> "-" <expression>
                 | <term>

<term>         ::= <factor> "*" <term>
                 | <factor> "/" <term>
                 | <factor>
<power>        ::= <factor> ^ <power>
                 | <factor>

<factor>       ::= "(" <expression> ")"
                 | <const>

<const>        ::= NUMBER | STRING | IDENTIFIER
BaseMax commented 3 years ago

Hooray, It's mostly done and now works. :+1: I did used C operators table: https://en.cppreference.com/w/c/language/operator_precedence

BaseMax commented 3 years ago

*A = (2 + 15 ( 25 - 13 ) / 1 - 4) + 10 / 2**

AST TREE of The above mathematic expression:

 [EXPR] + 
    Left:
       [EXPR] + 
          Left:
             [EXPR] VALUE 2
          Right:
             [EXPR] * 
                Left:
                   [EXPR] VALUE 15
                Right:
                   [EXPR] / 
                      Left:
                         [EXPR] - 
                            Left:
                               [EXPR] VALUE 25
                            Right:
                               [EXPR] VALUE 13

                      Right:
                         [EXPR] - 
                            Left:
                               [EXPR] VALUE 1
                            Right:
                               [EXPR] VALUE 4

    Right:
       [EXPR] / 
          Left:
             [EXPR] VALUE 10
          Right:
             [EXPR] VALUE 2

I'm not sure it's correct or no. It's wrong. but what is correct? so.... A = (2 + 15 (( 25 - 13 ) / 1) - 4) + 10 / 2 A = (2 + (15 ( 25 - 13 ) / 1) - 4) + 10 / 2 or ?

BaseMax commented 3 years ago

Correct answer is:

= (2 + ((15 * (25 - 13)) / 1) - 4) + (10 / 2)
BaseMax commented 3 years ago

Below is the full implementation of EvaluateExpression function and its helpers in C#.

int EvaluateExpression(char[] exp)
{

    Stack<int> vStack = new Stack<int>();
    Stack<char> opStack = new Stack<char>();

    opStack.Push('('); // Implicit opening parenthesis

    int pos = 0;
    while (pos <= exp.Length)
    {
        if (pos == exp.Length || exp[pos] == ')')
        {
            ProcessClosingParenthesis(vStack, opStack);
            pos++;
        }
        else if (exp[pos] >= '0' && exp[pos] <= '9')
        {
            pos = ProcessInputNumber(exp, pos, vStack);
        }
        else
        {
            ProcessInputOperator(exp[pos], vStack, opStack);
            pos++;
        }

    }

    return vStack.Pop(); // Result remains on values stacks

}

void ProcessClosingParenthesis(Stack<int> vStack,
                                Stack<char> opStack)
{

    while (opStack.Peek() != '(')
        ExecuteOperation(vStack, opStack);

    opStack.Pop(); // Remove the opening parenthesis

}

int ProcessInputNumber(char[] exp, int pos,
                        Stack<int> vStack)
{

    int value = 0;
    while (pos < exp.Length &&
            exp[pos] >= '0' && exp[pos] <= '9')
        value = 10 * value + (int)(exp[pos++] - '0');

    vStack.Push(value);

    return pos;

}

void ProcessInputOperator(char op, Stack<int> vStack,
                            Stack<char> opStack)
{

    while (opStack.Count > 0 &&
            OperatorCausesEvaluation(op, opStack.Peek()))
        ExecuteOperation(vStack, opStack);

    opStack.Push(op);

}

bool OperatorCausesEvaluation(char op, char prevOp)
{

    bool evaluate = false;

    switch (op)
    {
        case '+':
        case '-':
            evaluate = (prevOp != '(');
            break;
        case '*':
        case '':
            evaluate = (prevOp == '*' || prevOp == '');
            break;
        case ')':
            evaluate = true;
            break;
    }

    return evaluate;

}

void ExecuteOperation(Stack<int> vStack,
                        Stack<char> opStack)
{

    int rightOperand = vStack.Pop();
    int leftOperand = vStack.Pop();
    char op = opStack.Pop();

    int result = 0;
    switch (op)
    {
        case '+':
            result = leftOperand + rightOperand;
            break;
        case '-':
            result = leftOperand - rightOperand;
            break;
        case '*':
            result = leftOperand * rightOperand;
            break;
        case '':
            result = leftOperand  rightOperand;
            break;
    }

    vStack.Push(result);

}
BaseMax commented 3 years ago

Correct answer/tree is: photo_2021-06-09_01-35-12

Thanks from https://github.com/carlos-chaguendo/arboles-binarios

jbampton commented 3 years ago

Great work @BaseMax !! 🥇

BaseMax commented 3 years ago

*_ (2 + 15 ( 25 - 13 ) / 1 - 4) + 10 / 2**

Tree Output of the expression:

└──Statement: PRINT
    └──Expressions (1)
        └──Expression +
            └──Expression Left
                └──Expression -
                    └──Expression Left
                        └──Expression +
                            └──Expression Left
                                └──Expression Direct: 2
                            └──Expression Right
                                └──Expression *
                                    └──Expression Left
                                        └──Expression Direct: 15
                                    └──Expression Right
                                        └──Expression /
                                            └──Expression Left
                                                └──Expression -
                                                    └──Expression Left
                                                        └──Expression Direct: 25
                                                    └──Expression Right
                                                        └──Expression Direct: 13
                                            └──Expression Right
                                                └──Expression Direct: 1
                    └──Expression Right
                        └──Expression Direct: 4
            └──Expression Right
                └──Expression /
                    └──Expression Left
                        └──Expression Direct: 10
                    └──Expression Right
                        └──Expression Direct: 2

I not sure it's correct or no.

BaseMax commented 3 years ago

That was wrong. I fixed bug https://github.com/One-Language/One/commit/4c073531a2d5e8800b3f83beb9d143e2487a7691.

and finally it's AST output:

            └──Statement: PRINT
                └──Expressions (1)
                    └──Expression +
                        └──Expression Left
                            └──Expression -
                                └──Expression Left
                                    └──Expression +
                                        └──Expression Left
                                            └──Expression Direct: 2
                                        └──Expression Right
                                            └──Expression /
                                                └──Expression Left
                                                    └──Expression *
                                                        └──Expression Left
                                                            └──Expression Direct: 15
                                                        └──Expression Right
                                                            └──Expression -
                                                                └──Expression Left
                                                                    └──Expression Direct: 25
                                                                └──Expression Right
                                                                    └──Expression Direct: 13
                                                └──Expression Right
                                                    └──Expression Direct: 1
                                └──Expression Right
                                    └──Expression Direct: 4
                        └──Expression Right
                            └──Expression /
                                └──Expression Left
                                    └──Expression Direct: 10
                                └──Expression Right
                                    └──Expression Direct: 2
BaseMax commented 3 years ago

Sorry, I don't have much time to translate this to English:

سلام شب همگی بخیر؛ سوالی داشتم لطفا راهنمایی ام کنید. اینکه یک اپراتور چپ به راست هست یا راست به چپ چه مفهومی داره؟ مثلا جمع و تفریق بنظر چپ به راست هست. 5 + 4 5 - 4 ضرب و تقسیم هم همینطور چپ به راست هست. 5 * 4 اما مساوی راست به چپ هست. a = 5+4 حتی کوچک تر / بزرگتر/ کوچکتر مساوی / بزرگتر مساوی هم چپ به راست هست. 5 > 4 5 < 4 یکسری مثال خوب و شفاف پیدا کردم میفرستم شاید بدرد شما هم بخوره. سعی میکنم تحلیل کنم نتیجه برسم Example: 5 - 4 - 3 (5 - 4) - 3 = -2 // left association is correct 5 - (4 - 3) = 4 // right is incorrect Example: x = y = z x = (y = z) // right association is correct, sets x and y (x = y) = z // left is incorrect, does not set y این دقیقا بحث operator associativity هست که نمیدونم فارسی اش چیه: https://en.wikipedia.org/wiki/Operator_associativity سلام. درسته. مثلا این عبارت: 100 / 10 * 10 برابر هست با: (100/10) * 10 = 10 * 10 چون بنظر عملگر ضرب و تقسیم هر دو چپ به راست هستند. اما وقتی عبارت پیچیده میشه گیج میشم نمیفهمم چی به چی باید بشه از کجا شروع کنم. : 5 + 100 / 10 * 10 = 10/2 * 5 - 1 اول اینکه جمع و تفریق چپ به راست هست. ضرب تقسیم هم چپ به راست هست. ولی تساوی راست به چپ هست. علت اش رو نمیدونم ولی با این فرض سعی میکنم ادامه دهم: 105 / 10 * 10 = 10/2 * 5 -1 در مرحله بعدی سعی میکنم تقسیم سمت چپ رو حل کنم. اصلا چرا باید از سمت چپ شروع کنیم؟ چرا از سمت راست شروع نمیکنیم؟ مگر ما اینجا تساوی هم نداریم تساوی از راست به چپ هست. ولی بنظر ما باید طبق قانونی که معلوم نیست چیه از چپ شروع به تحلیل کنیم. 10.5 * 10 = 10/2 * 5 -1 و حالا اگر دوباره ادامه دهیم و ضرب سمت چپ رو هم حساب کنیم: 105 = 10/2 * 5 -1 حالا به اینجا رسیدیم که سمت چپ تساوی یک جواب نهایی داریم. و حالا که به تساوی برخورد میکنیم. میرویم که طرف دیگه رو تحلیل کنیم: 105 = 5 * 5 -1 و حالا ضرب سمت راست رو تحلیل کنیم: 105 = 25 -1 و حالا تفریق سمت راست: 105 = 24 و این شد جواب اخر. درسته این دو تا برابر نیستند لزومی هم نبوده برابر باشند. صرفا یه عبارت ریاضی هست که ممکنه درست باشه یا نه. توی این تحلیل من اصلا هیچ موردی توی ذهنم رو برای عملگر مساوی رعایت نکردم چون عملگر مساوی راست به چپ هست اما تغییری برام ایجاد نکرد یا حداقل بلد نیستم چه تغییری ایجاد کرده که نفهمیدم.

These thoughts were in my mind.

BaseMax commented 3 years ago
یه چیزی یاد گرفتم گفتم اینجا هم بگم شاید برای برخی هم سوال باشه. چپ به راست بودن یا راست به چپ بودن عملگر تنها در شرایطی مهم میشه که حداقل دو تا عملگر هم جنس یا تکراری داشته باشیم. در شرایطی که یک عملگر باشه هیچ فرقی نمیکنه که از چپ شروع بشه یا از راست. عبارت ریاضی:

x = y = z

دو تحلیل مختلف که میشود داشت: یکی از راست:

x = (y = z)

یکی از چپ:

(x = y) = z

اینجا چون حداقل دو تا عملگر = داشتیم این مهم شد که کدوم تحلیل درسته. اما اگه فقط یدونه = میداشتیم اصلا این نیاز حس نمیشد. اها. الان بیشتر متوجه میشوم. مثلا اگه همچین عبارتی داشته باشیم:

!+-X

این رو چطور تحلیل کنیم؟ اینجا بیشتر از یک اپراتور از یک جنس داریم. طبیعتا دو نوع تحلیل میشه داشت. تحلیل از راست به چپ:

! ( + ( -(X) ) )

تحلیل از چپ به راست: نمیدونم چطوری میشه. گیج شدم باز. خب مشخص هست که هنوز نفهمیدم چون گیر میکنم. الان چه دلیلی داره که میگویند عملگر ! که قبل از چیزی میاد راست به چپ هست. و چرا چپ به راست نیست؟ نمیدونم حالتی که چپ به راست میشه رو چطور و چی باید نوشت
BaseMax commented 3 years ago
========= نیم ساعت بعد ========= سعی کردم یه عبارت ریاضی سخت تر و پیچیده بسازم:

5 * !+-X / 2

خب روی همین عبارت گیر کردم. سوالی که برای ادم پیش میاد این هست که الان منفی ایکس رو تقسیم بر دو بکنم و بعد ادامه بده یا اینکه نه همه رو حساب بکنه اخر سر تقسیم بر دو بکنه. با اینکه گیج میزنم سعی میکنم دامه بدهم ببینم به چی و چه مرحله ای میرسم:

5 * !(+(-(x))) / 2

این یکی از حالت ها هست. حالت دیگه هم احتمالا این بشه:

5 * !(+(-(x / 2) ) )

jbampton commented 2 years ago

Hey @BaseMax what is the status of this issue ?

BaseMax commented 2 years ago

It's fixed in new version, I have to push new version and focus on the core.

jbampton commented 2 years ago

Do you have an update on this ? Is core ready in the new branch ?

BaseMax commented 2 years ago

A new update: almost every operator work well in the new version. We are on a roll! Happy ^_^