SebastianMestre / Jasper

A programming language. Static types. Full type inference. Garbage collection. --- not stable
GNU Lesser General Public License v2.1
60 stars 4 forks source link

Fuzzing implementation that doesn't require libraries #187

Open EmmanuelMess opened 4 years ago

EmmanuelMess commented 4 years ago

I propose a function that generates syntactically correct strings, to test the interpreter:

enum Simbol {
    SIGMA, DECLARATION, VAR_NAME, NUMBER,
    assign, end,
    letterA, letterB,
    number1, number2
};

const std::set<Simbol> terminalSimbols {
    assign, end, letterA, letterB, number1, number2
};

const std::unordered_map<Simbol, std::string> finalReplacement {
    {assign,  " := " },
    {end,     ";" },
    {letterA, "a"},
    {letterB, "b"},
    {number1, "1"},
    {number2, "2"},
};

const std::unordered_map<Simbol, std::vector<std::vector<Simbol>>> intermidiateReplacement {
    {SIGMA,          { { DECLARATION } } },
    {DECLARATION,   { { end }, { VAR_NAME, assign, NUMBER, end }, { DECLARATION, DECLARATION } } },
    {VAR_NAME,      { { letterA }, { letterB }, { VAR_NAME, VAR_NAME } } },
    {NUMBER,        { { number1} , { number2 }, { NUMBER, NUMBER } } }
};

const std::unordered_map<Simbol,  std::discrete_distribution<int>> intermidiateReplacementDist {
    {SIGMA,          { 1 } },
    {DECLARATION,   { 0.5, 1, 0.5} },
    {VAR_NAME,       { 1, 1, 1 } },
    {NUMBER,        { 1, 1, 0.5 } }
};

std::list<Simbol> generate() {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::list<Simbol> list;
    list.push_back(SIGMA);

    int terminalsInList = 0;

    auto i = list.begin();
    while (terminalsInList < list.size()) {
        if(i == list.begin()) {
            terminalsInList = 0;
        }

        bool terminal = terminalSimbols.find(*i) != terminalSimbols.end();

        if (!terminal) {
            auto replacementDistribution = intermidiateReplacementDist.at(*i);
            std::vector<std::vector<Simbol>> currentReplacementPossibilities = intermidiateReplacement.at(*i);
            assert(currentReplacementPossibilities.size() == replacementDistribution.probabilities().size());

            auto replacement = currentReplacementPossibilities[replacementDistribution(rd)];

            list.insert(i, replacement.begin(), replacement.end());
            i = list.erase(i);
        } else {
            terminalsInList++;
            ++i;
        }

        if(i == list.end()) {
            i = list.begin();
        }
    }

    return list;
}

int main() {

    for (int i = 0; i < 100; ++i) {
        for (auto v : generate()) {
            assert(terminalSimbols.find(v) != terminalSimbols.end());
            std::cout << finalReplacement.at(v);
        }
        std::cout << std::endl;
    }

    return 0;
}

Output:

/home/emmanuel/Documentos/temp/testregularthing/cmake-build-debug/testregularthing
a := 1;b := 2;
;
a := 1;
bbab := 2;b := 1;
a := 2;
;
bab := 12;
b := 1;
b := 2;
a := 12;
;
;
;b := 1;
b := 1;
a := 1;
baa := 1;
a := 2;
bbaab := 1;
;
b := 2;
aba := 2;
;
a := 2;
a := 1;
bb := 2;
a := 2;
b := 111;
a := 1;
baab := 21;a := 2;bb := 121;bbbabaabab := 1;;b := 2;;
;
;
bb := 1;aa := 2;;
;
bbaa := 2;a := 1;
ba := 1;
baababaaababaabaaabaaba := 2;
aabbaabaabaaaaaabb := 12;
a := 1;
;;
;
ab := 2;
b := 2;
;
;;
a := 1;;;
bbb := 211;;;
;
aa := 1;
a := 1;
ba := 1;
;
abb := 1;
b := 1;
b := 2;b := 12;;b := 2;;a := 2;b := 2;;b := 111112;
b := 12;
;bba := 1;ab := 2;;
a := 1;
b := 1;
ba := 2;
a := 2;
ababb := 12;
aba := 1;
;
abaa := 1;
;a := 1;;
a := 2;
;
a := 1;
;
abaabb := 1;b := 1;a := 1;
b := 2;
;
;
a := 1;
;;
b := 1;
;
;
b := 12;
;
a := 2;
;a := 2;
;
;
;
;
b := 2;
bb := 1;b := 11122;
;
a := 1;
aabbb := 122;
b := 2121;
ba := 1;;
bb := 2;
ba := 2;
;
b := 1;
;
;
a := 11;
SebastianMestre commented 4 years ago

That's pretty cool. I had tried something similar a while back, but I never actually filled out the tables according to Jasper's syntax. Is that something you could do?

We don't have a full BNF for Jasper's syntax, but I could try to write it up, if that'd help.

EmmanuelMess commented 4 years ago

I can try to make production rules that fit the language.

EmmanuelMess commented 4 years ago

How should redefinitions of the same variable be handled? like: a:= 10; a:=1;

SebastianMestre commented 4 years ago

I can try to make production rules that fit the language.

Would having a BNF help?

How should redefinitions of the same variable be handled? like: a:= 10; a:=1;

It shouldn't be allowed.

That is not what actually happens in the compiler, which could be considered a bug. (EDIT: addressed in #188)

EmmanuelMess commented 4 years ago

Would having a BNF help?

Reading about BNF, it seems it constructs a context free language, the issue right now for me would be dealing with (variable) names and how to not repeat them.

SebastianMestre commented 4 years ago

the issue right now for me would be dealing with (variable) names and how to not repeat them.

The way we handle variables in the compiler is with a stack of hash tables.

Whenever a new scope is introduced (either by a for-statement, a block-statement, or a function-expression), we push a new table onto the stack, and pop it when the scope ends.

When producing a new name, one could generate random (valid) strings until one that is not "in the current scope" is generated.