Chinmaykd21 / CSCI8000_Project_Compiler

Project: Write a simple compiler in C/C++. Objective: A compiler is a core piece of the system software stack. A vulnerable compiler can insert malicious instructions into a binary. A compiler has two phases: front-end and back-end. In this project, we will write a simple compiler that will 1) read code from stdin, 2) tokenize the source (lexical analysis), 3) check the correctness of the code (syntax analysis), and 4) generate a parse tree.

Invalid memory access #14

Closed mustakimur closed 3 years ago

mustakimur commented 3 years ago

The following vulnerability was detected by the AFL fuzzer.

The following input causes the program to crash:

int print(int in, st msg){
    if(in < 0){
        strcpy(msg, � The sum is below zero!");
    }
    else if(in < 10)
    {
        strcpy(msg, "The sum is a si digit nic.");
    }
    else{
        strcpy(msg, "The sum is large.");
    }
    return 0;
}
int main()
{
    int a, b, c;
    string msg;
    scanf("%d%d%f", &a, &b, &c);
    return print(a, msg);

}

The debugger shows the vulnerable code location:

(gdb) r ../out/crashes/id:000000,sig:11,src:000000,op:flip1,pos:63
Starting program: /home/mustakim/projects/compiler/src/compiler ../out/crashes/id:000000,sig:11,src:000000,op:flip1,pos:63
function print
- return int
- param int in
- param st msg
-- if in lt 0
--- call fn strcpy
---- arg st msg
---- arg  string ");EndOfLine

Program received signal SIGSEGV, Segmentation fault.
__memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:217
217 ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: No such file or directory.
(gdb) bt
#0  __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:217
#1  0x00007ffff7f191ee in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000000000043d750 in std::operator==<char, std::char_traits<char>, std::allocator<char> > (Python Exception <class 'gdb.error'> There is no member named _M_dataplus.: 
__lhs=, __rhs=<optimized out>)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:6177
#3  std::operator!=<char, std::char_traits<char>, std::allocator<char> > (Python Exception <class 'gdb.error'> There is no member named _M_dataplus.: 
__lhs=, __rhs=<optimized out>)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:6215
#4  parseTree (start=<optimized out>, end=33, tokens=..., parenthesisMap=std::unordered_map with 14 elements = {...}, varTypeMap=..., indent=...) at parseTree.cpp:257
#5  0x000000000043c630 in parseTree (start=<optimized out>, end=85, tokens=..., parenthesisMap=std::unordered_map with 14 elements = {...}, varTypeMap=..., indent=...) at parseTree.cpp:151
#6  0x0000000000439259 in parseTree (start=<optimized out>, end=131, tokens=..., parenthesisMap=std::unordered_map with 14 elements = {...}, varTypeMap=..., indent=...) at parseTree.cpp:83
#7  0x0000000000404096 in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:38

The source location shows an invalid memory access (tokens[j]) caused by an invalid index (j):

for (int j = openBracNum + 1; j < closeBracNum; j++)
{
  if (tokens[j] != ",")
  {

Adding a debug print to the loop gives the following code:

for (int j = openBracNum + 1; j < closeBracNum; j++)
{
  cout << tokens.size() << " compare to " << j << endl;
  if (tokens[j] != ",")
  {

Output:

function print
- return int
- param int in
- param st msg
-- if in lt 0
--- call fn strcpy
131 compare to 21
---- arg st msg
131 compare to 22
131 compare to 23
131 compare to 24
131 compare to 25
131 compare to 26
131 compare to 27
131 compare to 28
131 compare to 29
---- arg  string ");EndOfLine
131 compare to -31
Segmentation fault

The index j takes a negative value (-31), so tokens[j] reads outside the token list. The origin of the problem must lie further back.

An invalid memory access is a classic starting point for a control-flow hijack. The memory that is accessed can hold malicious code and cause further damage.
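
A minimal sketch of how an out-of-range index such as -31 could be made to fail loudly instead of reading arbitrary memory. This is not code from parseTree.cpp: tokenAt is a hypothetical helper, and the sketch assumes tokens is a std::vector<std::string>, which the issue does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the project): returns tokens[j] only when
// j is a valid index, and throws std::out_of_range otherwise.
const std::string &tokenAt(const std::vector<std::string> &tokens, int j)
{
    if (j < 0 || static_cast<std::size_t>(j) >= tokens.size())
    {
        throw std::out_of_range("token index " + std::to_string(j) +
                                " is outside [0, " +
                                std::to_string(tokens.size()) + ")");
    }
    return tokens[static_cast<std::size_t>(j)];
}
```

With a helper like this, a comparison such as tokenAt(tokens, j) != "," would raise an exception that names the bad index at the point of use. It only reports the symptom; the source of the negative value still has to be traced.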

ellezeeman commented 3 years ago

Professor, I looked over all the code in parseTree.cpp (as well as the other files) and could not find a clear reason why j would cause the invalid memory access. The only possibility I could think of is the following:

When the variable i is initialized, it is set as i = start. j is initialized from i (int j = i + 3), and so is openBracNum (int openBracNum = i + 1). Since both variables are initialized from the value of i, the issue may be with the initialization of i.

What are your thoughts on this?

mustakimur commented 3 years ago

@ellezeeman why don't you print out the value of start for the exploit input and see if it is negative? And if you can reopen the issue, please do so until the problem is fixed.

mustakimur commented 3 years ago

Here is the code (from parseTree.cpp) that I have modified to print debug information.


               // cout << "I" << endl;
               cout << "[DEBUG] Value i initilizes openBracNum: " << i << endl;
                if (tokens[i + 1] == "(")
                {
                    string fnName = tokens[i];
                    //cout << "II here" << endl;
                    int openBracNum = i + 1;
                    //cout << "tokens[openBracNum]: " << tokens[openBracNum] << endl;
                    /*cout << "\nParenthesis map : \n";
                                unordered_map<int, int>::iterator itr;
                                for (itr = parenthesisMap.begin(); itr != parenthesisMap.end(); itr++)
                                {

                                    cout << itr->first << "  " << itr->second << endl;
                                }
                                cout << "--------------" << endl;*/
                    auto it = parenthesisMap.find(openBracNum);
                    //cout << "III here: " << it -> second << endl;
                    int closeBracNum = it->second;
                    //cout << "IV here" << endl;

                    cout << indent << " call fn " << tokens[i] << endl;
                    indent.append("-");
                    cout << "[DEBUG] openBracNum initializes value j: " << openBracNum << endl;
                    for (int j = openBracNum + 1; j < closeBracNum; j++)
                    {
                        cout << "[DEBUG] My first debug (token size vs j): " << tokens.size() << " compare to " << j << endl;
                        if (tokens[j] != ",")
                        {

                            if (tokens[j] == "\"")
                            {
                                int closeQuotes;
                                cout << "[DEBUG] closeQuotes uninitialized value: " << closeQuotes << endl;
                                string stringVal = "\"";
                                for (int k = j + 1; k < end; k++)
                                {
                                    stringVal.append(tokens[k]);
                                    if (tokens[k] == "\"")
                                    {
                                        cout << "[DEBUG] closeQuotes assing with value k: " << k << endl;
                                        closeQuotes = k;
                                        break;
                                    }
                                }
                                cout << indent << " arg "
                                     << " string " << stringVal << endl;
                                cout << "[DEBUG] closeQuotes modifies j with +1: " << closeQuotes << endl;
                                j = closeQuotes + 1;
                                i = j;
                            }
                            else
                            {
                                //cout << "V, j: " << j << " tokens[j]: " << tokens[j] << endl;
                                /*cout << "\nAll Elements : \n";
                                unordered_map<string, string>::iterator itr;
                                for (itr = newVarTypeMap.begin(); itr != newVarTypeMap.end(); itr++)
                                {

                                    cout << itr->first << "  " << itr->second << endl;
                                }
                                cout << "--------------" << endl;*/
                                string key;
                                //cout << "fnNAme: " << fnName << endl;
                                if (fnName == "scanf")
                                {
                                   // cout << "I-" << endl;
                                    if(tokens[j].substr(0,1)=="&"){
                                        key = tokens[j].substr(1);
                                       // cout << "II-" << endl;
                                    }
                                }
                                else
                                {
                                    //cout << "II" << endl;
                                    key = tokens[j];
                                }
                               // cout << "key: " << key << endl;
                                auto it1 = newVarTypeMap.find(key);
                                 //cout << "VI" << endl;
                                //cout << it1->second << endl;
                                if(it1 != newVarTypeMap.end()){
                                    string type = it1->second;

                                    //cout << "VII" << endl;
                                    cout << indent << " arg " << type << " " << key << endl;
                                }
                            }
                        }
                    }
                    indent.pop_back();
                }
            }

The output for the exploit input is:

function print
- return int
- param int in
- param st msg
-- if in lt 0
[DEBUG] Value i initilizes openBracNum: 19
--- call fn strcpy
[DEBUG] openBracNum initializes value j: 20
[DEBUG] My first debug (token size vs j): 131 compare to 21
---- arg st msg
[DEBUG] My first debug (token size vs j): 131 compare to 22
[DEBUG] My first debug (token size vs j): 131 compare to 23
[DEBUG] My first debug (token size vs j): 131 compare to 24
[DEBUG] My first debug (token size vs j): 131 compare to 25
[DEBUG] My first debug (token size vs j): 131 compare to 26
[DEBUG] My first debug (token size vs j): 131 compare to 27
[DEBUG] My first debug (token size vs j): 131 compare to 28
[DEBUG] My first debug (token size vs j): 131 compare to 29
[DEBUG] closeQuotes uninitialized value: -33
---- arg  string ");EndOfLine
[DEBUG] closeQuotes modifies j with +1: -33
[DEBUG] My first debug (token size vs j): 131 compare to -31
Segmentation fault

Let's see if you can now find out the original problem.
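
As a hint, note that the debug output never prints the "closeQuotes assing with value k" line, and closeQuotes still holds the same value (-33) after the inner loop as before it. That suggests no closing quote token was found in the searched range. The following is a minimal sketch of one way to make that case explicit. findClosingQuote is a hypothetical helper, not code from the project, and it again assumes tokens is a std::vector<std::string>.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the project): returns the index of the
// first "\"" token in [from, end), or -1 when there is none. The range is
// clamped to the size of the token list.
int findClosingQuote(const std::vector<std::string> &tokens, int from, int end)
{
    const int size = static_cast<int>(tokens.size());
    if (from < 0)
    {
        return -1;
    }
    if (end > size)
    {
        end = size;
    }
    for (int k = from; k < end; k++)
    {
        if (tokens[k] == "\"")
        {
            return k;
        }
    }
    return -1; // unterminated string literal
}
```

A caller could then treat -1 as a syntax error (an unterminated string literal) and stop, rather than computing j = closeQuotes + 1 from a value that was never assigned.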