github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.69k stars 1.54k forks source link

Regarding the issue of taint tracking ineffective in the operation of the C structure #6392

Closed migraine-sudo closed 3 years ago

migraine-sudo commented 3 years ago

Hello, I tried to use CodeQL's global taint tracking. I have tested variables/functions/pointers and have good results. But when I tried to trace the C language structure, I found that taint tracking failed. I am very puzzled. Below is my Demo and QL

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct message{
    int id;
    char command[100];
}message;

//假设这是用户可控的输入函数
int get_user_input(message* m)
{
    char* input = "echo 'you are hacked'";   
    strcpy(m->command,input);
    return 1;
}

int main()
{
    message m={1,"hello world1"};
    get_user_input(&m); // source
    system(m.command); //sink
}

This is the rule I wrote (I simplified some so that you can grasp the key points). I want to track the pollution caused by the get_user_input function to the struct message operation.


import cpp
import semmle.code.cpp.dataflow.TaintTracking

class SystemTrackingConfiguration extends TaintTracking::Configuration {
    SystemTrackingConfiguration() { this = "SystemTrackingConfiguration" }

    override predicate isSource(DataFlow::Node source) {
        forall(FunctionCall fc,Expr e |
          fc.getTarget().hasName("get_user_input") and e = fc.getArgument(0)|
          source.asDefiningArgument()=e // is here right?
          )
    }
    override predicate isSink(DataFlow::Node sink) {
      exists(FunctionCall fc | fc.getTarget().hasName("system") and sink.asExpr() = fc.getArgument(0) )
    }
  }

  from SystemTrackingConfiguration config,DataFlow::Node source,DataFlow::Node sink
  where config.hasFlow(source, sink)
  select source,"Data flow to $@." ,sink

Thanks, if there are some problems, I will also provide help as much as I can

rdmarsh2 commented 3 years ago

This is known limitation of that library. We have a newer version in semmle.code.cpp.ir.dataflow.TaintTracking that has improved handling of both fields and these kinds of side effects. Could you try importing that version instead (the API is the same) and see if that fixes your issue?

migraine-sudo commented 3 years ago

This is known limitation of that library. We have a newer version in semmle.code.cpp.ir.dataflow.TaintTracking that has improved handling of both fields and these kinds of side effects. Could you try importing that version instead (the API is the same) and see if that fixes your issue?

Thank you, according to the method you provided, taint tracking has already supported the structure. Wish you the best at work!

hatface commented 2 years ago
namespace ara {
    namespace diag {
        class ProcessMessage {
        public:
            struct msg {
                int msg_size;
                char *msg;
            };

            virtual void handle_msg(char *msg) = 0;

            virtual void handle_struct(struct msg *the_msg) = 0;
        };

        int do_handle_msg(char *msg) {
            char buff[128] = "helloworld";
            int fd = open(msg, O_RDWR | O_CREAT, 777);
            if (fd == -1) {
                std::cout << "file error" << std::endl;
            }
            int ret = write(fd, buff, strlen(buff));
            if (ret < 0) {
                std::cout << "write fail" << std::endl;
            }
            for (int i = 0; i < atoi(msg); i++) {
                printf("helloworld");
            }
            for (int j = atoi(msg); j < 10; j++) {
                printf("helloworld");
            }
            int k = 0;
            for (k = atoi(msg); k < 10; k++) {
                printf("helloworld\n");
            }
            while (k++ < atoi(msg)) { ;
            }

            int l = atoi(msg);
            while (l++ < 10) { ; }
            do {

            } while (l++ < 10);

            do { ;
            } while (k++ < atoi(msg));

            std::vector<std::string> vect = {"1", "2", "34"};
            for (std::string str : vect) {
                printf("%s\n", str.c_str());
            }
            char myarr[1024];
            printf("%c", myarr[atoi(msg)]);
            close(fd);
            return 0;
        }

        int do_handle_struct_msg(struct ProcessMessage::msg *the_msg) {
            //for in localVar
            for (int i = 0; i < the_msg->msg_size; i++);
            int i, j, k;
            //while
            while (i++ < the_msg->msg_size);
            //do while
            do { ;
            } while (j++ < the_msg->msg_size);
            //for in assignExpr
            for (i = the_msg->msg_size; i < 10; i++);
            //for in initial
            for (int i = the_msg->msg_size; i < 10; i++);
            k = the_msg->msg_size;
            //while in loopcount
            while (k++ < 10);
            //do while in loopcount
            do { ;
            } while (k++ < 100);
            return 1;
        }

        class ProcessMessageImp : public ProcessMessage {
        public:
            void handle_msg(char *msg) {
                do_handle_msg(msg);
            }

            void handle_struct(struct ProcessMessage::msg *the_msg) override {
                do_handle_struct_msg(the_msg);
            }
        };
    }
}

I have the same question, how can I taint track from handle_struct‘s first parameter to the loop condition