arthurherbout / crypto_code_detection

Automatic Detection of Custom Cryptographic C Code

Function-level cryptography detection #41

Open arnaudstiegler opened 4 years ago

arnaudstiegler commented 4 years ago

Got fed up with .c files, which are a pain to parse. Looking into other languages (currently Python and Java).

arnaudstiegler commented 4 years ago

Interesting to see how different crypto looks in Python:

def encrypt(msg, key):
    msg = toBin(msg)
    key = toBin(key)
    keyList = []
    addList = []
    temp = ""
    for i in range(0, len(key), 32):
        if(i < len(key)/2):
            temp = key[i:i+32]
            keyList.append(temp)
        if(i >= len(key)/2):
            temp = key[i:i+32]
            addList.append(temp)
    blockList = []
    for j in range(0, len(keyList)):
        for i in range(0, len(msg), 32):
            temp = ""
            temp += msg[i:i+32]
            if(len(temp) != 32):
                temp = to32Bit(temp)
            blockList.append(temp)
            blockList.append(addList[j])   
        for i in range(0, len(blockList), 2):
            blockList[i] = xOR(blockList[i], keyList[j])
            temp =  blockList[i]
            blockList[i] =  blockList[i+1]
            blockList[i+1] = temp
    result = ""
    for i in range(0, len(blockList)):
        result += blockList[i]
    result = toText(result)
    return result

I doubt that a simple regex could deal with that

arnaudstiegler commented 4 years ago

Some good points with crypto in python:

Hadrien-Cornier commented 4 years ago

I put a small C/C++ parser in crypto_code_detection/parsing_preprocessing that

arnaudstiegler commented 4 years ago

@Hadrien-Cornier Yeah, I saw that, but did you actually check the results (I mean the function definitions found at the lines your parser reports)? I ask because I saw a ton of comments on Stack Overflow about how unreliable regex is for parsing C code (and apparently C++ is worse).
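To make the reliability concern concrete, here is a minimal sketch of how a regex-based C function finder can go wrong. The pattern `NAIVE_C_FUNC` and the sample `C_SOURCE` are made up for illustration; this is not the parser in crypto_code_detection/parsing_preprocessing, and a more careful pattern would fail in different ways.

```python
import re

# Hypothetical, deliberately naive pattern (an assumption for this sketch):
# one return-type word, a name, a parameter list, then "{" on the same line.
NAIVE_C_FUNC = re.compile(r"^\w+\s+\*?(\w+)\s*\([^;{)]*\)\s*\{", re.MULTILINE)

# Made-up C snippet: one commented-out definition and one real function
# whose return type sits on its own line.
C_SOURCE = """/* old version:
int encrypt(char *msg) {
*/
static uint32_t
rotl(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}
"""


def naive_function_names(source):
    """Return the function names the naive pattern claims to find."""
    return NAIVE_C_FUNC.findall(source)


# Reports the commented-out `encrypt` and misses the real `rotl`.
print(naive_function_names(C_SOURCE))
```

On this input the pattern returns only `encrypt`, which lives inside a comment, and skips `rotl`, the only function actually defined. Comments, multi-line signatures, and macros are the usual reasons people recommend a real parser over a regex for C.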

Besides, I went through the files, and it was really hard to come up with rules that label the functions based solely on the name. We would need to condition the label on the function length (to avoid encrypt functions that just call other functions and are therefore very short), and maybe on some other criteria.
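A name-plus-length labeling rule like the one described above could be sketched as follows for Python sources, using the standard `ast` module instead of a regex. The keyword list `CRYPTO_KEYWORDS` and the threshold `MIN_LINES` are assumptions chosen for illustration; nothing in this thread fixes their values.

```python
import ast

# Hypothetical keyword list and length threshold (assumptions, not from the thread).
CRYPTO_KEYWORDS = ("encrypt", "decrypt", "cipher", "hash")
MIN_LINES = 5


def label_functions(source):
    """Return {function_name: is_crypto_label} for every function in `source`."""
    labels = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Lines spanned by the definition, signature included (Python 3.8+).
            n_lines = node.end_lineno - node.lineno + 1
            name_hit = any(k in node.name.lower() for k in CRYPTO_KEYWORDS)
            # Short wrappers that only delegate to another function are rejected.
            labels[node.name] = name_hit and n_lines >= MIN_LINES
    return labels


SAMPLE = (
    "def encrypt(m, k):\n"
    "    return aes(m, k)\n"
    "\n"
    "def encrypt_block(m, k):\n"
    "    out = []\n"
    "    for a, b in zip(m, k):\n"
    "        out.append(a ^ b)\n"
    "    return bytes(out)\n"
)
print(label_functions(SAMPLE))
```

Here the two-line `encrypt` wrapper is labeled negative despite its name, while the five-line `encrypt_block` is labeled positive. Line count is a crude proxy; the "other criteria" mentioned above would still be needed for long functions that merely orchestrate calls.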

redouane-dziri commented 4 years ago

> I doubt that a simple regex could deal with that

A simple regex would match on the first line :p

arnaudstiegler commented 4 years ago

Haha, true if we use the function name, but the point is to work only with the function definition and to use the function name for labeling. So you wouldn't be looking at the function name, and I doubt a simple regex would work for the rest.
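One way to picture that split between name and body is a feature extractor that never reads the function name. The sketch below assumes Python input and an arbitrary choice of features (loops, calls, bitwise operators); the thread does not specify which features the project would actually use.

```python
import ast
from collections import Counter

# Operator node types counted as "bitwise"; this selection is an assumption.
BITWISE_OPS = (ast.BitXor, ast.BitAnd, ast.BitOr, ast.LShift, ast.RShift)


def body_features(func_source):
    """Count loops, calls, and bitwise operators in one function's body."""
    func = ast.parse(func_source).body[0]
    counts = Counter()
    for stmt in func.body:  # func.name is never consulted
        for node in ast.walk(stmt):
            if isinstance(node, (ast.For, ast.While)):
                counts["loops"] += 1
            elif isinstance(node, ast.Call):
                counts["calls"] += 1
            elif isinstance(node, (ast.BinOp, ast.AugAssign)) and isinstance(
                node.op, BITWISE_OPS
            ):
                counts["bitwise_ops"] += 1
    return dict(counts)


SRC = (
    "def f(a, b):\n"
    "    out = 0\n"
    "    for x in a:\n"
    "        out ^= x << 1\n"
    "    return g(out & b)\n"
)
print(body_features(SRC))
```

Note that in the `encrypt` example quoted earlier in this thread, the XOR happens inside a helper call (`xOR(...)`), so operator counts alone would see no bitwise operations there. That is consistent with the doubt that simple pattern matching on the body is enough.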