arthurherbout / crypto_code_detection

Automatic Detection of Custom Cryptographic C Code

(1) Extract cryptography files from the crypto libraries #3

Closed corentinllorca closed 4 years ago

corentinllorca commented 4 years ago

See #1

WARNING: there may be some false positives in there. Not every file in a crypto library implements crypto, so either check the files by hand or run them through WindRiver.

arnaudstiegler commented 4 years ago

Here are the crypto folders for the main crypto libraries:

Libgcrypt: there’s a folder containing all ciphers https://github.com/gpg/libgcrypt/tree/master/cipher

OpenSSL: crypto folder https://github.com/openssl/openssl/tree/master/crypto

libsodium: the libsodium folder contains many crypto subfolders https://github.com/jedisct1/libsodium/tree/master/src/libsodium

NaCl: main folder contains crypto subfolders https://github.com/krig/nacl

Nettle: crypto files scattered within the main folder https://git.lysator.liu.se/nettle/nettle

wolfCrypt: files scattered within this folder https://github.com/wolfSSL/wolfssl/tree/master/wolfcrypt/src

Mbed Crypto (ARMmbed): https://github.com/ARMmbed/mbed-crypto/tree/37b5c831b41cd41456caa979f1444234c51e4c51/library

Only a limited number of crypto algorithms are in use today, so pretty much all the libraries will contain the same algorithms. However, the implementations differ (at least in terms of code structure), so they would make an interesting dataset.

However, the number of files is likely to be rather small (a few hundred at most).
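A minimal sketch of how the C files could be gathered once one of the folders above has been cloned locally. The function name `collect_c_files` and the choice of keeping only `.c` and `.h` files are assumptions for illustration, not the script actually used in this repository.

```python
from pathlib import Path

# Assumption: only C sources and headers are of interest for the dataset.
C_SUFFIXES = {".c", ".h"}


def collect_c_files(root):
    """Return every .c/.h file found recursively under `root`, sorted by path."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in C_SUFFIXES
    )
```

For example, `len(collect_c_files("libgcrypt/cipher"))` would give a quick file count for one library, assuming that is where the clone lives on disk.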

arnaudstiegler commented 4 years ago

@corentinllorca, as you said, there are quite a lot of undesired files in there (helpers, wrappers, etc.). It is difficult to estimate how many files should be removed (it depends a lot on the library). However, I see 2 issues with using WindRiver for that:

arnaudstiegler commented 4 years ago

For certain packages, quite a lot of functions are implemented in assembly (.S files) and had to be dropped.
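To see how much is lost this way, the assembly files can be separated from the rest and counted per extension. This is only a sketch: the helper names and the set of assembly suffixes are assumptions, and the file names in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumption: these suffixes identify assembly sources.
ASM_SUFFIXES = {".S", ".s", ".asm"}


def split_out_assembly(paths):
    """Split `paths` into (kept, dropped), where dropped holds assembly files."""
    kept, dropped = [], []
    for path in map(Path, paths):
        (dropped if path.suffix in ASM_SUFFIXES else kept).append(path)
    return kept, dropped


def suffix_counts(paths):
    """Count files per suffix, e.g. to report how many .S files were dropped."""
    return Counter(Path(p).suffix for p in paths)
```

Calling `split_out_assembly(["cipher/aes.c", "cipher/aes-amd64.S"])` keeps the first path and drops the second; `suffix_counts` on the dropped list then shows the loss per extension.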

A few random examples of files that are harder to classify:

Also, at least for crypto libraries, quite a lot of files have a well-written docstring that explains exactly what the file is for. Should we consider that a data leak?
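If the answer turns out to be yes, one option would be to strip comments before the files reach a classifier. Below is a rough regex-based sketch. `strip_c_comments` is a hypothetical helper, and it does not handle every corner of C (for instance, backslash line continuations inside `//` comments).

```python
import re

# Match comments, but also string/char literals so that comment markers
# inside literals are left alone.
_TOKEN = re.compile(
    r'//[^\n]*'              # line comment
    r'|/\*.*?\*/'            # block comment (may span several lines)
    r'|"(?:\\.|[^"\\])*"'    # string literal
    r"|'(?:\\.|[^'\\])*'",   # character literal
    re.DOTALL,
)


def strip_c_comments(source):
    """Return `source` with C comments replaced by a single space."""
    def _replace(match):
        text = match.group(0)
        # Comments start with '/', literals start with a quote character.
        return " " if text.startswith("/") else text

    return _TOKEN.sub(_replace, source)
```

Replacing each comment with a space rather than an empty string avoids gluing together the tokens on either side of it.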

arnaudstiegler commented 4 years ago

810 code files extracted

redouane-dziri commented 4 years ago

Merged and done