arthurherbout / crypto_code_detection

Automatic Detection of Custom Cryptographic C Code

(2) XGBoost Model (Google Cloud VM) #20

Open Hadrien-Cornier opened 4 years ago

Hadrien-Cornier commented 4 years ago

I set up a first machine on Debian 9 with 24 CPUs to do some parameter tuning on XGBoost, and got a fairly significant increase in recall on the positive class, from 66% to ~85%.
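For reference, the recall quoted here is the fraction of actual positive samples that the model catches, TP / (TP + FN). A minimal sketch in plain Python, assuming labels are encoded as 1 for crypto code and 0 otherwise (the encoding, the function name and the toy data are illustrative, not taken from the repo):

```python
def positive_recall(y_true, y_pred, positive=1):
    """Recall of the positive class: TP / (TP + FN)."""
    # True positives: actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: actual positives that the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(positive_recall(y_true, y_pred))  # 3 of the 4 positives are found: 0.75
```

Note that the false positive at index 4 does not affect this number; recall only looks at the samples that are actually positive.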

Then I wanted to use some GPUs, so I had to create another VM with an OS that supports CUDA and parallel use of several GPUs, and I transferred my uncommitted branch over with a snapshot. I used one of Google's preset Debian machines with CUDA preinstalled (normally Debian is not supported by CUDA, but somehow it works on Google's preset OS; I think it is because Ubuntu and Debian are really close). However, the VM did not have NCCL installed, which is necessary to use several GPUs in parallel. Multi-GPU is great for XGBoost, since they recently added a GPU-optimized tree method as well as a GPU-optimized binary logistic loss function. So I installed NCCL, and now we have a setup with 8 CPUs and 4 GPUs.

I also set up Jupyter Notebook, linked to a static IP address and an open port (5000), and fixed a problem where disconnecting caused the kernel to crash. My solution was to use Linux's "screen" utility, which is like tmux and lets us launch a Jupyter notebook in a stable way, so that the VM keeps working when we disconnect.
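As a rough sketch of what the GPU setting looks like on the XGBoost side: parameter names depend on the installed version. Assuming XGBoost 2.0 or later, GPU training is selected with `device="cuda"` together with `tree_method="hist"`, while older releases used `tree_method="gpu_hist"`. The helper name and the hyperparameter values below are placeholders, not the settings used on this VM, and multi-GPU training is not shown.

```python
def gpu_train_params(new_style=True):
    """Build a parameter dict for GPU training (hypothetical helper)."""
    params = {
        "objective": "binary:logistic",  # binary classification: crypto vs. non-crypto
        "eval_metric": "logloss",
        "max_depth": 6,   # placeholder value, to be tuned
        "eta": 0.1,       # placeholder value, to be tuned
    }
    if new_style:
        # Assumption: XGBoost >= 2.0
        params["tree_method"] = "hist"
        params["device"] = "cuda"
    else:
        # Assumption: older XGBoost release
        params["tree_method"] = "gpu_hist"
    return params


print(gpu_train_params())
```

The resulting dict would be passed as the `params` argument of `xgboost.train`; check the documentation of the installed version before relying on either variant.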

Hadrien-Cornier commented 4 years ago

For now the goal is to optimize the current model. It is never a waste to fine-tune the current imperfect model, since it will always be available to boost another, potentially more powerful model by aggregation. Then we will use the false negatives (in our problem false negatives are much worse than false positives, since they could lead to security issues) to look for new easy features that we could add, and after that look into incorporating larger-scale semantics into the model (i.e. functions, ASTs, etc.). The computational power that we have will help us fine-tune the current model and explore other models faster.
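Pulling out the false negatives for manual inspection could look like the sketch below. It assumes labels are 1 for crypto code and 0 otherwise; the function name, file names and predictions are made up for illustration and do not come from the repo.

```python
def false_negative_indices(y_true, y_pred, positive=1):
    """Indices of samples that are positive but were predicted negative."""
    return [
        i
        for i, (t, p) in enumerate(zip(y_true, y_pred))
        if t == positive and p != positive
    ]


# Hypothetical samples: one missed crypto file, one false alarm.
files = ["aes.c", "sort.c", "custom_xor.c", "parser.c"]
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]

missed = [files[i] for i in false_negative_indices(y_true, y_pred)]
print(missed)  # ['custom_xor.c']
```

Reading through the missed files is then a manual step: the point is to spot patterns that the current features do not capture.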