ALFA-group / malware_challenge

[AdvML@KDD 2019] Robust Malware Detection Challenge
MIT License

How to get raw binary files for sleipnir-dataset #2

Closed: vietvo89 closed this issue 3 years ago

vietvo89 commented 3 years ago

Hello

How can I obtain the raw binary files for your dataset? I want to train my model on your dataset, but it requires raw binary files rather than the extracted features.

Thanks

ash-aldujaili commented 3 years ago

Hi @vietvo89 ,

We created the corpus of malicious and benign PE files from VirusShare [3] and internet download sites. Unfortunately, due to licensing issues, we cannot share these binary files; as you mentioned, we have only shared the extracted features. If you are interested in the binary files, we suggest collecting them on your own.

Hope it helps.

Best,

vietvo89 commented 3 years ago

Hi @ash-aldujaili

I see your point, and thank you for your reply. Just one more thing: did you get the benign samples from VirusShare or from specific download sites? I have enough malware samples and need more benign ones. I know VirusTotal (not VirusShare) can provide both, but only with enterprise access, which costs a lot of money. I think that cost is inevitable eventually, but at this stage of my research I am looking for a medium-sized dataset (around 10K files per class) for a demonstration and proposal first.

Thanks