zxC0der opened 2 years ago
Same here. Would it be possible to acquire the original bytes of the malware files (or at least the bytes of the PE headers)?
No. In their paper, the authors discuss why they do not release the raw executables. The SOREL project worked on improving some of EMBER's shortcomings, including this issue. They released raw binaries for the malware files only. You can check them out here: SOREL-20M
Yet SOREL did not publish the benign samples either.
Yes, because benign software is typically proprietary.
Still, it is a hard limitation. Only a select few have access to this kind of proprietary, ground-truth benign dataset.
I also think this is an excuse: in theory, you can collect some of that proprietary software yourself, and it could be disarmed the same way the malware was.
I'm a full-time academic at one of the most respected universities in my country. I have contacted almost every security company and literally begged for a good ground-truth benign dataset, and none responded positively.
Just a few days ago, VirusTotal refused my API request to get AV scores for my own benign-only dataset, which I wanted at least as a baseline. They said "it is extremely unethical to compete with anti-virus companies using their product line"
Thanks