elastic / ember

Elastic Malware Benchmark for Empowering Researchers

How to get the original bytes of the PE file? I want to convert a file to a grayscale image. #85

Open zxC0der opened 2 years ago

zxC0der commented 2 years ago

Thanks

kevin3567 commented 2 years ago

Same here. Would it be possible to acquire the original bytes of the malware files (or at least the bytes of the PE headers)?

lkurlandski commented 2 years ago

No. In their paper, the authors discuss why they do not release the raw executables. The SOREL project worked on improving upon some of the shortcomings of EMBER, including this issue. They release raw binaries for the malware files only. You can check them out here: SOREL-20M
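
For a file whose raw bytes you do have, the conversion asked about in the title can be sketched roughly as below. This is only a minimal illustration, not something taken from EMBER or SOREL: the function names, the default row width of 256, and the zero padding of the last row are all assumptions, and the output is written as a plain PGM image so that nothing beyond the Python standard library is needed.

```python
import math


def bytes_to_gray_rows(data: bytes, width: int = 256) -> list[bytes]:
    """Split raw bytes into fixed-width rows; each byte is one pixel (0-255).

    The width of 256 and the zero padding are illustrative choices only.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad the final row with zero bytes so every row has the same length.
    padded = data.ljust(height * width, b"\x00")
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def to_pgm(rows: list[bytes]) -> bytes:
    """Serialize the rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(rows)
```

To use it, read a file in binary mode, pass its contents to `bytes_to_gray_rows`, and write the result of `to_pgm` to a `.pgm` file, which most image viewers can open.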

isimsizolan commented 2 years ago

Yet SOREL did not publish the benign set either.

lkurlandski commented 2 years ago

Yes, because benign software is typically proprietary.

isimsizolan commented 2 years ago

Yet it is a hard limit. Only a select few have access to this proprietary, ground-truth benign dataset.

I also think it is an excuse, because you can in theory collect some of that proprietary software yourself, and it can be disarmed the same way malware is disarmed.

I'm a full-time academic at one of the most respected universities in my country. I have contacted almost all security companies and literally begged for a good ground-truth benign dataset, and none responded positively.

Just a few days ago, VirusTotal refused my API request to get AV scores for my own benign-only dataset, which I wanted at least as a baseline. They said "it is extremely unethical to compete with anti-virus companies using their product line".