elastic / ember

Elastic Malware Benchmark for Empowering Researchers

How can I test Ember model with real PE file (malware or benign)? #66

Open vietvo89 opened 3 years ago

vietvo89 commented 3 years ago

Hi Phil

How can I test the Ember model with real PE files (malware or benign)?

Thanks

gxenos commented 3 years ago

In the scripts folder there is a classify_binaries.py script that you can use to classify PE files, assuming you have a trained EMBER model. If you don't, use the init_ember.py script in the same folder to create one.
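
For readers who want to call the model from their own code rather than through the script, here is a minimal sketch of the same idea. The helper names score_pe_files and classify_with_ember are hypothetical and not part of EMBER. The calls lgb.Booster(model_file=...) and ember.predict_sample(...) are assumed to behave as they are used in this thread, and the feature version 2 default is also an assumption.

```python
from pathlib import Path


def score_pe_files(paths, predict_one):
    """Return {path: score} for every file in paths.

    predict_one is any callable that maps raw file bytes to a numeric score,
    so the loop can be exercised without a trained model.
    """
    scores = {}
    for path in paths:
        file_data = Path(path).read_bytes()  # raw PE bytes, not extracted features
        scores[str(path)] = float(predict_one(file_data))
    return scores


def classify_with_ember(model_path, binaries, feature_version=2):
    """Score files with a trained EMBER LightGBM model (needs ember and lightgbm)."""
    # Imported lazily so that score_pe_files stays usable without these packages.
    import ember
    import lightgbm as lgb

    lgbm_model = lgb.Booster(model_file=model_path)
    return score_pe_files(
        binaries,
        lambda data: ember.predict_sample(lgbm_model, data, feature_version),
    )
```

Calling classify_with_ember("model.txt", ["sample.exe"]) would then return one score per file. Both file names are placeholders.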

vietvo89 commented 3 years ago

Thanks, gxenos!

You are right, I found it:

score = ember.predict_sample(lgbm_model, file_data, args.featureversion)

Here file_data is the actual malware (the raw file), and the line extractor = PEFeatureExtractor(feature_version) creates the module that extracts features from the raw file.

But do you know how to evaluate the MalConv model on PE files from the EMBER dataset? I think this dataset contains only extracted features, but MalConv needs raw input rather than the extracted features.

bfilar commented 3 years ago

@vietvo89 for Malconv you should just need the raw bytes.

Something like below should work:

bytez = open('raw_ember_sample', 'rb').read()
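
A byte-level model such as MalConv typically expects a fixed-length sequence rather than a file of arbitrary size, so the raw bytes usually need to be truncated or padded first. The sketch below shows that step only. The function name to_fixed_length is hypothetical, and the 2**20 length and padding value 256 are assumptions that should be checked against the model you actually load.

```python
MAX_LEN = 2 ** 20      # assumed model input length (1 MiB); verify for your model
PADDING_VALUE = 256    # assumed padding token, chosen outside the 0-255 byte range


def to_fixed_length(bytez, max_len=MAX_LEN, padding_value=PADDING_VALUE):
    """Truncate or right-pad raw bytes to exactly max_len integer tokens."""
    tokens = list(bytez[:max_len])  # iterating over bytes yields ints in 0..255
    tokens.extend([padding_value] * (max_len - len(tokens)))
    return tokens
```

The resulting list can be converted to whatever array type the model expects before calling its predict method.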

vietvo89 commented 3 years ago

@bfilar, thank you very much! Actually, I found a way to do that by following the malware evasion competition.

But it seems different from yours. The MalConv and NegMalConv models used in the "malware_evasion_competition" repo have low accuracy when evaluated on my small dataset. The evaluation may not be entirely fair or reliable, but I did the same thing for two other pre-trained models and the results look very positive. I believe the MalConv model used in this repo is exactly the same as the one in the "malware_evasion_competition" repo. One possible reason is that some samples in my dataset, which come from https://mlsec.io/, were modified to evade MalConv and NegMalConv in the 2019 competition, but I believe many of them should bypass EMBER too. The other two relatively good models (gradient boosting, stored in the Pesidious repo, and MalConv 2) can be found here.

raiza97 commented 2 years ago

Hi, I have a problem when I try to test my model (based on EMBER v2 features) with real PE files. This is my code:

for file in file_list:
    data = open(file, "rb").read()
    features = extractor.feature_vector(data)
    y_predict.append(model.predict(features))

I get this error:

entry_section = lief_binary.section_from_offset(lief_binary.entrypoint).name
AttributeError: 'NoneType' object has no attribute 'name'

I tried different .exe files, and I always get the same error.

mounirhajri commented 2 years ago

I ran into the same error. Does anybody know where the problem is?

raiza97 commented 2 years ago

Solved: just create a conda environment with Python 3.6.

mounirhajri commented 2 years ago

I created a new environment in conda, but the error still occurs. Are you sure that is the cause of the issue?

mrphilroth commented 2 years ago

I believe this error is from using the latest versions of LIEF to calculate EMBER version 2 features. Instead, make sure you're using LIEF version 0.9.0.

@mounirhajri: git pull to get the latest EMBER code, make a new conda environment using requirements_conda.txt, and then you shouldn't get the entrypoint error anymore. (It was fixed in this recent PR: https://github.com/elastic/ember/pull/90)
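
Before rebuilding an environment, it can help to confirm which LIEF version is actually installed. The sketch below is one way to do that. The helper names are hypothetical, and it assumes that lief.__version__ returns a string such as "0.9.0" optionally followed by a build suffix after a "-" or "+".

```python
def lief_version_matches(installed, expected="0.9.0"):
    """Return True if the installed version string is the expected release.

    Any build suffix after '-' or '+' is ignored (assumed version format).
    """
    core = installed.split("-")[0].split("+")[0].strip()
    return core == expected


def check_installed_lief(expected="0.9.0"):
    """Raise a clear error early instead of failing inside feature extraction."""
    import lief  # imported lazily; requires LIEF to be installed

    if not lief_version_matches(lief.__version__, expected):
        raise RuntimeError(
            "EMBER v2 features expect LIEF %s, found %s" % (expected, lief.__version__)
        )
```

Calling check_installed_lief() once at the start of a script surfaces a version mismatch before any PE file is parsed.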

mounirhajri commented 2 years ago

The problem was as you described, @mrphilroth. I also had the Mac issue with LIEF version <0.9.0 :) I fixed everything and could use the dataset for an implementation.