elastic / ember

Elastic Malware Benchmark for Empowering Researchers

SectionInfo & process_raw_features #72

Closed: ai-honzik closed this issue 2 years ago.

ai-honzik commented 3 years ago

Hi, while going through the code I noticed that, judging by the comment above it, line 168 in ember/features.py should probably read sum(1 for s in sections if s['size'] != 0) rather than testing == 0. https://github.com/elastic/ember/blob/4dee42918694d72d319e731940755146a71f5c6c/ember/features.py#L168
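For illustration, here is a minimal sketch of the proposed expression applied to a toy section list. It assumes only what the issue shows: each section is a dict with a 'size' key. The function name and the sample section names and sizes are hypothetical and not part of ember's API.

```python
from typing import Any


def count_nonzero_size_sections(sections: list[dict[str, Any]]) -> int:
    """Count sections whose 'size' is not zero (the fix proposed in the issue).

    Hypothetical helper; in ember the expression sits inline in
    SectionInfo.process_raw_features.
    """
    return sum(1 for s in sections if s["size"] != 0)


# Hypothetical toy input: only the 'size' key matters here.
sections = [
    {"name": ".text", "size": 4096},
    {"name": ".bss", "size": 0},
    {"name": ".data", "size": 512},
]

print(count_nonzero_size_sections(sections))  # 2
# The expression at line 168 tests == 0 instead, so it counts the complement:
print(sum(1 for s in sections if s["size"] == 0))  # 1
```

The two expressions always count complementary subsets of the same list, so on this input they differ (2 vs. 1) while summing to the total number of sections.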

Regards.

mrphilroth commented 3 years ago

You are absolutely correct: != 0 is what I intended. However, counting zero-size sections (== 0) should give the same classification performance with most ML algorithms, and the count of nonzero-size sections can still be derived from the features (it is the total section count minus the zero-size count). So I'll just bring the comment in line with the existing code and the already-generated data. Thanks!
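The derivability argument can be checked with a small sketch. It assumes, as the reply implies, that the total section count is also one of the features; the two-element feature list and both function names below are hypothetical simplifications, not ember's actual feature layout.

```python
from typing import Any


def section_count_features(sections: list[dict[str, Any]]) -> list[int]:
    """Hypothetical subset of the section features: total and zero-size counts."""
    return [
        len(sections),                               # total number of sections
        sum(1 for s in sections if s["size"] == 0),  # number of zero-size sections
    ]


def nonzero_from_features(features: list[int]) -> int:
    """Recover the nonzero-size count from the two features above."""
    total, zero_size = features
    return total - zero_size


sections = [{"size": 4096}, {"size": 0}, {"size": 512}, {"size": 0}]
feats = section_count_features(sections)
print(feats)                         # [4, 2]
print(nonzero_from_features(feats))  # 2
```

Because the nonzero count is an exact linear function of two values the model already sees, swapping one count for the other plausibly changes only the sign of a learned weight or the side of a split threshold, which is consistent with the expectation of unchanged classification performance.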