elastic / ember

Elastic Malware Benchmark for Empowering Researchers

Ordinal Inputs #9

Closed: comath closed this issue 4 years ago

comath commented 6 years ago

Currently, the following line parses only the named imports from each library:

    imports[lib.name].extend([entry.name[:10000] for entry in lib.entries])

Ordinal imports are unnamed, so their name attribute is an empty string. A few samples in the dataset import a large number of functions by ordinal from a single library:

f1689c71733e2864ed040854eb2b5ca25cec1efa11f8d7994a6bfe6d1235a343
1ccca67c95f9261491aa229a933df5712e577a4e80689d647d027bf0185c23ee
4522284d27e3b7e8a8678ab6c1aba9654279646f81683b757c688c61f161f0cb

This throws the imports feature out of whack. Every ordinal import is hashed as the empty string into index 529, so that single entry varies far more than the other 1023 entries in the feature vector.
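To see why one bin blows up, here is a minimal sketch of the hashing step. It assumes import names are hashed into 1024 bins and uses an MD5-based bucket function as a stand-in for ember's actual feature hasher, so the bin index it reports will not be 529. The sample import names are hypothetical.

```python
import hashlib
from collections import Counter

N_BINS = 1024  # width of the hashed import-name block described in the issue


def bucket(token: str, n_bins: int = N_BINS) -> int:
    """Map a string to a bin index.

    Stand-in for the feature hasher: a stable MD5-based hash, NOT the
    hash function ember uses, so bin numbers differ from the real ones.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_bins


def hashed_counts(names):
    """Count how many import names land in each bin."""
    return Counter(bucket(name) for name in names)


# Hypothetical sample: three named imports plus 200 ordinal imports,
# each of which appears as "" under the current parsing line.
named = ["CreateFileA", "ReadFile", "CloseHandle"]
ordinals_as_empty = [""] * 200

counts = hashed_counts(named + ordinals_as_empty)
empty_bin = bucket("")
print(empty_bin, counts[empty_bin])  # the "" bin holds at least 200 of 203 names
```

Because every ordinal import yields the identical string, all of them accumulate in one bin, while named imports spread across the vector.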

This could be fixed either here or upstream in LIEF.
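A minimal sketch of a fix on the ember side, assuming LIEF's PE import entries expose is_ordinal and ordinal alongside name: emit a distinct token per ordinal instead of the empty name. The FakeEntry and FakeImport classes and the "ordinal<N>" token format are hypothetical, introduced only so the example runs without a PE file.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for LIEF's PE import objects, so the sketch runs
# without a binary. Only the attributes used below are modeled.
@dataclass
class FakeEntry:
    name: str = ""
    is_ordinal: bool = False
    ordinal: int = 0


@dataclass
class FakeImport:
    name: str
    entries: List[FakeEntry] = field(default_factory=list)


def parse_imports(libraries) -> Dict[str, List[str]]:
    """Collect import names per library, keeping ordinal imports distinct."""
    imports: Dict[str, List[str]] = {}
    for lib in libraries:
        # Libraries can appear more than once, so extend rather than overwrite.
        names = imports.setdefault(lib.name, [])
        for entry in lib.entries:
            if entry.is_ordinal:
                # Hypothetical token format: encode the ordinal number so each
                # unnamed import hashes to its own bucket instead of "".
                names.append("ordinal" + str(entry.ordinal))
            else:
                # Same 10000-character clipping as the original line.
                names.append(entry.name[:10000])
    return imports


libs = [
    FakeImport("ws2_32.dll", [FakeEntry(is_ordinal=True, ordinal=23),
                              FakeEntry(is_ordinal=True, ordinal=115)]),
    FakeImport("kernel32.dll", [FakeEntry(name="CreateFileA")]),
]
print(parse_imports(libs))
# {'ws2_32.dll': ['ordinal23', 'ordinal115'], 'kernel32.dll': ['CreateFileA']}
```

Encoding the ordinal number keeps different ordinal imports apart after hashing, whereas any fixed placeholder string would still pile them into a single bin.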

mrphilroth commented 4 years ago

Fixed in version 2 features. 😄