elastic / ember

Elastic Malware Benchmark for Empowering Researchers

How do you get the dataset in csv format for the repo? #54

Closed RishabhJain-Github closed 3 years ago

KhenfouciYa commented 3 years ago

@RishabhJain-Github I have the same question. Could you let me know if you have any idea how we can produce a CSV version of the EMBER dataset?

mrphilroth commented 3 years ago

First, I'd like to say that you probably don't need the dataset in CSV format. You can read the vectorized features with the read_vectorized_features function and work with the dense numpy arrays directly. Most modeling packages accept numpy arrays as input. But assuming you've thought this through and still need the features as CSV for whatever reason, here is one example of how you could do it:

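import ember
import pandas as pd

# Load the vectorized training features (X) and labels (y) from the
# data directory ("." here); the features must already be vectorized.
X, y = ember.read_vectorized_features(".", "train")

# Wrap the feature matrix in a DataFrame and write it out as CSV.
# Note that y is not written; pass index=False to to_csv to omit the
# row-index column.
df = pd.DataFrame(X)
df.to_csv("features_in_a_csv_file.csv")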
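If the full feature matrix is too large to wrap in a DataFrame comfortably, or you also want the labels in the same file, one alternative is to stream rows straight to disk with the standard csv module. This is only a sketch: write_features_csv and the column names f0, f1, ... and label are made up for illustration and are not part of ember, and it assumes X and y are row-aligned sequences such as the arrays returned by read_vectorized_features.

```python
import csv


def write_features_csv(X, y, path):
    """Stream feature rows and their labels to a CSV file, one row at a time.

    Hypothetical helper (not part of ember). X is a sequence of feature
    rows and y is a sequence of labels of the same length.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n_features = len(X[0]) if len(X) else 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Illustrative header: f0, f1, ..., followed by the label column.
        writer.writerow([f"f{i}" for i in range(n_features)] + ["label"])
        for features, label in zip(X, y):
            writer.writerow(list(features) + [label])
    return len(X)


# Assumed usage with the arrays from the example above:
#   X, y = ember.read_vectorized_features(".", "train")
#   write_features_csv(X, y, "features_with_labels.csv")
```

Writing row by row is slower than DataFrame.to_csv, but it keeps only one row in Python memory at a time, which may matter for the full training set.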