Closed ashishcse0031 closed 5 years ago
Are you attempting to run on the original dataset release containing samples from 2017? If so, then you'll need to specify the feature version number (in this case 1) when you vectorize the features:
ember.create_vectorized_features(data_dir, 1)
If you're working on the most recent dataset release, then there's definitely a bug I'll have to track down.
I was working with the 2017 dataset, and specifying feature version 1 solved the issue.
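For anyone unsure which release they have on disk, here is a minimal sketch of one way to check before vectorizing. It assumes that raw records from the 2017 release lack the `datadirectories` key (which is what the `KeyError` in this issue suggests) and that later releases include it. The helper names `guess_feature_version` and `guess_version_from_jsonl` are hypothetical and not part of ember, and the sample records are made-up, heavily trimmed stand-ins for real ones.

```python
import json


def guess_feature_version(raw_record):
    """Guess the feature version from one raw feature record (a dict).

    Assumption: records without a 'datadirectories' key come from the
    2017 release (feature version 1); records with it are version 2.
    """
    return 2 if "datadirectories" in raw_record else 1


def guess_version_from_jsonl(path):
    """Read the first JSON line of a raw feature file and guess its version."""
    with open(path) as f:
        first_line = f.readline()
    return guess_feature_version(json.loads(first_line))


# Hypothetical, trimmed records; real records carry many more keys.
record_2017 = {"sha256": "0" * 64, "label": 0, "histogram": []}
record_2018 = {"sha256": "1" * 64, "label": 1, "histogram": [], "datadirectories": []}

version_2017 = guess_feature_version(record_2017)
version_2018 = guess_feature_version(record_2018)

# With ember installed, the guessed version could then be passed along, e.g.:
#   version = guess_version_from_jsonl(data_dir + "train_features_0.jsonl")
#   ember.create_vectorized_features(data_dir, version)
```

The file name `train_features_0.jsonl` in the trailing comment is an assumption about the dataset layout; adjust it to whatever raw feature file is present in your `data_dir`.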
`data_dir = "/home/cse31/MalReserach/data/ember/"
ember.create_vectorized_features(data_dir)
_ = ember.create_metadata(data_dir)`
Vectorizing training set 0%| | 0/900000 [00:00<?, ?it/s]
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/cse31/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/cse31/anaconda3/lib/python3.7/site-packages/ember-0.1.0-py3.7.egg/ember/__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "/home/cse31/anaconda3/lib/python3.7/site-packages/ember-0.1.0-py3.7.egg/ember/__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "/home/cse31/anaconda3/lib/python3.7/site-packages/ember-0.1.0-py3.7.egg/ember/features.py", line 522, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/home/cse31/anaconda3/lib/python3.7/site-packages/ember-0.1.0-py3.7.egg/ember/features.py", line 522, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
KeyError: 'datadirectories'
"""
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)