Hello, I am using the ember-2018 dataset. When I try to create the vectorized features, I get the following error: `KeyError: 'datadirectories'`
```
Vectorizing training set
  0%|          | 0/900000 [00:00<?, ?it/s]
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/features.py", line 531, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/features.py", line 531, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
KeyError: 'datadirectories'
"""

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
in
----> 1 ember.create_vectorized_features(data_dir, 2)
      2 ember.create_metadata(data_dir)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/__init__.py in create_vectorized_features(data_dir, feature_version)
     73     raw_feature_paths = [os.path.join(data_dir, "train_features_{}.jsonl".format(i)) for i in range(6)]
     74     nrows = sum([1 for fp in raw_feature_paths for line in open(fp)])
---> 75     vectorize_subset(X_path, y_path, raw_feature_paths, extractor, nrows)
     76 
     77     print("Vectorizing test set")

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ember/__init__.py in vectorize_subset(X_path, y_path, raw_feature_paths, extractor, nrows)
     58         argument_iterator = ((irow, raw_features_string, X_path, y_path, extractor, nrows)
     59                              for irow, raw_features_string in enumerate(raw_feature_iterator(raw_feature_paths)))
---> 60         for _ in tqdm.tqdm(pool.imap_unordered(vectorize_unpack, argument_iterator), total=nrows):
     61             pass
     62 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/tqdm/std.py in __iter__(self)
   1128 
   1129         try:
-> 1130             for obj in iterable:
   1131                 yield obj
   1132                 # Update and possibly print the progressbar.

/anaconda/envs/azureml_py36/lib/python3.6/multiprocessing/pool.py in next(self, timeout)
    733         if success:
    734             return value
--> 735         raise value
    736 
    737     __next__ = next                    # XXX

KeyError: 'datadirectories'
```
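The failing expression is `raw_obj[fe.name]`, so at least one parsed JSONL row has no top-level `datadirectories` key. Running the same lookup serially, outside the multiprocessing pool so the error is not wrapped in a `RemoteTraceback`, shows which files and rows are affected. The following is a minimal sketch. It assumes the raw files follow the `train_features_{i}.jsonl` naming (i = 0..5) seen in the traceback. `find_missing_keys` is a hypothetical helper, not part of ember.

```python
import json
import os


def find_missing_keys(data_dir, expected_keys,
                      pattern="train_features_{}.jsonl", nfiles=6, max_rows=None):
    """Serially scan raw JSONL feature files and report rows lacking expected keys.

    Returns {path: {row_index: [missing keys]}}. Files that do not exist are skipped.
    Assumes one JSON object per line, as in the raw EMBER feature files.
    """
    report = {}
    for i in range(nfiles):
        path = os.path.join(data_dir, pattern.format(i))
        if not os.path.exists(path):
            continue  # e.g. a partially extracted dataset
        with open(path) as f:
            for irow, line in enumerate(f):
                if max_rows is not None and irow >= max_rows:
                    break  # only sample the first max_rows lines of this file
                if not line.strip():
                    continue  # ignore blank lines
                raw_obj = json.loads(line)
                # Same lookup that fails in the traceback: raw_obj[fe.name]
                missing = sorted(k for k in expected_keys if k not in raw_obj)
                if missing:
                    report.setdefault(path, {})[irow] = missing
    return report


if __name__ == "__main__":
    data_dir = "/path/to/ember2018"  # hypothetical location; substitute your data_dir
    report = find_missing_keys(data_dir, ["datadirectories"], max_rows=1000)
    for path, rows in report.items():
        print(path, len(rows), "sampled rows missing keys; first row:", min(rows))
```

With ember installed, you could probably take the full list of expected names from the extractor the traceback uses, for example `[fe.name for fe in ember.features.PEFeatureExtractor(feature_version=2).features]`. If every row lacks `datadirectories`, the raw files were presumably not generated with the same feature version as the `2` passed to `create_vectorized_features`.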
**Requirements seem to be installed correctly:**
```
pip install -r requirements.txt
Requirement already satisfied: lief>=0.9.0 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (0.9.0)
Requirement already satisfied: tqdm>=4.31.0 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (4.48.0)
Requirement already satisfied: numpy>=1.16.3 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 3)) (1.16.6)
Requirement already satisfied: pandas>=0.24.2 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 4)) (1.1.0)
Requirement already satisfied: lightgbm>=2.2.3 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 5)) (2.3.0)
Requirement already satisfied: scikit-learn>=0.20.3 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from -r requirements.txt (line 6)) (0.20.3)
Requirement already satisfied: pytz>=2017.2 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from pandas>=0.24.2->-r requirements.txt (line 4)) (2019.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from pandas>=0.24.2->-r requirements.txt (line 4)) (2.8.1)
Requirement already satisfied: scipy in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from lightgbm>=2.2.3->-r requirements.txt (line 5)) (1.4.1)
Requirement already satisfied: six>=1.5 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas>=0.24.2->-r requirements.txt (line 4)) (1.12.0)
```
**Thanks, I appreciate your help.**