Closed Natruel closed 3 years ago
The error you are referring to is displayed when you have lief version 0.9.0 installed, but are trying to generate ember version 1 features. You need lief version 0.8.3 if you want to generate ember version 1 features. But if you just want to work with the latest feature set (ember version 2 features), then you can stick with lief version 0.9.0.
I believe that ember can be run under python 3.5, but I haven't tried it. You will only run into trouble with f-strings if you hit the lief error I mention above.
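The pairing described above can be checked before extracting features. This is only a sketch: `LIEF_FOR_FEATURE_VERSION` and `lief_matches` are hypothetical names that are not part of ember, and reading the installed version from `lief.__version__` is an assumption about your environment.

```python
import re

# Pairing taken from the reply above: version 1 features need lief 0.8.3,
# version 2 features work with lief 0.9.0.
LIEF_FOR_FEATURE_VERSION = {1: "0.8.3", 2: "0.9.0"}


def lief_matches(installed, feature_version):
    """Return True if the installed lief release is the one expected for this feature version."""
    expected = LIEF_FOR_FEATURE_VERSION[feature_version]
    # Keep only the leading "major.minor.patch" part of the version string.
    match = re.match(r"\d+\.\d+\.\d+", installed)
    return match is not None and match.group(0) == expected


# In a real environment the first argument could come from lief.__version__ (assumption).
ok_v2 = lief_matches("0.9.0", 2)
ok_v1 = lief_matches("0.9.0", 1)
```

With lief 0.9.0 installed, the check passes for feature version 2 and fails for feature version 1, which matches the situation described in the reply.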
Thanks for your reply! Do you mean that the default value of feature_version is 2 when I run "train_ember"? When I read the code, I found this: `
def __init__(self, feature_version=2):
    self.features = [
        ByteHistogram(),
        ByteEntropyHistogram(),
        StringExtractor(),
        GeneralFileInfo(),
        HeaderFileInfo(),
        SectionInfo(),
        ImportsInfo(),
        ExportsInfo()
    ]
` But when I run the code with the default value, without specifying the feature_version parameter, the error I mentioned in the question still occurs. So maybe there is some other reason? Or should I just run it under Python 3.6? I will try that and report the result.
I also have another question. When I run `ember.create_vectorized_features(r"D:\study\untitled1\ember_data")` on Windows, an error occurs that appears to be related to multiprocessing. The error looks like this:
0%| | 0/900000 [00:00<?, ?it/s]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\study\anaconda\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "D:\study\anaconda\ember\__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "D:\study\anaconda\ember\__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "D:\study\anaconda\ember\features.py", line 522, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "D:\study\anaconda\ember\features.py", line 522, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
KeyError: 'datadirectories'
"""
I commented out this loop:
for _ in tqdm.tqdm(pool.imap_unordered(vectorize_unpack, argument_iterator), total=nrows):
    pass
(the code in `__init__.py`, lines 62-63) because I thought it was unnecessary and only there to display the progress bar. After that, the code runs, but when I run this code: `
import ember

def test():
    ember.create_vectorized_features(r"D:\study\untitled1\ember_data")
    X_train, y_train, X_test, y_test = ember.read_vectorized_features(r"D:\study\untitled1\ember_data")
    print(X_train)

if __name__ == '__main__':
    test()
`
I find that all of the values are 0, and I don't know what is wrong.
It seems the values are all 0 because I am not reading the data into the np.memmap array, but I still don't know what is wrong.
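A likely explanation for the zeros: the commented-out line is not only a progress bar. The `pool.imap_unordered(...)` call inside it is what submits the vectorization work, so removing the line leaves the output at its zero-initialised state. The sketch below illustrates this with stand-ins: `fill_row` and `vectorize_all` are hypothetical names, a plain list replaces the memmap file, and a thread pool replaces the process pool so the example stays self-contained.

```python
from multiprocessing.pool import ThreadPool


def fill_row(args):
    """Hypothetical stand-in for vectorize_unpack: writes one row of the output."""
    out, irow = args
    out[irow] = float(irow + 1)


def vectorize_all(nrows, run_loop=True):
    # Zero-initialised output, standing in for a freshly created memmap file.
    out = [0.0] * nrows
    if run_loop:
        with ThreadPool(2) as pool:
            # This line both submits the work and waits for every row to finish.
            for _ in pool.imap_unordered(fill_row, ((out, i) for i in range(nrows))):
                pass
    return out


with_loop = vectorize_all(4)
without_loop = vectorize_all(4, run_loop=False)
```

Here `with_loop` contains the written values, while `without_loop` is still all zeros, which is the same symptom as printing `X_train` after the loop has been commented out.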
The default value for feature version is 2.
The `KeyError: 'datadirectories'` error is solved here:
https://github.com/endgameinc/ember/issues/28#issuecomment-523456373
All the downloads are listed here: https://github.com/endgameinc/ember/#download You must use one that has feature version 2 available in it. The original download from 1.5 years ago only has enough information available for feature version 1.
I can now run the code and vectorize the data, but when I read the data, I get an insufficient memory error. After reading the code and looking up some information, it seems that np.memmap is meant to solve this problem, but it doesn't work for me, and I still don't know how to make it work even after reading the official documentation. I have learned about some ways of reading large amounts of data, but I don't know how to read data in the ".dat" format. So I just want to know how to solve this problem. I need this data to train other models, so I either have to read all of it or choose a part of it to train on, and both options require reading the data.
The `memmap` function doesn't use your memory. This line will read the data into memory. That's probably where you're running into insufficient memory errors:
https://github.com/endgameinc/ember/blob/master/ember/__init__.py#L211
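To illustrate the difference, here is a minimal sketch of reading only part of a memory-mapped ".dat" file. It assumes NumPy is installed. The file name, shape, and values are toy examples, not ember's real ones. A raw ".dat" file has no header, so the dtype and shape must be supplied when opening it.

```python
import os
import tempfile

import numpy as np

nrows, ndim = 1000, 8  # toy sizes, not ember's real dimensions

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_demo.dat")  # hypothetical file name

    # Create a small raw file on disk where every row holds its own row index.
    X = np.memmap(path, dtype=np.float32, mode="w+", shape=(nrows, ndim))
    X[:] = np.arange(nrows, dtype=np.float32)[:, None]
    X.flush()
    del X

    # Opening the file read-only maps it without loading its contents into RAM.
    X = np.memmap(path, dtype=np.float32, mode="r", shape=(nrows, ndim))

    # np.array(...) copies only the selected rows into memory.
    chunk = np.array(X[100:200])
    chunk_shape = chunk.shape
    first_value = float(chunk[0, 0])
    del X  # release the mapping before the temporary directory is removed
```

Only the 100 selected rows are copied into memory. Training on a subset, or looping over such slices in batches, avoids holding the whole array in RAM at once.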
Can the code run under Python 3.5? When I tried this, an error was reported saying that ember needs lief 0.8.3, while requirements.txt specifies "lief = 0.9.0".