Status: Closed (closed by dimbtp 6 years ago)
The yara.Rules object is not a pure-Python object: it is implemented in a C extension and doesn't support pickling. Instead of passing the compiled rules to your workers, you could launch the workers first, pass them the rules in text form, and compile the rules in the worker function. In other words, instead of compiling the rules in the main process, each worker is responsible for compiling the rules itself.
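A minimal sketch of that suggestion, assuming the rules are available as a single source string. The names init_worker, scan_file and scan_paths are hypothetical; only yara.compile(source=...) and Rules.match(path) come from yara-python. Each worker compiles its own copy of the rules once, and only plain strings and lists cross the process boundary:

```python
import multiprocessing

_rules = None  # per-process compiled rules, set by init_worker


def init_worker(rule_source):
    """Compile the rules once inside each worker process."""
    global _rules
    import yara  # imported in the worker, where the compiled object lives
    _rules = yara.compile(source=rule_source)


def scan_file(path):
    """Scan one file and return only picklable values (str and list of str)."""
    return path, [match.rule for match in _rules.match(path)]


def scan_paths(rule_source, paths, processes=None):
    """Scan every path in a pool whose workers each compile the rules."""
    with multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(rule_source,)
    ) as pool:
        return pool.map(scan_file, paths)
```

Call scan_paths from under an `if __name__ == "__main__":` guard so that it also works with the spawn start method. Returning rule names instead of yara.Match objects avoids the second pickling error discussed further down.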
Similar issue with yara.Match:
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '['REDACTED']'. Reason: 'TypeError("can't pickle yara.Match objects")'
(The yara.Match objects are stored in a field of an object that is returned from the pool.)
Correct, yara.Match is defined by a Python C extension and therefore it doesn't support pickling.
Would it be possible to change it to have a more normal Python object structure? It also doesn't support __dict__, json.dumps(), or other useful object-representation functions.
If not, I'm going to end up writing a wrapper to convert it to a dict anyway.
Basically, I need a way to get the object's fields into a usable Python data structure.
Implementing the pickle interface for objects defined in C is possible, but a bit tricky, so it's not in the roadmap. You can store the information in a pure Python object as you said.
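As a rough illustration of storing the information in a pure Python object, the sketch below copies a few fields of a match into a dict. The function name match_to_dict is hypothetical; rule, namespace, tags and meta are attributes of yara.Match. The strings attribute is left out on purpose, since its layout differs between yara-python versions and would need version-specific handling:

```python
def match_to_dict(match):
    """Copy the basic fields of a yara.Match into a plain dict."""
    return {
        "rule": match.rule,            # name of the matching rule
        "namespace": match.namespace,  # namespace the rule was compiled into
        "tags": list(match.tags),      # copy into a plain list
        "meta": dict(match.meta),      # copy into a plain dict
    }
```

A worker can then return [match_to_dict(m) for m in rules.match(path)] instead of the match objects themselves. The resulting dicts can be pickled, and json.dumps() should also work as long as the rule metadata holds only strings, integers and booleans.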
Is there a quick way (like a built-in function) to convert the C object to pure Python?
I searched for an answer but couldn't find one, so I wrote a snippet to manually build dicts from the fields: https://gist.github.com/wesinator/eda62d75e8bd437267477a887406d0c8
Thanks,
I tried to use multiprocessing for automated YARA analysis, but ran into a problem:
--------------------------------Trace Log----------------------
rule_sets is a list that contains yara.Rules objects (the results of yara.compile('rule_path')).
Another problem: when I call pool.apply_async(func=work, args=(path, filePath, rule_sets, num_first_bytes,)) rather than pool.apply, the function [worker] doesn't execute at all (I tried printing a string in [worker], but nothing was printed).
I also tested the [dill] module; it cannot handle yara.Rules objects either.
The only way I can think of is rewriting them in C/C++. Any advice on how to solve this problem? Thanks.
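On the apply_async behaviour: a likely explanation is that the arguments fail to pickle before the task ever reaches a worker, and apply_async only reports that failure when get() is called on the returned AsyncResult (or through error_callback). The sketch below reproduces the effect without yara, using a threading.Lock as a stand-in for any unpicklable argument; work and run_async are hypothetical names:

```python
import multiprocessing
import threading


def work(item):
    """Stand-in worker; never runs if its argument cannot be pickled."""
    return "ran"


def run_async(arg):
    """Submit one task and surface any error hidden in the AsyncResult."""
    with multiprocessing.Pool(1) as pool:
        result = pool.apply_async(work, (arg,))
        try:
            # Without this get(), a pickling failure passes silently.
            return result.get(timeout=30)
        except Exception as exc:
            return f"{type(exc).__name__}: {exc}"
```

Calling run_async(threading.Lock()) returns a message describing the pickling error rather than "ran", which matches the symptom of a worker that never prints anything. Passing the rule source text instead of compiled yara.Rules objects avoids the failure altogether.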