VirusTotal / yara-python

The Python interface for YARA
http://virustotal.github.io/yara/
Apache License 2.0
659 stars 179 forks source link

Non-unicode filenames causes UnicodeEncodeError on python3 #68

Open binrush opened 7 years ago

binrush commented 7 years ago

On linux, file names are actually bytes, not unicode. Yara can not scan file containing non-unicode bytes:

import pathlib
import os
import yara
p = pathlib.Path(os.fsdecode(b'/tmp/\x44\xf9'))
p.write_text('malware')
rules = yara.compile('main.yara')
rules.match(str(p)) # UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf9' in position 6: surrogates not allowed

How should I decode bytes filename to pass it to match() ?