Closed tatsuhirochiba closed 6 years ago
We can set exclude_dirs to skip files in the specified directories. However, the current use of fnmatch in file_utils.py does not produce the expected regex.
Here is an example case.
python crawler.py --crawlmode OUTCONTAINER --features file --options '{"file": {"exclude_dirs": ["/boot", "/sys", "/tmp", "/var/cache", "/storage/.*"]}}'
The generated regex is then:
\/boot\Z(?ms)|\/sys\Z(?ms)|\/tmp\Z(?ms)|\/var\/cache\Z(?ms)|\/storage\/\..*\Z(?ms)
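This regex cannot skip files under /storage/* recursively. fnmatch treats each entry as a shell glob, so the ".*" in "/storage/.*" becomes a literal dot followed by any characters, and only names under /storage that start with a dot are matched.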
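The behavior can be reproduced outside the crawler with a minimal sketch. The glob is the one from the example above; the two sample paths are hypothetical, and the check uses `re.match`, which is an assumption about how file_utils.py applies the pattern. The exact string returned by `fnmatch.translate` differs between Python versions, so the sketch only looks at what the compiled pattern matches.

```python
import fnmatch
import re

# Glob taken from the exclude_dirs example; translate() turns it into a regex.
pattern = re.compile(fnmatch.translate("/storage/.*"))

# Hypothetical sample paths, used only for illustration.
hidden_matches = pattern.match("/storage/.hidden") is not None
regular_matches = pattern.match("/storage/data/file.txt") is not None

# In glob syntax "." is a literal dot and "*" means "any characters",
# so only the dot-prefixed name under /storage is matched.
print(hidden_matches, regular_matches)
```
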
I want to simplify the regex-generating code from
exclude_regex = r'|'.join([fnmatch.translate(d) for d in exclude_dirs]) or r'$.'
to
exclude_regex = re.compile('|'.join([d for d in exclude_dirs]))
With this change, any file under the /storage directory can be skipped.
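A rough sketch of the proposed approach follows. The helper name `build_exclude_regex` and the sample paths are hypothetical, and matching with `re.match` is assumed. One point that may be worth keeping from the current code is the `r'$.'` fallback: joining an empty list yields an empty pattern, which matches every path, so the sketch retains a never-matching pattern for that case.

```python
import re

def build_exclude_regex(exclude_dirs):
    """Hypothetical helper: treat each exclude_dirs entry as a regex."""
    if not exclude_dirs:
        # '$.' can never match, mirroring the fallback in the current code.
        return re.compile(r'$.')
    return re.compile('|'.join(exclude_dirs))

exclude_dirs = ["/boot", "/sys", "/tmp", "/var/cache", "/storage/.*"]
exclude_regex = build_exclude_regex(exclude_dirs)

# Hypothetical sample paths, used only for illustration.
skipped = exclude_regex.match("/storage/data/file.txt") is not None
kept = exclude_regex.match("/home/user/a.txt") is None
print(skipped, kept)
```

Because the entries are no longer anchored at the end, `re.match` would also treat "/boot" as matching "/bootstrap"; whether that matters depends on how the crawler applies the pattern.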