Closed by yustoris 4 years ago
If you need more complex pattern matching than `fnmatch` can offer, you probably need to use regular expressions. In any event, I don't see how you can avoid walking the whole tree in which the files you need to match might be located. There is a `walk` tool for this. You can apply your matching pattern (with `fnmatch` or `re`) to every item yielded by `walk`.
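As a rough sketch of that suggestion, the filter below applies either an `fnmatch` pattern or a regular expression to an iterable of path strings. The helper name `filter_paths` and the sample paths are made up for illustration; in practice the paths would be the names produced by the walk. Note that `fnmatch` lets `*` match `/` as well, so the regular expression uses `[^/]*` to keep each wildcard within a single path component.

```python
import fnmatch
import re


def filter_paths(paths, glob_pattern=None, regex=None):
    """Yield the paths that match the glob pattern and/or the regex.

    `paths` is any iterable of path strings (hypothetical helper,
    not part of pydoop).
    """
    compiled = re.compile(regex) if regex else None
    for path in paths:
        # fnmatchcase is case-sensitive and lets '*' span '/' characters.
        if glob_pattern and not fnmatch.fnmatchcase(path, glob_pattern):
            continue
        # fullmatch requires the regex to cover the whole path.
        if compiled and not compiled.fullmatch(path):
            continue
        yield path


# Made-up sample paths standing in for the output of a tree walk.
sample_paths = [
    "/something/a-b-c-001/part-00000",
    "/something/a-b-z-001/part-00000",
    "/something/a-b-d-002/_SUCCESS",
]

glob_matches = list(
    filter_paths(sample_paths, glob_pattern="/something/a-b-[cdef]-*/part-*")
)
regex_matches = list(
    filter_paths(sample_paths, regex=r"/something/a-b-[cdef]-[^/]*/part-[^/]*")
)
```

With these sample paths, both calls keep only `/something/a-b-c-001/part-00000`.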
Thank you for your quick response!
I've already tried the `walk` tool in pydoop, but it took roughly 10 to 50 times longer than the bare `hadoop` command...
However, as you suggest, it seems that there is no way to search more efficiently without fully walking files.
I've already read https://github.com/crs4/pydoop/issues/12, but I could not figure out how to check for files that match more complex patterns, e.g. `/something/a-b-[cdef]-*/part-*`. In my case, I cannot know in advance where wildcards are inserted into the patterns. Do I have to walk all paths from the root of HDFS? I would rather not do that, because there are too many files in my HDFS.
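One possible way to avoid starting from the HDFS root, sketched below under the assumption that patterns are absolute paths using only the `*`, `?`, and `[...]` wildcards: take the leading directory components that contain no wildcard and walk from there. The helper name `literal_prefix` is hypothetical and not part of pydoop.

```python
GLOB_CHARS = set("*?[")


def literal_prefix(pattern):
    """Return the deepest leading directory of `pattern` with no wildcard.

    Assumes `pattern` is an absolute path (hypothetical helper).
    """
    parts = pattern.split("/")
    kept = []
    # The last component is treated as a file name, so it is never
    # part of the directory to start walking from.
    for part in parts[:-1]:
        if GLOB_CHARS & set(part):
            break
        kept.append(part)
    # An empty result means the first component already has a wildcard.
    return "/".join(kept) or "/"


start_dir = literal_prefix("/something/a-b-[cdef]-*/part-*")
```

Here `start_dir` is `/something`, so only that subtree would need to be walked and filtered against the full pattern.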