Closed alaniwi closed 6 years ago
Good catch! I agree. I will add this support in the next release. Thanks!
Thanks. I think this should apply equally to any other filters specified with the --exclude-file
option: they should only apply to the relative part of the path, not the leading directory.
Just noting here that a partial workaround is available at present:
--ignore-dir '^.*/files.*$'
but this will mean that all hidden files get processed, so it is not ideal.
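To illustrate what the workaround pattern does and does not match, here is a small sketch. The pattern is the one quoted above; the example paths and the `is_ignored` helper are hypothetical, and applying the pattern with `re.match` to a full path is an assumption about how the filter is used, not something confirmed in this thread.

```python
import re

# Pattern from the workaround above. Applying it with re.match to a
# full path is an assumption made for this illustration.
WORKAROUND = re.compile(r'^.*/files.*$')


def is_ignored(path):
    """Return True if the workaround pattern matches the path."""
    return WORKAROUND.match(path) is not None


# Hypothetical example paths.
paths = [
    "/data/.cache/CMIP6/tas.nc",    # hidden component: not matched
    "/data/CMIP6/files/d20190101",  # 'files' component: matched
    "/data/CMIP6/v20190101",        # ordinary path: not matched
]
for p in paths:
    print(p, is_ignored(p))
```

Because the pattern no longer mentions a leading dot, paths with hidden components are not filtered at all, which is the drawback noted above.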
I am already applying the following fix in the upcoming release: in the Collector class, the PathFilter will only be applied to the "recursive" part of the root variable in the os.walk loop: https://github.com/ESGF/esgf-prepare/blob/2ab2aef4dbd0b6b8f8479b0f6462f9263db7fb55/esgprep/utils/collectors.py#L63
I will split root with root.split(source) and only do the if test with PathFilter on root.split(source)[1] instead of on the whole root variable. Thus, if a "hidden" folder is part of the submitted source/input, the PathFilter (driven by the --ignore-dir flag) does not skip it. The PathFilter will only affect the part of the path downstream of the source level.
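The idea can be sketched roughly as follows. This is not the actual Collector code: `collect`, `demo`, and the `HIDDEN` pattern are hypothetical stand-ins for the real class and the --ignore-dir filter. The sketch also strips the source prefix by slicing rather than with root.split(source)[1], because the split would give a different result if the source string happened to recur further down the path.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the --ignore-dir filter: matches any path
# that contains a component starting with a dot.
HIDDEN = re.compile(r'^.*/\..*$')


def collect(source, pattern=HIDDEN):
    """Yield files under `source`, filtering only on the path below it."""
    source = os.path.normpath(source)
    for root, _, filenames in os.walk(source):
        # Keep only the part of `root` below `source` ('' at the top level).
        below = root[len(source):].replace(os.sep, "/")
        if pattern.match(below):
            continue  # hidden element found during recursion: skip it
        for name in sorted(filenames):
            yield os.path.join(root, name)


def demo():
    """Scan a hidden source directory that itself contains a hidden subdirectory."""
    with tempfile.TemporaryDirectory() as base:
        source = os.path.join(base, ".hidden_root")
        for rel in ("a.nc", "sub/b.nc", ".skip/c.nc"):
            path = os.path.join(source, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        found = [os.path.relpath(p, source).replace(os.sep, "/")
                 for p in collect(source)]
        return sorted(found)


print(demo())
```

In this sketch the files directly under the hidden source directory are kept, while anything below a hidden directory met during recursion is still skipped.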
When esgmapfile scans a directory tree, if any element of the path to a data file begins with a ., then that file is skipped. It is reasonable that if a "hidden" file or directory (starting with .) is encountered during recursion below the starting directory, then it should be skipped. However, if a "hidden" directory is part of the leading part of the path that is specified explicitly on the command line, then it can be assumed that the user intends to scan the directory, so files should not be skipped for this reason alone (although they could still be skipped for some other reason, including where another hidden element is found during recursion).