ESGF / esgf-prepare

Toolbox preparing your data for ESGF publication
http://esgf.github.io/esgf-prepare/

in esgmapfile, do not skip "hidden" directories which are part of the specified path #37

Closed alaniwi closed 6 years ago

alaniwi commented 6 years ago

In scanning inside esgmapfile, if any element of the path to a data file begins with a ., then that file is skipped.

It is reasonable that a "hidden" file or directory (starting with .) encountered during recursion below the starting directory should be skipped. However, if a "hidden" directory is part of the leading path that is specified explicitly on the command line, then it can be assumed that the user intends to scan that directory, so files should not be skipped for this reason alone (although they could still be skipped for some other reason, including where another hidden element is found during recursion).

glevava commented 6 years ago

Good catch! I agree. I will add this support in the next release. Thanks!

alaniwi commented 6 years ago

Thanks. I think that this should apply equally to any other filters specified with the --exclude-file option: they should only apply to the relative part of the path, not the leading directory.

alaniwi commented 6 years ago

Just noting here that a partial workaround is available at present:

--ignore-dir '^.*/files.*$'

but this will mean that all hidden files get processed, so it is not ideal.
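
As a quick check of what the workaround pattern does and does not match, here is a minimal sketch using Python's re module. It assumes the pattern is tested against each directory path with re.match semantics, which this thread does not confirm; the helper is_ignored and the example paths are hypothetical.

```python
import re

# Pattern taken from the workaround above.
WORKAROUND = re.compile(r'^.*/files.*$')


def is_ignored(path, pattern=WORKAROUND):
    """Return True if the path matches the ignore pattern.

    Assumption: the tool applies the pattern to the directory path with
    re.match semantics. This helper is illustrative only.
    """
    return pattern.match(path) is not None


# Hypothetical paths, for illustration only.
examples = [
    "/data/.snapshot/CMIP6/tas",    # hidden leading directory
    "/data/CMIP6/files/d20190101",  # contains a "files" directory
    "/data/CMIP6/.tmp",             # hidden directory below the source
]

for path in examples:
    print(path, is_ignored(path))
```

Under that assumption, only the path containing /files is ignored. Hidden directories are no longer filtered at all, whether they sit in the leading path or below it, which is why the workaround is only partial.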

glevava commented 6 years ago

I have already applied the following fix in the upcoming release: in the Collector class, the PathFilter will only be applied to the "recursive" part of the root variable in the os.walk loop: https://github.com/ESGF/esgf-prepare/blob/2ab2aef4dbd0b6b8f8479b0f6462f9263db7fb55/esgprep/utils/collectors.py#L63 I will split the root with root.split(source) and apply the PathFilter test only to root.split(source)[1] instead of the whole root variable. Thus, if a "hidden" folder is part of the submitted source/input, it is not skipped by the PathFilter (driven by the --ignore-dir flag). The PathFilter will only affect the part of the path downstream of the source level.
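
The idea described above can be sketched as follows. This is a minimal standalone illustration, not the esgprep implementation: the names collect and is_hidden are hypothetical, a simple leading-dot test stands in for the PathFilter, and os.path.relpath is used in place of root.split(source)[1] to obtain the part of the path below the source.

```python
import os


def is_hidden(name):
    """Hypothetical stand-in for the PathFilter: flag names starting with '.'."""
    return name.startswith(".")


def collect(source):
    """Yield file paths under source, filtering only the path below source.

    Hidden components in the source path itself are never tested, so an
    explicitly requested hidden directory is still scanned.
    """
    source = os.path.normpath(source)
    for root, dirs, files in os.walk(source):
        # Part of the current directory path relative to the source.
        relative = os.path.relpath(root, source)
        parts = [] if relative == os.curdir else relative.split(os.sep)
        if any(is_hidden(part) for part in parts):
            dirs[:] = []  # prune: do not descend below a hidden directory
            continue
        for name in sorted(files):
            if not is_hidden(name):
                yield os.path.join(root, name)
```

For example, calling collect on a hypothetical source such as /data/.snapshot/CMIP6 would scan that tree, while a directory such as /data/.snapshot/CMIP6/.tmp found during recursion would still be skipped.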