allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

don't use ** with Path.glob #160

Closed: seanmacavaney closed this issue 2 years ago

seanmacavaney commented 2 years ago

Describe the bug

As reported by @searchivarius, ** in a pattern passed to Path.glob does not descend into symlinked directories. Wherever ** is used with Path.glob, glob.glob(..., recursive=True) should be used instead, because it does follow symlinks. For instance:

Tree:
  a/
    c/
      d
  b -> a

Expected: **/d -> [a/c/d, b/c/d]

Actual:
glob.glob('**/d', recursive=True) -> ['a/c/d', 'b/c/d'] # pass (recursive=True enables **)
list(Path('.').glob('**/d')) -> [PosixPath('a/c/d')] # fail (** enabled by default)
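The reproduction above can be sketched as a self-contained Python script that builds the tree in a temporary directory and runs both calls side by side. The helper names build_tree and compare are illustrative only (not part of ir_datasets), and the script assumes a platform where os.symlink works without extra privileges, such as Linux or macOS.

```python
import glob
import os
import tempfile
from pathlib import Path


def build_tree(root: Path) -> None:
    """Create the tree from the report: a/c/d plus a symlink b -> a."""
    (root / "a" / "c").mkdir(parents=True)
    (root / "a" / "c" / "d").touch()
    # Relative target, so the link resolves to root/a
    os.symlink("a", root / "b", target_is_directory=True)


def compare(root: Path) -> tuple[list[str], list[str]]:
    """Return sorted '**/d' matches (relative to root) from glob.glob and Path.glob."""
    # Escape the root so any glob metacharacters in the temp path are literal
    pattern = os.path.join(glob.escape(str(root)), "**", "d")
    via_glob = sorted(
        Path(os.path.relpath(p, root)).as_posix()
        for p in glob.glob(pattern, recursive=True)
    )
    via_pathlib = sorted(p.relative_to(root).as_posix() for p in root.glob("**/d"))
    return via_glob, via_pathlib


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_tree(root)
        via_glob, via_pathlib = compare(root)
        print("glob.glob:", via_glob)
        print("Path.glob:", via_pathlib)
```

Only glob.glob should report b/c/d. Python 3.13 added a recurse_symlinks keyword to Path.glob, but it defaults to False, so the default Path.glob result stays the same.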

Affected dataset(s)

Expected behavior

Symlinks should be allowed (and encouraged!)
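One way to get this behavior while still working with Path objects is a small wrapper that delegates to glob.glob(..., recursive=True), as the report suggests. This is a sketch only: glob_follow_symlinks is a hypothetical name, not an existing ir_datasets function, and it assumes patterns use / as the separator.

```python
import glob
import os
from pathlib import Path
from typing import List, Union


def glob_follow_symlinks(base: Union[str, Path], pattern: str) -> List[Path]:
    """Recursive glob under `base` that also descends into symlinked directories.

    Hypothetical replacement for Path(base).glob(pattern) when `pattern`
    contains **; glob.glob(recursive=True) follows directory symlinks.
    """
    # Escape the base so characters like '[' in a directory name stay literal
    full_pattern = os.path.join(glob.escape(str(base)), pattern)
    return sorted(Path(p) for p in glob.glob(full_pattern, recursive=True))


if __name__ == "__main__":
    # With the tree from the report in the current directory, this should
    # list both a/c/d and b/c/d
    for path in glob_follow_symlinks(".", "**/d"):
        print(path)
```

Two caveats when swapping it in: glob.glob does not match names starting with "." through wildcards unless include_hidden=True is passed (Python 3.11+), whereas Path.glob does match them; and on Python 3.13+, Path(base).glob(pattern, recurse_symlinks=True) is a built-in alternative.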

Additional context

Interestingly, neither the PEPs nor the documentation sheds any light on the reasoning behind this discrepancy.