LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python
216 stars 85 forks source link

missing path to file #121

Open fkloft opened 5 years ago

fkloft commented 5 years ago

In this line: https://github.com/LibraryOfCongress/bagit-python/blob/c39b650a80837a1d32314599143d0b5b04159248/bagit.py#L1315

I didn't test this, but shouldn't the argument to os.path.filename be prefixed with bag_dir?

kba commented 5 years ago

This works because the assumption is that os.getcwd() is the bagdir. When creating/saving bags, directory is actively os.chdired.

For the bagit-profiles-validator code, for the same/similar purpose, there's this code, which checks paths relative to bag_dir (https://github.com/bagit-profiles/bagit-profiles-validator/blob/master/bagit_profile.py#L287-L304):

# Return true if any of the pattern fnmatches a file path
def fnmatch_any(f, pats):
    for pat in pats:
        if fnmatch(f, pat):
            return True
    return False

# Find tag files
def find_tag_files(bag_dir):
    for root, _, basenames in walk(bag_dir):
        reldir = relpath(root, bag_dir)
        for basename in basenames:
            if fnmatch(reldir, 'data*') or (reldir == '.' and fnmatch_any(basename,
                ['manifest-*.txt', 'bag-info.txt', 'tagmanifest-*.txt', 'bagit.txt', 'fetch.txt'])):
                continue
            fpath = join(root, basename)
            if isfile(fpath):
                yield fpath
acdha commented 5 years ago

Note also that I want to change this as part of the upcoming work @edsu and I are planning so the code can be used with non-POSIX filesystems where the notion of current directory doesn't exist.

fkloft commented 5 years ago

Thanks for the response. I'll have to check my code, because I use bagit as an imported module, usually with cwd != bagdir, so that may lead to unexpected results. IMHO, this behavior should be changed.