LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

Bagit fails with 0kb file #139

Open friolator opened 5 years ago

friolator commented 5 years ago

We're creating a bag from a folder full of media, some of which was shot on a Sony camera. The camera writes a file called SONYCARD.IND into a folder inside the media folder. This file is empty and is reported as 0 KB in size.

bagit.py fails with the following error:

2019-10-02 14:11:54,336 - ERROR - An error occurred creating a bag in /mnt/SAN_LTO_6/500_A
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "./bagit.py", line 1398, in generate_manifest_lines
    with open(filename, "rb") as f:
OSError: [Errno 5] Input/output error: 'data/01 MASTERS/101/20140921_Day 2/SONY/Foxtrot/Timelapse/private/SONY/SONYCARD.IND'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./bagit.py", line 242, in make_bag
    "data", processes, algorithms=checksums, encoding=encoding
  File "./bagit.py", line 1250, in make_manifests
    checksums = pool.map(manifest_line_generator, _walk(data_dir))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
OSError: [Errno 5] Input/output error: 'data/01 MASTERS/101/20140921_Day 2/SONY/Foxtrot/Timelapse/private/SONY/SONYCARD.IND'
2019-10-02 14:11:54,345 - ERROR - Failed to create bag in /mnt/SAN_LTO_6/500_A/: [Errno 5] Input/output error: 'data/01 MASTERS/101/20140921_Day 2/SONY/Foxtrot/Timelapse/private/SONY/SONYCARD.IND'
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "./bagit.py", line 1398, in generate_manifest_lines
    with open(filename, "rb") as f:
OSError: [Errno 5] Input/output error: 'data/01 MASTERS/101/20140921_Day 2/SONY/Foxtrot/Timelapse/private/SONY/SONYCARD.IND'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./bagit.py", line 1603, in main
    checksums=args.checksums,
  File "./bagit.py", line 242, in make_bag
    "data", processes, algorithms=checksums, encoding=encoding
  File "./bagit.py", line 1250, in make_manifests
    checksums = pool.map(manifest_line_generator, _walk(data_dir))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
OSError: [Errno 5] Input/output error: 'data/01 MASTERS/101/20140921_Day 2/SONY/Foxtrot/Timelapse/private/SONY/SONYCARD.IND'

acdha commented 5 years ago

Do other tools work with those files? That looks like an issue with the flash card interface.

friolator commented 5 years ago

Honestly, I'm not sure. This is the first time we've encountered these files, but there were several of them on the client's media drives, which we're backing up and writing to LTO. We aren't working with the files; we're just making backups.

That said, basic Linux tools (ls, cp, less) can all open the file. The only thing I can see that might be problematic is that the file is completely empty. Permissions are set to rwx across the board. These files are not coming off a flash card: we're working from a hard drive provided by the client that contains the raw flash card files from the camera.

acdha commented 5 years ago

Can you try copying them to a local filesystem and seeing whether they still get errors? bagit.py normally handles zero-byte files without an issue, so I was wondering whether this is a problem with whatever filesystem is providing /mnt/SAN_LTO_6 in the traceback above.
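
One way to separate "the file is empty" from "the file cannot be read" is to hash it outside of bagit. The sketch below is not bagit.py's code; `sha256_of_file` and `try_hash` are hypothetical helper names. It reads a file in chunks, as a manifest generator typically would, and reports the `OSError` instead of raising it. On a healthy filesystem an empty file should simply produce the SHA-256 digest of zero bytes.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks; an empty file yields the digest of b""."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops as soon as read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def try_hash(path):
    """Return (digest, None) on success, or (None, error) if the file cannot be read."""
    try:
        return sha256_of_file(path), None
    except OSError as exc:
        # An I/O error lands here (for example errno 5) rather than aborting the run.
        return None, exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the camera's empty index file; the name is only illustrative.
        empty = os.path.join(tmp, "SONYCARD.IND")
        open(empty, "wb").close()
        print(try_hash(empty))
```

If `try_hash` returns an error for the file on the SAN but a digest for a local copy, that points at the storage layer rather than at the file being empty.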

friolator commented 5 years ago

I just copied them to the desktop on my Mac:

$ ./bagit.py ~/Desktop/Bag-test/
2019-10-02 16:29:10,082 - INFO - Creating bag for directory /Users/p/Desktop/Bag-test
2019-10-02 16:29:10,083 - INFO - Creating data directory
2019-10-02 16:29:10,083 - INFO - Moving SONYCARD.IND to /Users/p/Desktop/Bag-test/tmp_xe_Qb/SONYCARD.IND
2019-10-02 16:29:10,083 - ERROR - An error occurred creating a bag in /Users/p/Desktop/Bag-test
Traceback (most recent call last):
  File "./bagit.py", line 229, in make_bag
    os.rename(f, new_f)
OSError: [Errno 1] Operation not permitted
2019-10-02 16:29:10,084 - ERROR - Failed to create bag in /Users/p/Desktop/Bag-test/: [Errno 1] Operation not permitted
Traceback (most recent call last):
  File "./bagit.py", line 1603, in main
    checksums=args.checksums,
  File "./bagit.py", line 229, in make_bag
    os.rename(f, new_f)
OSError: [Errno 1] Operation not permitted

acdha commented 5 years ago

Is that crossing a filesystem boundary (network home directories, or mounts under that directory), and/or do you have something like anti-virus or HSM software installed that is breaking POSIX semantics? On a standard Mac, it looks like this:

~ $ cd ~/Desktop/
~/Desktop $ mkdir test-bag
~/Desktop $ touch test-bag/foo
~/Desktop $ touch test-bag/bar
~/Desktop $ date > test-bag/baaz
~/Desktop $ bagit.py test-bag/
2019-10-02 16:58:46,496 - INFO - Creating bag for directory /Users/cadams/Desktop/test-bag
2019-10-02 16:58:46,496 - INFO - Creating data directory
2019-10-02 16:58:46,497 - INFO - Moving baaz to /Users/cadams/Desktop/test-bag/tmpqt82zamf/baaz
2019-10-02 16:58:46,497 - INFO - Moving foo to /Users/cadams/Desktop/test-bag/tmpqt82zamf/foo
2019-10-02 16:58:46,497 - INFO - Moving bar to /Users/cadams/Desktop/test-bag/tmpqt82zamf/bar
2019-10-02 16:58:46,497 - INFO - Moving /Users/cadams/Desktop/test-bag/tmpqt82zamf to data
2019-10-02 16:58:46,497 - INFO - Using 1 processes to generate manifests: sha256, sha512
2019-10-02 16:58:46,497 - INFO - Generating manifest lines for file data/baaz
2019-10-02 16:58:46,498 - INFO - Generating manifest lines for file data/bar
2019-10-02 16:58:46,498 - INFO - Generating manifest lines for file data/foo
2019-10-02 16:58:46,498 - INFO - Creating bagit.txt
2019-10-02 16:58:46,498 - INFO - Creating bag-info.txt
2019-10-02 16:58:46,498 - INFO - Creating /Users/cadams/Desktop/test-bag/tagmanifest-sha256.txt
2019-10-02 16:58:46,499 - INFO - Creating /Users/cadams/Desktop/test-bag/tagmanifest-sha512.txt
~/Desktop $ ls -lR test-bag/
total 48
-rw-r--r--  1 cadams  staff  132 Oct  2 16:58 bag-info.txt
-rw-r--r--  1 cadams  staff   55 Oct  2 16:58 bagit.txt
drwxr-xr-x  5 cadams  staff  160 Oct  2 16:58 data
-rw-r--r--  1 cadams  staff  226 Oct  2 16:58 manifest-sha256.txt
-rw-r--r--  1 cadams  staff  418 Oct  2 16:58 manifest-sha512.txt
-rw-r--r--  1 cadams  staff  323 Oct  2 16:58 tagmanifest-sha256.txt
-rw-r--r--  1 cadams  staff  579 Oct  2 16:58 tagmanifest-sha512.txt

test-bag//data:
total 8
-rw-r--r--  1 cadams  staff  29 Oct  2 16:58 baaz
-rw-r--r--  1 cadams  staff   0 Oct  2 16:58 bar
-rw-r--r--  1 cadams  staff   0 Oct  2 16:58 foo
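
The filesystem-boundary question can also be checked directly by comparing device IDs from `stat()`. This is only a rough sketch with hypothetical helper names (`same_filesystem`, `entries_on_other_devices`), and it assumes a POSIX-style system where `st_dev` is meaningful. Note that a plain cross-device `os.rename` usually fails with `EXDEV` rather than `Operation not permitted`, so a clean result here only rules out one possible cause.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths report the same device ID from stat()."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def entries_on_other_devices(bag_dir):
    """List entries directly under bag_dir whose device differs from bag_dir's own."""
    root_dev = os.stat(bag_dir).st_dev
    return [
        entry.path
        for entry in os.scandir(bag_dir)
        # lstat() inspects the entry itself instead of following symlinks.
        if os.lstat(entry.path).st_dev != root_dev
    ]


if __name__ == "__main__":
    # Check the current directory; substitute the directory being bagged.
    print(entries_on_other_devices("."))
```

An empty list means every top-level entry sits on the same device as the directory, so renaming those entries into a temporary subdirectory should not cross a mount point.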

friolator commented 5 years ago

No anti-virus on any of our machines. The SAN is TigerStore - it's a metadata server that shares out volumes attached to that server. The volumes appear to each workstation as if they were natively formatted disks. That said, we have made dozens of bags of file sets on the SAN with no issues, so I don't think that's the problem. The file on my desktop was copied directly off the client drive (a Thunderbolt RAID) to a test folder, with the same results.

I looked a bit more this morning at the other SONYCARD.IND files and they worked, so I think that particular file must be corrupt. It would be nice if bagit could offer to let you skip a file in cases like this, since it took over 2 hours to reach the bad file.

Thanks
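
Until something like a skip option exists, a pre-flight scan can surface unreadable files before a long bagging run starts. The following is a minimal sketch and not part of bagit.py; `find_unreadable_files` and its `full_read` parameter are hypothetical. By default it only opens each file and reads one byte, which is cheap and would be enough to catch a failure at `open()` like the one in the traceback above. Passing `full_read=True` reads every byte, at the cost of a second full pass over the data.

```python
import os


def find_unreadable_files(root, full_read=False, chunk_size=1024 * 1024):
    """Walk root and return a list of (path, OSError) for files that fail to read."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if full_read:
                        # Read to end of file in chunks to exercise every block.
                        while f.read(chunk_size):
                            pass
                    else:
                        # Quick check: opening and reading one byte catches most failures.
                        f.read(1)
            except OSError as exc:
                problems.append((path, exc))
    return problems


if __name__ == "__main__":
    # Scan the current directory; substitute the folder that is about to be bagged.
    for path, exc in find_unreadable_files("."):
        print(f"unreadable: {path}: {exc}")
```

Any paths it reports can then be investigated, repaired, or moved aside before running bagit.py on the folder.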