Closed veryaustin closed 10 years ago
I apologize for the delay in getting back to you. Let me see if I'm understanding your situation properly. Are you:
Icon files in the bag payload (data directory)
Section 3.4 in the spec states that:
Every payload file MUST be listed in at least one manifest. Payload files MAY be listed in more than one payload manifest.
So validation fails if there is an Icon file present in the bag that isn't listed in a manifest. Does that help? Perhaps you should configure your software not to create Icon files, or to create them at another location. Alternatively you could run a command to delete them before validating. Lastly, I suppose you could move bagging to the stage after you have opened files with ProTools so the Icon files become part of the manifest. But if you subsequently open more files, which modifies or creates additional Icon files, you will still get a validation error.
But all of these options are out of scope for the bagit-python software. Validation is working as intended.
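To make the Section 3.4 rule concrete, here is a minimal sketch of the comparison a validator has to perform. It is not the bagit-python implementation: the function names `compare_payload_to_manifest` and `list_payload` and the file `song.wav` are hypothetical, chosen only for illustration.

```python
import os

def compare_payload_to_manifest(payload_paths, manifest_paths):
    """Return (only_on_disk, only_in_manifest) as sorted lists."""
    payload = set(payload_paths)
    manifest = set(manifest_paths)
    return sorted(payload - manifest), sorted(manifest - payload)

def list_payload(bag_dir):
    """Yield bag-relative paths (with '/' separators) for every file under data/."""
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            yield os.path.relpath(full, bag_dir).replace(os.sep, "/")

# An Icon file created after bagging is on disk but not in the manifest.
only_on_disk, only_in_manifest = compare_payload_to_manifest(
    ["data/1/Icon", "data/1/song.wav"],  # what is on the filesystem
    ["data/1/song.wav"],                 # what the manifest lists
)
print(only_on_disk)      # ['data/1/Icon']
print(only_in_manifest)  # []
```

Anything in the first list corresponds to an "exists on filesystem but is not in manifest" warning, and anything in the second to "exists in manifest but not found on filesystem".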
Thanks for the response. To clarify, our workflow is as follows:
Initially I thought the reason the validation failed was because the copy from the original drive to the new drive may not have been an exact copy. I then ran the validation command on the original drive and it too failed, returning the errors listed in the original post on this thread. As you can see in the validation errors, it says "Icon exists in manifest but not found on filesystem" OR "exists on filesystem but is not in manifestg 1/StemsAndMultitrack/M1 FOR EXPORT/Plug-In Settings/Purple MC77/Icon"
Ok, I think I understand better. Are you able to check out the github project and run the tests to make sure those at least work, as a baseline?
Also, I'm curious what version of bagit-python you are using. I can't find the output of line number 114 in the log output you pasted above.
I'm running these tests on a fresh install of Mac OS 10.8.5 with Python 2.7.6. I was able to check out the most recent code on github, and do a build & install. Additionally, I was able to run test.py and can verify that the tests returned "OK".
If you would like to download and test some sample files that are causing errors, you can download these at https://bmschace.box.com/bagittestfiles.
I ran the bag create and validate command. Below is the output:
test-bench-a:BNA_1017971 administrator$ bagit.py --contact-name 'Test Author' --processes 2 Test\ Files/
2014-03-19 13:36:03,127 - INFO - creating bag for directory /Volumes/BNA_1017971/Test Files
2014-03-19 13:36:03,128 - INFO - creating data dir
2014-03-19 13:36:03,128 - INFO - moving 1 to /Volumes/BNA_1017971/Test Files/tmpYP9s_K/1
2014-03-19 13:36:03,128 - INFO - moving 2 to /Volumes/BNA_1017971/Test Files/tmpYP9s_K/2
2014-03-19 13:36:03,128 - INFO - moving 3 to /Volumes/BNA_1017971/Test Files/tmpYP9s_K/3
2014-03-19 13:36:03,129 - INFO - moving /Volumes/BNA_1017971/Test Files/tmpYP9s_K to data
2014-03-19 13:36:03,129 - INFO - writing manifest-md5.txt
2014-03-19 13:36:03,129 - INFO - writing manifest with 2 processes
2014-03-19 13:36:03,244 - INFO - writing bagit.txt
2014-03-19 13:36:03,245 - INFO - writing bag-info.txt
test-bench-a:BNA_1017971 administrator$ bagit.py --validate Test\ Files/
2014-03-19 13:36:14,015 - WARNING - data/3/Icon exists in manifest but not found on filesystem
2014-03-19 13:36:14,015 - WARNING - data/1/Icon exists in manifest but not found on filesystem
exists on filesystem but is not in manifestcon
exists on filesystem but is not in manifestcon
2014-03-19 13:36:14,016 - INFO - Test Files/ is invalid: invalid bag: data/3/Icon exists in manifest but not found on filesystem ; data/1/Icon exis exists on filesystem but is not in manifest ; data/3/Icon
test-bench-a:BNA_1017971 administrator$
See how the line:
2014-03-19 13:36:03,129 - INFO - moving /Volumes/BNA_1017971/Test Files/tmpYP9s_K to data
I don't know why an equivalent line doesn't show up in your first paste above. Or is it there and I'm just missing it? Thanks for the test files, I'll give them a try!
The first post was run in Feb and looks to be a different version as now I get the "tagmanifest-md5.txt" file which I didn't in the original post. The only thing similar to the line you are referring to in the original post is the following. Like you I noticed it doesn't say "to data" at the end of the line:
2014-02-06 16:12:53,994 - INFO - moving .DS_Store to /Volumes/BNA_1017971/Cleaned Up Masters/tmpYIWE2W/.DS_Store
2014-02-06 16:12:53,994 - INFO - moving Song 1 to /Volumes/BNA_1017971/Cleaned Up Masters/tmpYIWE2W/Song 1
2014-02-06 16:12:53,995 - INFO - moving Song 2 to /Volumes/BNA_1017971/Cleaned Up Masters/tmpYIWE2W/Song 2
2014-02-06 16:12:53,995 - INFO - moving Song 3 to /Volumes/BNA_1017971/Cleaned Up Masters/tmpYIWE2W/Song 3
In the original post, the same is true for the "Test Files"
2014-02-06 16:11:43,008 - INFO - moving .DS_Store to /Volumes/BNA_1017971/Test Files/tmpOG1qwh/.DS_Store
2014-02-06 16:11:43,009 - INFO - moving 1 to /Volumes/BNA_1017971/Test Files/tmpOG1qwh/1
2014-02-06 16:11:43,009 - INFO - moving 2 to /Volumes/BNA_1017971/Test Files/tmpOG1qwh/2
2014-02-06 16:11:43,009 - INFO - moving 3 to /Volumes/BNA_1017971/Test Files/tmpOG1qwh/3
Either way, I'm on a fresh testing machine and have everything on a baseline with what is currently in the github repo and am still having the same issues. Again, let me know how I can help and I appreciate your assistance with this.
Ok, so here's what I see when I validate the bag in your Test Files.zip:
.tfx exists in manifest but not found on filesystem
2014-03-19 15:06:50,324 - WARNING - data/2/Icon.tfx exists on filesystem but is not in manifest
.tfx exists in manifest but not found on filesystem ; data/2/Icon.tfx exists on filesystem but is not in manifest
And looking at the manifest I see the problem! The paths seem to have embedded carriage returns in them: ASCII 0x0d bytes. I only noticed because I opened the manifest up in my text editor:
So, I will add a unit test to make sure these are getting properly encoded. The spec states that they should be URL encoded.
Now I'm confused again. I edited the manifest to remove the 3 embedded carriage returns and now the bag validates. The filenames you packaged up in that zip do not have embedded carriage returns in them. Can you verify that the files you have do have embedded carriage returns in them? Or perhaps you corrupted your manifest somehow?
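One way to check for such filenames without relying on how a terminal or editor renders them is to walk the directory and print the `repr()` of any name containing a carriage return or line feed. This is only a sketch; `find_control_char_names` is a hypothetical helper, and the directory name "Test Files" is taken from the logs above.

```python
import os

def find_control_char_names(top):
    """Yield paths under top whose file or directory name contains CR or LF."""
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            if "\r" in name or "\n" in name:
                yield os.path.join(root, name)

# repr() shows the carriage return as \r instead of letting the
# terminal act on it. Prints nothing if the directory does not exist.
for path in find_control_char_names("Test Files"):
    print(repr(path))
```

If any names are printed with a trailing `\r`, the carriage returns are in the filenames themselves rather than introduced by a corrupted manifest.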
The manifest goes with those files that are in the zip. To make sure nothing was corrupted, I removed all of the bagit generated files, re-ran the creation command on these files, and got the same errors. I opened the newly generated manifest-md5.txt in vim and got the same results as what you posted above. I'm having the exact same results & errors you are having with the same files.
Awesome, so your Icon file names really do have carriage returns in them. Live and learn :smile:
This causes a problem for the manifest since lines in there can be terminated with carriage returns. Interestingly it looks like this is a gap in the BagIt specification.
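A small illustration of why this is a problem, assuming the manifest is split into lines the way Python's `str.splitlines()` does it. This is not necessarily how bagit-python reads the file, and the checksum is a placeholder.

```python
# One manifest entry for a file literally named "Icon\r" (placeholder checksum).
manifest_text = "d41d8cd98f00b204e9800998ecf8427e  data/1/Icon\r\n"

# splitlines() treats "\r\n" as a single line ending, so the CR that
# belongs to the filename is consumed as part of the terminator.
line = manifest_text.splitlines()[0]
checksum, path = line.split(None, 1)

print(repr(path))               # 'data/1/Icon'
print(path == "data/1/Icon\r")  # False
```

The path recovered from the manifest no longer matches the name on disk, which would produce both kinds of warning seen in the logs: one for the manifest entry with no file, one for the file with no manifest entry.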
For now I'll work on a fix to percent encode the carriage returns in the manifest filenames. I'll let you know when there is something for you to try.
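A rough sketch of what percent encoding the filenames could look like. The function names are hypothetical and this is not the code that shipped in bagit-python. Escaping the literal `%` as well is an assumption made here so that decoding is unambiguous.

```python
def encode_filename(name):
    # Escape "%" first so an existing "%0D" in a name is not
    # confused with an encoded carriage return.
    return name.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")

def decode_filename(encoded):
    # Reverse order: restore CR and LF, then the literal percent sign.
    return encoded.replace("%0D", "\r").replace("%0A", "\n").replace("%25", "%")

encoded = encode_filename("data/1/Icon\r")
print(encoded)                                     # data/1/Icon%0D
print(decode_filename(encoded) == "data/1/Icon\r")  # True
```

With the carriage return encoded, each manifest entry stays on a single line, and decoding restores the exact name for comparison with the filesystem.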
A bit of a historical aside: BagIt was largely conceived of as a set of conventions built around what the tool md5deep does. Interestingly md5deep seems to strip the carriage return before putting it into the manifest. This is ok as long as it doesn't result in a collision with another file. For example if you have a directory named data that contains two files:
md5deep -lr data
generates a manifest like this:
401b30e3b8b5d629635a5c613cdb7919 data/foo
acbd18db4cc2f85cedef654fccc4a4d8 data/foo
So the question then would be, which checksum goes with which filename?
I think this is an argument for percent encoding the carriage returns, instead of stripping them. It could be argued that BagIt tools should refuse to bag a directory if the payload has filenames with carriage returns. IMHO this would be somewhat against the spirit of BagIt, which has always been to serve as a low barrier way of packaging up a directory (or folder) that contains files, without having to modify them in any way.
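The collision can be shown in a few lines. This assumes the two files are named `foo` and `foo` followed by a carriage return, which is what the duplicated `data/foo` lines in the md5deep output suggest.

```python
names = ["foo", "foo\r"]

stripped = [n.replace("\r", "") for n in names]    # what stripping would produce
encoded = [n.replace("\r", "%0D") for n in names]  # percent encoding instead

print(len(set(stripped)))  # 1
print(len(set(encoded)))   # 2
```

Stripping leaves one manifest name for two files, while percent encoding keeps them distinct.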
Very interesting insight:) Thanks for looking into this. I look forward to seeing & testing the fix.
Let's leave this open until there's a fix. It should come shortly -- sorry for the delay.
@veryaustin I just uploaded the latest bagit-python (which includes this fix) to PyPI as v1.3.6. I'm sorry this issue took so long to figure out and address. I'm going to be updating the BagIt specification to mention that \r and \n need to be percent encoded in manifest file names. But in the meantime it would be good to try it out here in bagit-python to see if there are any hidden gotchas that the unit tests didn't tease out.
We are using BagIt on drives that contain a variety of file types but mainly contain broadcast wave files and accompanying digital audio workstation files (Pro Tools, Nuendo, Logic, Digital Performer). I have run into an issue where Pro Tools Plugin settings files titled "Icon" are either not written to the manifest and throw a validation error, OR it throws an error indicating a file is in the manifest but not on the drive when running the bagit validate command. All files show up in terminal via the ls -la command and I have verified that all permissions are correct.
Below is the output from the bag creation and validation on a set of audio files and digital audio workstation files.
I copied the "Icon" files out of each of the three songs and put them into "Test Files" directory and ran the bagit create and validate commands. Below is the output:
Unfortunately, I cannot include any of the specific sessions and audio files listed in the first example, but I can provide example "Icon" files for testing. They can be downloaded at the following link:
https://bmschace.box.com/bagittestfiles
Any help with this issue would be greatly appreciated.
Thanks! Austin Lauritsen Director of IT BMS/Chace