Closed nkrabben closed 6 years ago
Hi @nkrabben I am curious, what kind of files are you trying to bag that contain \r?
Could you also provide a sample file or a set of directions to reproduce the issue?
The files are Mac Icon files from a born-digital collection that we're processing.
Apparently, it's fairly common for their file names to end with \r although from the sample we're looking at, it's not consistent. Since it's a problem with the file name and not the content, I've been creating test files through bash.
touch "Icon^M"
Looks like this is due to the underlying bagit-java library which should be percent encoding the filename and decoding it when verifying. See https://github.com/LibraryOfCongress/bagit-java/issues/51
@nkrabben take a look at https://github.com/LibraryOfCongress/bagger/releases/tag/v2.7.1 it should fix your problem.
Bagger 2.7.1 is crashing before it finishes loading. I get a few different errors. Loading from the command line I get Error: Could not find or load main class 2. Loading from Finder I get An illegal state occured.
I can still load 2.7.0
Sorry, I was impatient and built it without all the dependencies. I deleted the old version on the releases page and uploaded the new file. I also included the MD5 and SHA1 checksum so you can test and make sure you have the right version.
New error Could not find or load main class 3
Really? I just tried running it and it was fine. Did you delete the old one? Could you add the log files so I can see exactly what happened?
Looks like it was my error. Deleted everything and Bagger loaded fine.
While testing the filename handling, I noticed a new edge case. Bagging a folder that contains "Icon\r" works as expected, but adding "Icon\r" directly to the payload gives the following error: Error adding bag file: ~/Desktop/Icon due to: Can't read ~/Desktop/Icon
It looks like it is a bug in the JDK rt.jar supplied code. Specifically BasicFileChooserUI's inner class ApproveSelectionAction. The code is repeated here for easy viewing:
if (filename != null) {
// Remove whitespaces from end of filename
int i = filename.length() - 1;
while (i >=0 && filename.charAt(i) <= ' ') {
i--;
}
filename = filename.substring(0, i + 1);
}
This code as the comment suggests removes any whitespace at the end of a file name. In this case that is not the correct behavior. I will try and file a bug with the Java developers and post a link here.
Oracle has created a bug based on my report http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8160540
If you create a bag with a file that has a newline character in its filename, the manifest will not represent that characters. For example, "data/Icon\r" is represented as "data/Icon" in the manifest. The bag cannot be validated because the file system does not contain a file at "data/Icon".
A further complication is that this bug can cause filename collisions in the manifest. "data/Icon", "data/Icon\n", and "data/Icon\r" are all represented in the manifest as "data/Icon"
Bagit-python had the same issue. It was fixed by using percent encoding for \r and \n. https://github.com/LibraryOfCongress/bagit-python/issues/12