LibraryOfCongress / bagit-java

Java library to support the BagIt specification.
Other
70 stars 50 forks source link

BagLinter always adds 'different_case' warning #119

Open rvanheest opened 6 years ago

rvanheest commented 6 years ago

I just upgraded to v1.2.0 of this library. With the upgraded stuff on BagIt v1.0 in #118, a new bug has entered the library:

When using the BagLinter, the warning different_case will always be shown. Given it's description:

The bag contains two files that differ only in case. This can cause problems on a filesystem like the one used by apple (HFS).

I would not expect this to happen on a bag with only very distinct files.

I think the bug is as follows: in the code below a manifest file is read and for each line the path is added to the Set paths (after being converted to lowercase).

https://github.com/LibraryOfCongress/bagit-java/blob/97a677004200cf120df4524695bf0ed139a081c0/src/main/java/gov/loc/repository/bagit/conformance/ManifestChecker.java#L119-L139

Immediately after that checkForDifferentCase is called, which checks whether paths.contains(path.toLowerCase()). Of course, this is always true, since path was just added to paths before this check.

https://github.com/LibraryOfCongress/bagit-java/blob/97a677004200cf120df4524695bf0ed139a081c0/src/main/java/gov/loc/repository/bagit/conformance/ManifestChecker.java#L174-L183

acdha commented 6 years ago

I created #120 simply to do the check first. Today has been rather busy so a second set of eyes on the code would be appreciated.

rvanheest commented 6 years ago

@acdha From reasoning through the code this looks fine. But I wasn't able to test it with the code change in my own project, since I can't find a way to build this project to my local Maven repository directory. I keep hitting errors with the current gradle configuration.

What surprises me a bit is that no tests we added or changed. I was looking and running ManifestCheckerTest and saw that (even after the change was applied), DIFFERENT_CASE still showed up in the test results. I'm not sure if this is correct or not (you guys know much more about the whole context than me, of course), but it made me wonder whether this change is all there is to it.

jscancella commented 6 years ago

@acdha I originally created this to catch potential errors like HASH /data/someFile and HASH /data/SomeFile which on mac is the same file, but on other systems should be two different files. With that in mind, it should be giving a warning that they differ only by case.

After looking at this more, it looks like you are correct and this is indeed from adding it to the paths too early.