OCFL / ocfl-java

A Java OCFL implementation
MIT License

Filename length limitation #4

Closed awoods closed 4 years ago

awoods commented 4 years ago

This limitation surfaced when running the script below against a Fedora 6 repository. The issue is that files with 323-character names fail when persisting to OCFL. I have not narrowed down the exact length limit. This may require no action beyond documenting in the README what the file name length limit is on Ubuntu.
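One way to narrow down the exact limit is to probe the filesystem directly. The following is a minimal sketch and not part of ocfl-java: the class and method names are hypothetical, it needs Java 11+, and it assumes that if a name of a given length fails, every longer name fails too. It binary-searches for the longest ASCII file name that can be created in a directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ocfl-java.
class FilenameLimitProbe {

    /** Returns true if a file whose name is `length` ASCII characters long can be created in dir. */
    public static boolean canCreate(Path dir, int length) {
        try {
            Path candidate = dir.resolve("a".repeat(length));
            Files.createFile(candidate);
            Files.delete(candidate);
            return true;
        } catch (IOException | RuntimeException e) {
            // Usually a FileSystemException ("File name too long") once the limit is exceeded
            return false;
        }
    }

    /** Binary-searches for the longest creatable file name length in [1, upperBound]; 0 if none works. */
    public static int maxNameLength(Path dir, int upperBound) {
        int lo = 0;               // longest length known to work (0 = nothing confirmed yet)
        int hi = upperBound + 1;  // shortest length treated as failing
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (canCreate(dir, mid)) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name-limit");
        try {
            System.out.println("Longest file name accepted: " + maxNameLength(dir, 4096));
        } finally {
            Files.delete(dir);
        }
    }
}
```

The result depends on the filesystem backing the directory, so run it against the same volume that holds the OCFL storage root. It only measures a single name component, not the total path length.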

Testing script

clear
x=0
r="http://localhost:8080/rest/"
while true; do
    x=$((x + 1))
    echo "$x"
    r=$(curl -XPOST "$r")
    echo "r = $r"
done

Script output

pwinckles commented 4 years ago

Cases like this are handled by configuring the repository to enforce path constraints:

https://github.com/UW-Madison-Library/ocfl-java/blob/master/ocfl-java-core/src/main/java/edu/wisc/library/ocfl/core/path/constraint/DefaultContentPathConstraints.java#L72

For example:

var repo = new OcflRepositoryBuilder()
        .layoutConfig(DefaultLayoutConfig.nTupleHashConfig())
        .contentPathConstraintProcessor(DefaultContentPathConstraints.unix())
        .storage(FileSystemOcflStorage.builder().repositoryRoot(repoDir).build())
        .workDir(workDir)
        .build();

pwinckles commented 4 years ago

I should probably rename that property to simply be contentPathConstraints.

awoods commented 4 years ago

What is the result of configuring the repository with DefaultContentPathConstraints.unix()?

pwinckles commented 4 years ago

If you want to support long logical paths you need to define a custom PathSanitizer:

https://github.com/UW-Madison-Library/ocfl-java/blob/master/ocfl-java-core/src/main/java/edu/wisc/library/ocfl/core/path/sanitize/PathSanitizer.java

pwinckles commented 4 years ago

In this case, it rejects up front any logical paths that cannot be safely mapped directly to content paths. PathSanitizers provide a way to transform logical paths into safe content paths, but for now you have to bring your own.
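To illustrate the kind of transformation a sanitizer could perform, here is a minimal standalone sketch. It deliberately does not implement the ocfl-java PathSanitizer interface, since its signature is not shown in this thread; the class name, the 255-byte threshold, and the choice of SHA-256 are all assumptions made for illustration. Any path segment longer than 255 UTF-8 bytes is replaced with the hex SHA-256 digest of that segment, and shorter segments pass through unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example; NOT the ocfl-java PathSanitizer interface.
class LongSegmentSanitizer {

    // Assumed per-segment limit, in bytes, typical of common Linux filesystems
    static final int MAX_SEGMENT_BYTES = 255;

    /** Maps a '/'-separated logical path to a path whose segments all fit the limit. */
    public static String sanitize(String logicalPath) {
        String[] segments = logicalPath.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(shorten(segments[i]));
        }
        return out.toString();
    }

    /** Leaves short segments alone; replaces over-long ones with a 64-character digest. */
    static String shorten(String segment) {
        if (segment.getBytes(StandardCharsets.UTF_8).length <= MAX_SEGMENT_BYTES) {
            return segment;
        }
        return sha256Hex(segment);
    }

    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sanitize("dir/file.txt"));
        System.out.println(sanitize("dir/" + "a".repeat(323)));
    }
}
```

The mapping is deterministic, so the same logical path always yields the same content path. It does not address the total path length, and a real sanitizer would also need to decide how to treat a short segment that happens to look like a digest.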

pwinckles commented 4 years ago

I think this is resolved by #5. Currently, the only limitation that ocfl-java has on the length of OCFL object IDs comes from the method used to map IDs to object root directory paths. If the object ID is used as the encapsulation directory, then object IDs cannot be longer than 255 characters. Otherwise, the object ID can be of any length.
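As a rough illustration of the 255 limit, the sketch below checks whether an ID would fit as a single directory name. It is a hypothetical helper, not ocfl-java code, and it assumes the limit is enforced by the filesystem in bytes of the UTF-8 encoded name (as on common Linux filesystems), which means non-ASCII IDs can hit the limit with fewer than 255 characters. It also ignores any encoding applied to the ID before it is used as a directory name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ocfl-java.
class ObjectIdLengthCheck {

    // Assumed filesystem limit for one directory name, in bytes
    static final int MAX_NAME_BYTES = 255;

    /** Returns true if the ID, encoded as UTF-8, fits in a single directory name. */
    public static boolean fitsAsDirectoryName(String objectId) {
        return !objectId.isEmpty()
                && objectId.getBytes(StandardCharsets.UTF_8).length <= MAX_NAME_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsAsDirectoryName("o".repeat(255)));       // true: 255 bytes
        System.out.println(fitsAsDirectoryName("o".repeat(256)));       // false: 256 bytes
        System.out.println(fitsAsDirectoryName("\u00e9".repeat(200)));  // false: 200 chars, 400 bytes
    }
}
```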