OCFL / spec

The Oxford Common File Layout (OCFL) specifications
https://ocfl.io

What if my content includes files called `inventory.json` or `inventory.json.sha512`? #230

Closed zimeon closed 5 years ago

zimeon commented 5 years ago

I see two options:

1) With the spec as currently written, there would need to be a note explaining that these are special cases where the existing file path cannot be the original filepath; the file must be renamed inside the OCFL object, e.g.:

[object root]
|- v1
|    |- inventory.json   <-- the actual inventory, not the file so named in v1 content
|    |- inventory.json.sha512   <-- the actual digest sidecar
|    |- inventory.json_moved   <-- the content called inventory.json in v1 state
|    \- inventory.json.sha512_moved  <-- the content called inventory.json.sha512 in v1 state
\...

2) Use another directory as is done in BagIt (where it is called data) so that there is clean separation:

[object root]
|- v1
|    |- inventory.json   <-- the actual inventory, not the file so named in v1 content
|    |- inventory.json.sha512   <-- the actual digest sidecar
|    \- data
|          |- inventory.json  <-- the content called inventory.json in v1 state
|          \- inventory.json.sha512  <-- the content called inventory.json.sha512 in v1 state
\...

ahankinson commented 5 years ago

I think this is a problem, but I'm not sure how big a problem it is.

1) The inventory files are only valid in the root of the version directory. Any other inventory files are treated as content.
2) Inventory files in the version directories are optional (but recommended).

If we have to address it, though, I would go with option 2.
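The rule in point 1 can be sketched as a simple path check. This is only an illustration, not spec text: the function name `is_version_inventory` and the `v<digits>` pattern for version directories are assumptions made for the example.

```python
import re
from pathlib import PurePosixPath

# File names reserved for administrative use, as named in this issue.
ADMIN_NAMES = {"inventory.json", "inventory.json.sha512"}

# Assumed pattern for version directory names (v1, v2, ...).
VERSION_DIR = re.compile(r"^v\d+$")


def is_version_inventory(path: str) -> bool:
    """Return True only for an inventory file sitting directly in a version directory."""
    parts = PurePosixPath(path).parts
    return (
        len(parts) == 2
        and VERSION_DIR.match(parts[0]) is not None
        and parts[1] in ADMIN_NAMES
    )
```

Under this check, `v1/inventory.json` is administrative while `v1/docs/inventory.json` is content. The collision in this issue arises only when content wants the first of those two paths.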

ahankinson commented 5 years ago

Maybe call it content, instead of data, since that's what we seem to be calling it anyway?

zimeon commented 5 years ago

I agree that if we were to adopt option 2 then content would be a better name.

neilsjefferies commented 5 years ago

...or name the files ocfl_inventory.json etc. so there is minimal chance of collision, and just say those names are reserved and you can't have them.

ahankinson commented 5 years ago

@neilsjefferies that might get a bit messy if we're wanting to handle arbitrary content from someone's HD. Especially if they've adopted OCFL as a way of organizing their content. :)

neilsjefferies commented 5 years ago

That would be archivists just being silly. If that is what you are doing then packaging is really the way to go.

awoods commented 5 years ago

It is probably important that we support the case of someone's content having the same name as OCFL administrative metadata files. That being the case, I would also prefer option 2, since it avoids renaming files.

zimeon commented 5 years ago

I lean toward having an extra content directory to make it clean/clear.

neilsjefferies commented 5 years ago

Since the inventory already has a mechanism for separating the logical path in the object from the actual path on disk, isn't this really just an Implementation Note? Since you can't have those names on disk, rename them; they can still have the original name in the inventory. The system we have can handle it already, so why introduce a change for a small corner case?
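A minimal sketch of the rename approach, assuming the inventory can be modelled as a plain mapping from on-disk path to logical path (a simplification, not the actual inventory format). The `_moved` suffix comes from the option 1 example at the top of this issue; the function names are hypothetical.

```python
# File names that cannot be used for content in the version directory root.
ADMIN_NAMES = {"inventory.json", "inventory.json.sha512"}


def storage_path(version: str, logical_path: str) -> str:
    """Return an on-disk path for a logical path, renaming only on collision."""
    if logical_path in ADMIN_NAMES:
        # Suffix taken from the option 1 example; any non-colliding name would do.
        return f"{version}/{logical_path}_moved"
    return f"{version}/{logical_path}"


def build_mapping(version: str, logical_paths: list[str]) -> dict[str, str]:
    """Map each on-disk path to the logical path it represents."""
    return {storage_path(version, p): p for p in logical_paths}
```

For example, `build_mapping("v1", ["inventory.json", "docs/inventory.json"])` stores the first file as `v1/inventory.json_moved` while keeping its logical name, and leaves the nested file untouched. The sketch does not handle content that is itself named `inventory.json_moved`.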

ahankinson commented 5 years ago

TBH, having the inventory in the same directory root as content has never really sat well with me. I just wasn't able to say why. It seemed like we were mixing administrative and content data.

ahankinson commented 5 years ago

@rosy1280 @julianmorley pretty please could we have some input on this so we can move it along?

rosy1280 commented 5 years ago

I think it would be cleaner and more human-readable if there was a content directory and we put the content files in it. There is a reason that BagIt and Moab do this. So let's not reject use cases because we don't think people should do them.

julianmorley commented 5 years ago

Yeah, having a content directory makes a great deal of sense. It enables a clean nesting of OCFL objects inside OCFL objects.

rosy1280 commented 5 years ago

Also, when you start to think about distributed digital preservation (which I don't want to broach, but...), if Emory's repository implements OCFL and sends its content to Chronopolis, which has also implemented OCFL, you may end up with clashing inventory.json and inventory.json.sha512 files.

zimeon commented 5 years ago

Per https://github.com/OCFL/spec/wiki/2018.10.17-Editors-Meeting, we agreed to use a content directory.

ahankinson commented 5 years ago

If it wasn't clear from the assignments, I'm currently working on a PR to this effect.

ahankinson commented 5 years ago

Q: Do the 'logical' filepaths also omit the 'content' part?

awoods commented 5 years ago

@ahankinson: it would make sense to me that content would be excluded from logical file paths.
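One way to picture this: the logical path is the on-disk content path with the version and content directory segments stripped. The sketch below assumes the `vN/content/...` layout from option 2 with the directory named `content`; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def logical_path(content_path: str, content_dir: str = "content") -> str:
    """Strip the leading '<version>/<content_dir>/' from an on-disk content path."""
    parts = PurePosixPath(content_path).parts
    # Expect at least: version directory, content directory, file name.
    if len(parts) < 3 or parts[1] != content_dir:
        raise ValueError(f"not a content path: {content_path!r}")
    return str(PurePosixPath(*parts[2:]))
```

With this, `v1/content/inventory.json` has the logical path `inventory.json`, which no longer clashes with the administrative `v1/inventory.json`.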

ahankinson commented 5 years ago

👍

ahankinson commented 5 years ago

Also, are content directories a MUST?