duraspace / pcdm

Portland Common Data Model
http://pcdm.org/models
Apache License 2.0

Add a FileSet class #59

Closed escowles closed 7 years ago

escowles commented 7 years ago

The works extension includes a FileSet class that represents an original file and other files derived from it. The Hydra implementation has found this to be a very useful structure, and the key to separating Objects that represent component parts of other Objects from groupings of Files.

Should we add a FileSet class to the core ontology?

See #53 for preliminary discussion.

ruebot commented 7 years ago

Here is a summary of issues and questions with pcdm:FileSet from the Islandora perspective:

  1. Can pcdm:Objects no longer contain files? Meaning, only pcdm:FileSets can contain files.
  2. Is the domain of pcdm:hasFile pcdm:FileSet?
  3. Is there a forced use of pcdm:hasMember, pcdm:hasFileSet, and pcdm:hasFile?
  4. How do you determine which pcdm:FileSet you are using, if you have multiple pcdm:FileSets?
  5. pcdm:FileSet is an aggregate for files and extends ore:Aggregation. No forced use of pcdm:FileSet.
  6. We'd like pcdm:Objects to have just one file if they want, without the use of pcdm:FileSet.
  7. Don't force IIIF presentation structure for things that will never be IIIF.

whikloj commented 7 years ago

Mea culpa, question 2 should be "Is the domain of pcdm:hasFile pcdm:FileSet?", which is essentially the same as question 1.

ruebot commented 7 years ago

@whikloj updated.

tpendragon commented 7 years ago

There's some confusion, because FileSet has been refined multiple times over the course of the previous large ticket. I think the current answers are these:

  1. Yes.
  2. Yes.
  3. Yes.
  4. Same answer as "how do you determine which file you're using".
  5. Yes-ish. pcdm:FileSet extends ore:Aggregation, but -not- pcdm:Object, which means no hasMember.
  6. I'll talk about this in a second.
  7. We're not doing that? Being able to crosswalk to IIIF was a happy accident long after we were using hydra-works. They both have about the same level of structural definition.

So my only point is this: let's say we don't make FileSets a required construct, so there's no node describing file grouping. We, at Hydra, obviously have use cases and have settled on it as a necessary construct. So let's say we keep it in the extension and don't violate anything in PCDM, while you don't use it. Now let's say we each represent a postcard.

Hydra:

<a> <type> <postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasMember> <frontFiles>
<frontFiles> <type> <FileSet>
<frontFiles> <hasFile> <front.jpg>

<back> <hasMember> <backFiles>
<backFiles> <type> <FileSet>
<backFiles> <hasFile> <back.jpg>

Islandora:

<a> <type> <postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasFile> <front.jpg>

<back> <hasFile> <back.jpg>

Is there any sort of useful interop we can have here? Are there any tools we can build on top of PCDM 1.0 that work generically with both of these models and do something useful? If we're just an extension, and we have to stick to PCDM 1, then FileSet has to be a pcdm:Object. That means the graphs for Islandora's <front> and our <frontFiles> are the same. There's no way to tell whether we're talking about a group of files or "The Front".

If the answer is no, then I think we need FileSet in some form in the ontology. If it's NOT a required construct, then the rules get more complex, and I would love to see examples (with graphs and constraints on the predicates defined here, in this ticket) of how it can be non-required while we can still talk about one another's models. I think we can all be happy here.
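The ambiguity described above can be sketched in a few lines of Python. This is only an illustration, not part of any PCDM tooling: triples are plain tuples, only the front of the postcard is modelled, and `core_view` and `CORE_PREDICATES` are hypothetical names standing in for a client that understands nothing beyond PCDM 1.0 predicates.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
HYDRA = {
    ("a", "type", "postcard"),
    ("a", "hasMember", "front"),
    ("front", "hasMember", "frontFiles"),
    ("frontFiles", "type", "FileSet"),
    ("frontFiles", "hasFile", "front.jpg"),
}
ISLANDORA = {
    ("a", "type", "postcard"),
    ("a", "hasMember", "front"),
    ("front", "hasFile", "front.jpg"),
}

# Assumption: a PCDM 1.0-only client knows these predicates and nothing else,
# so the extension type FileSet carries no meaning for it.
CORE_PREDICATES = {"hasMember", "hasFile"}


def core_view(graph, node):
    """Describe a node using only the predicates a PCDM 1.0 client understands."""
    return sorted((p, o) for s, p, o in graph if s == node and p in CORE_PREDICATES)


hydra_view = core_view(HYDRA, "frontFiles")
islandora_view = core_view(ISLANDORA, "front")
print(hydra_view == islandora_view)  # prints True
```

Seen through core predicates alone, Hydra's `frontFiles` and Islandora's `front` are both just a node with one `hasFile` link, which is the interop problem raised here.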

escowles commented 7 years ago

What if the Hydra representation was:

<a> <type> <postcard>
<a> <hasMember> <front>
<a> <hasMember> <back>

<front> <hasFile> <front.jpg>
<front> <hasFileSet> <frontFiles>
<frontFiles> <type> <FileSet>
<frontFiles> <hasFileSetMember> <front.jpg>

<back> <hasFile> <back.jpg>
<back> <hasFile> <back.tei>
<back> <hasFileSet> <backFiles>
<backFiles> <type> <FileSet>
<backFiles> <hasFileSetMember> <back.jpg>
<back> <hasFileSet> <backFiles2>
<backFiles2> <type> <FileSet>
<backFiles2> <hasFileSetMember> <back.tei>

I think this lets the Object use hasFile to link to the File, so Islandora and Hydra (and everyone else!) can use the existing pattern. On top of that there is an optional overlay that groups the files, which maps neatly to LDP containers, for the purpose of having multiple sets of files, such as both an image and a transcription, or a new digitization, etc.
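As a rough sketch of how a client might read this overlay, the Python below stores the `<back>` triples as tuples and queries them two ways. The helper names (`objects_of`, `files_of`, `grouped_files_of`) are made up for illustration, and `hasFileSet` / `hasFileSetMember` are the proposed predicates from this comment, not terms in the published ontology.

```python
# The <back> part of the proposed graph, as (subject, predicate, object) tuples.
GRAPH = {
    ("back", "hasFile", "back.jpg"),
    ("back", "hasFile", "back.tei"),
    ("back", "hasFileSet", "backFiles"),
    ("backFiles", "type", "FileSet"),
    ("backFiles", "hasFileSetMember", "back.jpg"),
    ("back", "hasFileSet", "backFiles2"),
    ("backFiles2", "type", "FileSet"),
    ("backFiles2", "hasFileSetMember", "back.tei"),
}


def objects_of(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def files_of(graph, obj):
    """What any client sees: the Object's Files via hasFile."""
    return objects_of(graph, obj, "hasFile")


def grouped_files_of(graph, obj):
    """What a FileSet-aware client sees: the same Files, grouped by FileSet."""
    return {
        file_set: objects_of(graph, file_set, "hasFileSetMember")
        for file_set in objects_of(graph, obj, "hasFileSet")
    }


print(sorted(files_of(GRAPH, "back")))
print(grouped_files_of(GRAPH, "back"))
```

A client that ignores FileSets still finds both files through `files_of`, while a FileSet-aware client gets the grouping without the Object-to-File links changing.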

whikloj commented 7 years ago

@escowles Could you use pcdm:hasFile for the pcdm:FileSet to pcdm:File relationship too?

escowles commented 7 years ago

@whikloj I think you could use pcdm:hasFile to link to Files both from Objects and from FileSets. In both cases, the File is a representation of the Object/FileSet, so I think it's the same.

awead commented 7 years ago

Could hasFileSetMember be a subproperty of hasFile? I'm pondering @escowles's model. Would this also allow any object to link to any FileSet and create many-to-many relationships?

escowles commented 7 years ago

@awead That's the other option: making hasFileSetMember a subproperty of hasFile instead of just using hasFile. I don't have a strong opinion about which one is better.

Though I'm not sure about linking to FileSets from more than one Object. If Files are part of a single Object, and FileSets serve to group those Files, wouldn't the FileSets also be limited to that Object?
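To make the subproperty option concrete, here is a minimal Python sketch of what an RDFS-style reasoner would derive if hasFileSetMember were declared a subproperty of hasFile. The `SUBPROPERTY_OF` table and the `infer` function are hypothetical, and the sketch handles a single level of subproperty only, not chains.

```python
# Hypothetical declaration: hasFileSetMember rdfs:subPropertyOf hasFile.
SUBPROPERTY_OF = {"hasFileSetMember": "hasFile"}

GRAPH = {
    ("front", "hasFile", "front.jpg"),
    ("front", "hasFileSet", "frontFiles"),
    ("frontFiles", "type", "FileSet"),
    ("frontFiles", "hasFileSetMember", "front.jpg"),
}


def infer(graph):
    """Add the triples implied by one level of subproperty declarations."""
    inferred = set(graph)
    for s, p, o in graph:
        parent = SUBPROPERTY_OF.get(p)
        if parent is not None:
            inferred.add((s, parent, o))
    return inferred


closed = infer(GRAPH)
print(("frontFiles", "hasFile", "front.jpg") in closed)  # prints True
```

Under that assumption, a client that only asks for hasFile would also see the FileSet-to-File links, which gives the same result as using hasFile directly while keeping a more specific predicate in the stored data.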

whikloj commented 7 years ago

I don't really have a use case for Files/FileSets being attached to more than one Object, but I remember that @scossu made this comment above:

"I don't see a problem with a FileSet hanging out by itself or potentially having multiple relationships with other Objects."

Just in case he has a use case he'd like to mention.

tpendragon commented 7 years ago

"If Files are part of a single Object, and FileSets serve to group those Files, wouldn't the FileSets also be limited to that Object?"

I thought this was the case.
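If that restriction were adopted, it could be checked mechanically. The Python below is a sketch under that assumption, using the proposed hasFileSet / hasFileSetMember predicates from earlier in the thread; `check_single_owner` and both sample graphs are invented for illustration.

```python
GOOD = {
    ("front", "hasFile", "front.jpg"),
    ("front", "hasFileSet", "frontFiles"),
    ("frontFiles", "type", "FileSet"),
    ("frontFiles", "hasFileSetMember", "front.jpg"),
}
# Same graph, but a second Object also links to the FileSet.
SHARED = GOOD | {("back", "hasFileSet", "frontFiles")}


def check_single_owner(graph):
    """Return a list of problems; an empty list means every FileSet has
    exactly one owning Object and groups only that Object's Files."""
    problems = []
    file_sets = {s for s, p, o in graph if p == "type" and o == "FileSet"}
    for file_set in sorted(file_sets):
        owners = {s for s, p, o in graph if p == "hasFileSet" and o == file_set}
        if len(owners) != 1:
            problems.append(f"{file_set}: expected 1 owner, found {len(owners)}")
            continue
        owner = next(iter(owners))
        owner_files = {o for s, p, o in graph if s == owner and p == "hasFile"}
        members = {o for s, p, o in graph if s == file_set and p == "hasFileSetMember"}
        stray = members - owner_files
        if stray:
            problems.append(f"{file_set}: members not files of {owner}: {sorted(stray)}")
    return problems


print(check_single_owner(GOOD))
print(check_single_owner(SHARED))
```

Whether a FileSet may stand alone or be shared is still an open question in this thread, so the rule encoded here is one possible reading rather than a settled constraint.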

ruebot commented 7 years ago

@whikloj @dannylamb @DiegoPino @bryjbrown One use case we should think about, in terms of how we would manage without FileSets, is the good old ETD (Electronic Thesis/Dissertation): a PDF and its associated datasets.

azaroth42 commented 7 years ago

I understood the point of a FileSet to be to group together files from the same source bitstream? So PDF plus Datasets is a pcdm:Object, not a FileSet.

ruebot commented 7 years ago

@azaroth42 Cool. That's exactly what I was thinking, but wanted to make sure. I'm just trying to think of other use cases for FileSets from our perspective.

awead commented 7 years ago

@ruebot More broadly, the FileSet could contain derivatives of the original source, whether auto-generated or not, such as thumbnails, but also derived technical information such as FITS XML, or other derivative-like things: TEI representations, full-text extraction, etc.

bryjbrown commented 7 years ago

@azaroth42 @ruebot A lot of the research data sets that I've worked with are different "views" (for lack of a better term) of the same raw data. Think different tabs on the same spreadsheet, or a chart image file representing the data in a separate CSV file. Not derivatives in the technical sense, but thematically derivative. Would this be a use case for FileSet, or does it just confuse things?

escowles commented 7 years ago

@bryjbrown That seems like a reasonable use of a FileSet to me — including, for example, a data file and graphs/visualizations of it.

cmharlow commented 7 years ago

A few questions/thoughts based on these last few comments + ideas:

Sorry, consistency in modeling is the key ideal for me here, as I'll have to do quite a bit of batch metadata updates in a few PCDM implementations.

*edited to avoid presumption of inverses to these properties.

escowles commented 7 years ago

Discussion of FileSets has moved on — closing this issue. There is still work going on in the Hydra community about how FileSets should work, and what they represent, and making that compatible with the core ontology.