duraspace / pcdm

Portland Common Data Model
http://pcdm.org/models
Apache License 2.0

Changing file-format-types to be RDFS instead of OWL #27

Closed escowles closed 9 years ago

escowles commented 9 years ago

Fixes #26

escowles commented 9 years ago

I wasn't sure what to do with the udfrs:GenreFacetType instances -- these look like OWL named individuals, and I don't think those map directly to RDFS. Is it OK to leave them as is?

ruebot commented 9 years ago

@escowles thanks for updating the stylesheet too!

Looks like udfrs:GenreFacetType is an owl class: http://udfr.org/onto/onto.rdf

<owl:Class rdf:ID="GenreFacetType">
  <rdfs:subClassOf rdf:resource="#ControlledVocabulary"/>
  <rdfs:isDefinedBy rdf:resource="#"/>
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Genre facet type</rdfs:label>
  <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    The genre facet type defines the main classes found in the GDFR classification system. It is intended to indicate broadly the type of content associated with a format.
  </dc:description>
</owl:Class>

...and http://udfr.org/docs/onto/

escowles commented 9 years ago

@ruebot: Right, udfrs:GenreFacetType is a class -- and the terms here are all instances of it. So I think that makes them NamedIndividuals in OWL. This wasn't explicit before, but they could have been:

<owl:NamedIndividual rdf:about="http://pcdm.org/file-format-types#Archive">
  <rdf:type rdf:resource="http://www.udfr.org/onto#GenreFacetType"/>
   ...
</owl:NamedIndividual>

So, should we leave them as individuals? Or should they be classes that subclass udfrs:GenreFacetType? Maybe @azaroth42 or @acoburn have opinions?

ruebot commented 9 years ago

@escowles oof. I totally misinterpreted that :grimacing:

acoburn commented 9 years ago

As currently defined, these are definitely individuals. For instance, see the OWL guide. @escowles is correct in his example above.

However, when I look at other similar vocabularies (e.g. DCMIType), these sorts of entities are defined as classes. So I'm somewhat inclined to follow that pattern (though there may be another pattern suggesting otherwise).

In terms of using this vocabulary (as it currently stands), am I correct that one might express this:

<my-resource> a pcdm:File, pcdmuse:OriginalFile ;
    dc:type pcdmformat:Dataset .

as opposed to this:

<my-resource> a pcdm:File, pcdmuse:OriginalFile, pcdmformat:Dataset .

My understanding of the Class vs. Individual distinction is that an Individual is one particular thing. For example, lit:JohnMilton or planet:Jupiter, as opposed to a "generic type": lit:Author or planet:Jovian. And so it would follow that e.g. a Dataset is a type of thing (i.e. an rdfs:Class) rather than a particular thing (owl:NamedIndividual). However, one could also argue that a Dataset is a particular GenreFacetType (and hence an owl:NamedIndividual rather than an rdfs:Class).

That is to say, I could go either way but am inclined toward defining them as classes because it seems other similar vocabs do that. Do @azaroth42 or @barmintor have an opinion?
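
To make the two options concrete, here is a minimal Turtle sketch of the same term modelled each way. It assumes the `pcdmformat:` prefix maps to the `http://pcdm.org/file-format-types#` namespace used earlier in the thread; the two blocks are alternatives and are not meant to be loaded together.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix udfrs: <http://www.udfr.org/onto#> .
# Assumed prefix binding, based on the term IRIs quoted above.
@prefix pcdmformat: <http://pcdm.org/file-format-types#> .

# Option A: the term is an individual, i.e. one particular genre facet.
pcdmformat:Dataset a owl:NamedIndividual, udfrs:GenreFacetType .

# Option B: the term is itself a class (the DCMIType-style pattern).
pcdmformat:Dataset a rdfs:Class ;
    rdfs:subClassOf udfrs:GenreFacetType .
```

Under option A the term would naturally appear as the object of a property such as dc:type or dc:format; under option B it could also appear as the object of rdf:type.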

escowles commented 9 years ago

I just checked the other vocabs, and DCMIType, MARC Resources, Nepomuk and Pronom define their terms as classes, and AAT and UDFRS define them as individuals.

I agree with Aaron: I could go either way, but these terms do seem more like categories, so maybe converting them to rdfs:Classes makes more sense.

escowles commented 9 years ago

I've updated this PR to make the entities RDFS Classes instead of named individuals.

ruebot commented 9 years ago

:+1:

acoburn commented 9 years ago

:+1: (non-binding)

barmintor commented 9 years ago

I'm not sure about this. If you get into the notion/category debate, I think you're getting into an overly expansive ontology of class. I'd ask, for example, whether you expect these Things to be the object of rdf:type, or of (for example) dc:format. EDIT: I see @acoburn is ahead of me!

barmintor commented 9 years ago

That said, I'm very interested to see what Rob's opinion is.

escowles commented 9 years ago

@barmintor I was definitely thinking these would be used with dc:format. Does that argue against defining them as classes? The DCMIType terms are defined as classes, which seem like the canonical terms to use with dc:format: http://dublincore.org/2012/06/14/dctype.rdf

acoburn commented 9 years ago

FWIW, I was also planning to use these with dc:format.

barmintor commented 9 years ago

I'm not digging in my heels, I only want to make sure we're not conflating semantic contexts here. If we're going to follow the DC practice here, we should probably remove the <rdfs:subClassOf rdf:resource="http://www.udfr.org/onto#GenreFacetType" /> statements.

ruebot commented 9 years ago

pings @azaroth42

azaroth42 commented 9 years ago

:-1: to both making them classes and using dc:format.

If the pattern is:

_:x a pcdm-ext:Archive ;

Then I'm okay with a class. But having classes as the object of dc:format seems very strange. What would the instances of the class be?

barmintor commented 9 years ago

@azaroth42 I am reading this as "-1 to making them classes while using dc:format", and not "-1 to classes; -1 to dc:format". Is that correct?

azaroth42 commented 9 years ago

Yes...

:-1: to ?x dc:format ?y . ?y a rdfs:Class .

But I'm fine with either ?x a ?y . or ?x dc[terms]:format ?y .

Happy to hear arguments as to why it should be a class though?

escowles commented 9 years ago

@azaroth42: I think instances of the classes would be fully-specified file formats (e.g. TIFF 6.0 would be an instance of #RasterImage). I think the vocabs we're referencing here are split between whether their terms are classes or individuals, though DC/DCMIType definitely envisions using dc:format with DCMIType classes.

azaroth42 commented 9 years ago

Some worked examples might help?

escowles commented 9 years ago

@azaroth42: I would expect the typical use to be something like:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ebucore: <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix pcdm: <http://pcdm.org/models#> .
@prefix pcdmfmt: <http://pcdm.org/file-format-types#> .
@prefix pcdmuse: <http://pcdm.org/use#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

</object1/files/file1> a pcdm:File, ldp:NonRDFSource, pcdmuse:ServiceFile;
  dc:format pcdmfmt:Video;
  ebucore:fileSize "12345678"^^xsd:long;
  ebucore:filename "movie.mp4" .

</object1/files/file2> a pcdm:File, ldp:NonRDFSource, pcdmuse:ThumbnailImage;
  dc:format pcdmfmt:RasterImage;
  ebucore:fileSize "5678"^^xsd:long;
  ebucore:filename "thumbnail.jpg" .

</object1/files/file3> a pcdm:File, ldp:NonRDFSource, pcdmuse:ExtractedText;
  dc:format pcdmfmt:UnstructuredText;
  ebucore:fileSize "1234"^^xsd:long;
  ebucore:filename "fulltext.txt" .

</object1/files/file4> a pcdm:File, ldp:NonRDFSource, pcdmuse:Transcript;
  dc:format pcdmfmt:HTML;
  ebucore:fileSize "1234"^^xsd:long;
  ebucore:filename "transcript.html" .

But you could also define individuals if you wanted to record a specific format for some reason:

</object1/files/file5> a pcdm:File, ldp:NonRDFSource, pcdmuse:OriginalFile;
  dc:format pcdmfmt:Video, </formats/VideoFormat1>;
  ebucore:fileSize "123456789"^^xsd:long;
  ebucore:filename "movie.vid" .

</formats/VideoFormat1> a pcdmfmt:Video;
  dc:title "Video Format #1" .

jpstroop commented 9 years ago

> But having classes as the object of dc:format seems very strange.

Yes.

If I'm understanding the concern correctly, I think it comes down to: "This thing is a this format" vs. "This thing is of this format", which is a pretty subtle distinction.

To me, an instance of a postcard is not an instance of a format. It might be a resource that has characteristics in common with other things that are also of this dc:format, but, from a practical perspective, you can only use that fact to contextualize it among other resources (based on their dc:formats) or maybe trigger certain behaviors in an application. You can't use the object of dct:format to constrain, e.g., the rdfs:range or rdfs:domain of a resource, so what does making it a class get us? If anything, by not making the object of dct:format a class, the distinction between rdf:type and our intentions for dct:format becomes clearer.

azaroth42 commented 9 years ago

Thanks for the example @escowles! I'm still :-1: to using both classes and instances of those classes as the object of dc:format. The video File doesn't have a format which is the class of video formats ... it has a particular format. The video File (OTOH) is a Video. Having a class for use (which is context specific) but a mix of class and instance for format (which is not context specific) doesn't fill me with happiness.

I agree with @jpstroop: The video file is-a Video. It is-in-a/has-a format, which is-a Format.
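
As a rough illustration of that reading (the file is-a Video, and it has-a format which is-a Format), the pattern might look like the sketch below. `ex:Video` and `ex:Format` are hypothetical names in a made-up namespace, not terms from PCDM or UDFR, and the relative IRIs assume some base IRI as in the earlier example.

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pcdm: <http://pcdm.org/models#> .
# Hypothetical namespace, for illustration only.
@prefix ex: <http://example.org/ns#> .

# The file is typed as a Video via rdf:type ...
</object1/files/file5> a pcdm:File, ex:Video ;
    # ... and dc:format points at one particular format, not at a class.
    dc:format </formats/VideoFormat1> .

# The particular format is an individual of a Format class.
</formats/VideoFormat1> a ex:Format ;
    dc:title "Video Format #1" .
```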

escowles commented 9 years ago

I think I understand the reasoning for using individuals instead of classes here: dc:format should point to a concrete instance not a class, and using classes will lead to confusion with the File Use Vocab.

I'm happy to revert to using udfrs:GenreFacetType instead of rdfs:Class. But the existing rdfs:subClassOf statements should probably be changed to something else: skos:broader makes the most sense to me, given the skos:exactMatch/skos:closeMatch we're already using.
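
A minimal sketch of what that change could look like for one term, assuming the prefix bindings shown. `pcdmfmt:Image` is an assumed parent term used only for illustration; the actual hierarchy in the vocabulary may differ.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix udfrs: <http://www.udfr.org/onto#> .
@prefix pcdmfmt: <http://pcdm.org/file-format-types#> .

# The term stays an individual typed with the UDFR class ...
pcdmfmt:RasterImage a udfrs:GenreFacetType ;
    # ... and its place in the hierarchy is stated with skos:broader
    # rather than rdfs:subClassOf. pcdmfmt:Image is an assumed parent.
    skos:broader pcdmfmt:Image .
```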

ruebot commented 9 years ago

@escowles :+1: -- I like the use of skos:broader

ruebot commented 9 years ago

I'm gonna tag some new committers to see if we can get some movement on this:

@daniel-dgi @no-reply @kestlund

kestlund commented 9 years ago

@escowles @ruebot Catching up on this discussion... :+1: to 'skos:broader' ; I had been indifferent to 'udfrs:GenreFacetType' but if it resolves the arguments, then it certainly is worth keeping.

Are there any other outstanding issues or just looking for additional consensus?

DiegoPino commented 9 years ago

:+1: for SKOS semantic relations, but I also think it would be correct to declare that every instance also has rdf:type skos:Concept, since skos:broader, skos:exactMatch, etc. work on skos:Concept individuals.

Lastly, just a functional idea (wish): it could be useful to add an owl:imports for udfrs. It's a practical need when using PCDM ontologies in applications like Protege. No need to import SKOS because udfrs already does this.
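
For reference, such an import declaration might look roughly like the sketch below. Both ontology IRIs are assumptions derived from the namespaces quoted in this thread; the IRI at which UDFR actually publishes its ontology document may differ.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Assumed ontology IRIs: the vocabulary namespaces without the trailing '#'.
<http://pcdm.org/file-format-types> a owl:Ontology ;
    owl:imports <http://www.udfr.org/onto> .
```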

escowles commented 9 years ago

@kestlund I think we're just making sure we've got consensus here.

@DiegoPino: I agree it would be good to define the terms as skos:Concepts, since we're using the SKOS predicates to link them. I'm not sure about importing UDFRS -- is that just for the udfrs:GenreFacetType definition?

DiegoPino commented 9 years ago

@escowles the idea of importing is just functional. We are creating individuals from classes defined in an external ontology, so I thought it might be a good idea to import them. But don't worry, it's just a wish based on one of my personal use cases and maybe out of scope (so no intention to add these topics to this particular conversation):

Personal use case:

I have been trying to understand the strange (strange to me; I'm sure there is a need, but I'm not clearly aware of it) modal mix of the RDFS and OWL worlds in PCDM, and doing some local research using Protege to see how well all those different ontologies + LDP + PCDM play together. I have seen some comments in the issues here about OWL being a complicated beast to handle, but I still see some parts of OWL being used (that's the modal part). My own experience is the opposite (like the beautiful idea of having ObjectProperty and DatatypeProperty as different kinds of property), and since I also don't fully understand how jumping from RDF to OWL affects this, I usually pass the PCDM ontologies through Protege. That said, without imports, testing is very complicated.

escowles commented 9 years ago

I've added rdf:type statements to the terms to make them skos:Concepts, and rebased to squash and resolve conflicts with the updated rdfs2html stylesheet.
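
For illustration, a term typed both ways might look roughly like this. It is a sketch using the namespaces quoted earlier in the thread, not a copy of the committed vocabulary file.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix udfrs: <http://www.udfr.org/onto#> .
@prefix pcdmfmt: <http://pcdm.org/file-format-types#> .

# Each term is an individual of the UDFR class and also a SKOS concept,
# so the SKOS linking predicates apply to it.
pcdmfmt:Archive a udfrs:GenreFacetType, skos:Concept .
```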

@DiegoPino: I haven't included an owl:imports declaration, since that seems like a separate issue to me. Can you create another ticket for that? It seems like there is a broader discussion of OWL/RDFS, compatibility with tools, etc. that we should have.

DiegoPino commented 9 years ago

@escowles: thanks, don't worry about the owl:imports, it's just good practice when creating new individuals from externally defined classes. But I will create a new ticket for that, because I'm having some issues dealing with this strange (strange for me... long discussion) mix of OWL and RDFS when trying to validate and interoperate with PCDM + LDP in Protege.

ruebot commented 9 years ago

@duraspace/pdcm-committers shall we review/vote on this again since we have new commits from @escowles?

ruebot commented 9 years ago

@duraspace/pdcm-committers bump :sweat:

kestlund commented 9 years ago

+1

azaroth42 commented 9 years ago

:ok_hand: ... This isn't how I would do it, but as I'm not doing it and it's not core, I have no technical objections.

FWIW, the approach that I have seen taken most often is to use classes and rdf:type, such as:

jpstroop commented 9 years ago

:+1: