HTML: @type or @language

azaroth42 commented 10 years ago

Putting the fixtures into the JSON-LD playground, our HTML solution doesn't actually work:

"Invalid JSON-LD syntax; an element containing \"@value\" may not contain both \"@type\" and \"@language\""

Which is ... suboptimal. Playground link: http://json-ld.org/playground/index.html#startTab=tab-expanded&json-ld=http%3A%2F%2Fiiif.io%2Fapi%2Fpresentation%2F2.0%2Fexample%2Ffixtures%2F64%2Fmanifest.json

azaroth42 commented 10 years ago

Damnations, this is a real bug :( http://www.w3.org/TR/json-ld/#value-objects
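
For reference, the rule the playground enforces can be sketched in a few lines of Python. This is only an illustration: `check_value_object` is a hypothetical helper, the sample values are invented rather than copied from the fixture, and `rdf:XMLLiteral` is assumed to be the type in question.

```python
def check_value_object(obj):
    """Return an error string if obj is a value object that carries both
    @type and @language, otherwise None."""
    if "@value" in obj and "@type" in obj and "@language" in obj:
        return ('an element containing "@value" may not contain '
                'both "@type" and "@language"')
    return None


# Hypothetical values for illustration only.
both = {"@value": "<span>Title</span>", "@type": "rdf:XMLLiteral", "@language": "en"}
language_only = {"@value": "<span>Title</span>", "@language": "en"}
type_only = {"@value": "<span>Title</span>", "@type": "rdf:XMLLiteral"}
```

Only the first of the three sample values trips the check; either of the other two shapes is acceptable to JSON-LD.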

azaroth42 commented 10 years ago

Wrote to linked-json to ask for advice, but here are some possible solutions off the top of my head:

1. Drop the @type, and require that if a @value starts with < and ends with > then it MUST be HTML.
2. Drop the @language, and put the language in the HTML using xml:lang, following the pattern described here: http://www.idpf.org/epub/oa/#h.fbvcg1ft34rp
3. Break compatibility with strings and do something like 6.6.2 with ContentAsText ... but then the range of the property is all messed up -- sometimes it's a resource and sometimes it's a literal.

jpstroop commented 10 years ago

Initial reaction is that @type is more important than @language here, so suggestion 2 (xml:lang) seems like the cleanest, I think.

-Js

Sent via mobile. Please excuse typos, brevity, etc.

azaroth42 commented 10 years ago

That was my initial reaction too, but on reflection I prefer dropping @type for the following reasons:

language is unlikely to repeat, whereas all values are likely to be in either HTML or plaintext. This makes @language more useful as a discriminator for which values to use. Otherwise, you have to parse the language out of the XML for all values just to throw all but one away. That's very inefficient.
keeping @language is internally consistent with other uses of literals. We only type literals in the context, not in the data. Especially as HTML is only usable in limited numbers of fields.
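
As a rough sketch of the first point, selecting by @language needs no parsing at all. The helper name `values_for_language` and the sample labels are made up for illustration, and the values are assumed to be expanded value objects.

```python
def values_for_language(values, lang):
    """Return the @value strings whose @language equals lang."""
    return [
        v["@value"]
        for v in values
        if isinstance(v, dict) and v.get("@language") == lang
    ]


# Hypothetical labels: one HTML value and one plain-text value.
labels = [
    {"@value": "<span>Title</span>", "@language": "en"},
    {"@value": "Titre", "@language": "fr"},
]
```

With this shape a client filters on a dictionary key and only then looks at the content of the one value it keeps.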

R

zimeon commented 10 years ago

It is worth noting that what we are trying to do with having a type of XMLLiteral and a language is also not valid in RDF. Language may be applied only to strings: http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal and in this sense JSON-LD seems simply to be consistent with RDF.

I agree with Rob that an application is most likely to want to select based on language (and will likely simply assume the type of the value). Since we are not talking about how to validate this, and we aren't defining an accurate type either (what we actually have is a subset of XHTML in there), I agree with dropping @type.

(The alternative might be to have a language map http://www.w3.org/TR/json-ld/#dfn-language-map but that seems overkill)
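
To make the language map alternative concrete, here is a minimal sketch written as Python dicts. The mapping of `label` to rdfs:label and the sample values are assumptions for illustration, not taken from the IIIF context.

```python
# Assumed context: declare the property as a language map container.
context = {
    "label": {
        "@id": "http://www.w3.org/2000/01/rdf-schema#label",
        "@container": "@language",
    }
}

# With a language map, values are keyed by language tag.
manifest = {
    "@context": context,
    "label": {"en": "<span>Title</span>", "fr": "Titre"},
}


def label_for(doc, lang):
    """Look up the label for one language, or None if it is absent."""
    return doc["label"].get(lang)
```

The values in a language map are plain strings, so this form has no place to put an @type either, and a single value still has to be wrapped in a map.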

mikeapp commented 10 years ago

+1 on retaining @language and dropping @type

jpstroop commented 10 years ago

So a client will have to test the string to see if it's html? We'll need to point that out. Otherwise, OK, for the sake of consistency.

Sent via mobile (on the PA turnpike). Please excuse typos and brevity.

azaroth42 commented 10 years ago

Yes. And I think we should be as strict as possible with what that test should be. The minimum (IMO) should be value[0] == '<' and value[-1] == '>'. If you want a regular string to start and end in an angle bracket, then put a space at the end such that value[-2] == '>' and value[-1] == ' '.

I don't think we can require a particular tag, as and

are both good options for different situations.
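
The minimum test described above could be sketched as follows. `looks_like_html` is a hypothetical name, and this version checks only the first and last characters, nothing about which tag is used.

```python
def looks_like_html(value: str) -> bool:
    """Minimal sniff: the string must start with '<' and end with '>'.

    startswith/endswith are used instead of indexing so that an empty
    string returns False rather than raising IndexError.
    """
    return value.startswith("<") and value.endswith(">")
```

A plain string that happens to begin and end with angle brackets escapes the test with a trailing space, so `looks_like_html("<not html> ")` is False.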

azaroth42 commented 10 years ago

A non-RDF but possible JSON-LD based approach is @index: http://www.w3.org/TR/json-ld/#data-indexing

The language structure disappears in the RDF version, which seems undesirable from the perspective of people wanting to use frames or triplestores. Also, I think it would force languages to be used even when there's only one value, also undesirable.

So not to say we shouldn't use the current change, just to point out another option.
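
A sketch of what the @index form might look like, again as Python dicts. The rdfs:label mapping, the sample values, and the use of rdf:XMLLiteral as the type are assumptions for illustration.

```python
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Assumed context: declare the property as an index map container.
context = {
    "label": {
        "@id": "http://www.w3.org/2000/01/rdf-schema#label",
        "@container": "@index",
    }
}

# The keys are index strings, here chosen to look like language tags.
# The value objects can keep @type because they carry no @language.
doc = {
    "@context": context,
    "label": {
        "en": {"@value": "<span>Title</span>", "@type": XML_LITERAL},
        "fr": {"@value": "Titre"},
    },
}


def indexed_value(document, key):
    """Return the value object stored under an index key, or None."""
    return document["label"].get(key)
```

The index keys exist only in the JSON-LD document, which is why the language structure is lost once the data is converted to RDF.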

azaroth42 commented 10 years ago

Antoine: As discussed today, HTML mark-up does create problems when combined with a language. I would again push the solution someone proposed: if a piece of HTML has to be language-tagged, the tag should go in the HTML itself, using xml:lang. Removing the @type seems dangerous: it is quite important to allow clients to spot that a given value includes presentation instructions that they might not be equipped to render (be it HTML). In fact, putting such presentation mark-up in descriptive values does not seem like a great idea. I know this is a presentation API, but it still feels awkward because it's supposed to be data for presentation, not presentation instructions.

azaroth42 commented 10 years ago

Antoine: I understand your point about the difficulty of parsing HTML to get the language, especially since this language may be only partially applied (i.e. to some bit of the HTML mark-up)...

I wonder whether your use case would work really badly with the pattern suggested at http://lists.w3.org/Archives/Public/public-linked-json/2014Aug/0034.html

[This is the ( rdf:value "..." ; dc:language "en" ) bnode option]

Or if you introduced a pointer from the manifest to a more complex presentation-focused HTML page - the equivalent of see_also, but for human-readable mark-up.

jpstroop commented 10 years ago

I think I tend to agree with Antoine. I never completely got comfortable with dropping @type, and guess I would rather deal with the expense of having to parse XHTML when I'm told to (by @type), than have to test every label, just in case it might be XHTML.

I guess the bnode option is an option...but to me that's even more complex and different from anything else we've done.

azaroth42 commented 10 years ago

I could live with it, but it seems more expensive to parse the XML than to sniff the contents.
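
For comparison, here is roughly what the parsing route involves, using Python's standard library. `html_language` is a hypothetical helper, and it only reads xml:lang from the root element, so a language applied to an inner element would be missed.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def html_language(value):
    """Return the xml:lang of the root element, or None if the value is
    not well-formed XML or the attribute is missing."""
    try:
        root = ET.fromstring(value)
    except ET.ParseError:
        return None
    return root.get(XML_LANG)
```

Every candidate value has to go through a full XML parse before it can be kept or discarded, which is the cost being weighed against a two-character sniff.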

For the record I'm against the bnodes solution as it means punning properties, which is impossible (afaik) in JSON-LD. Or mandating that every string be in a bnode, which is even worse.

jpstroop commented 10 years ago

More expensive, but less of a hack, IMO. And since XML has a way of indicating language, it feels like we should just follow the standards.

How does JavaScript (in browsers) do with XML namespaces? Would the fact that it's xml:lang cause any problems?

I'm against the bnode option too.

azaroth42 commented 10 years ago

Currently:

JS, AI: @type
RS, MA, SW, CC, KC, BA, TCrane: @language

@zimeon any further comments on the issue?

jpstroop commented 10 years ago

Not too late to change your votes! (:smile: :beers:)

mikeapp commented 10 years ago

Still voting to keep @language :)

azaroth42 commented 10 years ago

I think it's cooked. Even if Stanford only gets one vote, it's still pretty clear :)

jpstroop commented 10 years ago

@jpstroop admits defeat :crying_cat_face:

azaroth42 commented 10 years ago

Closing! (won't un-fix)

IIIF / api

HTML: @type or @language #335