ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences
Apache License 2.0

validation problem in full.jsonld: description @value #150

Closed smrgeoinfo closed 2 years ago

smrgeoinfo commented 3 years ago

"HTML is not a known valid target type for the description property." at line 21

"description": {
    "@type": "HTML",
    "@value": "<p>"Winter ecology of larval krill: quantifying their interaction with the pack ice 
 ...   target="_blank"><em>(from cruise report LMG0205)</em></a></p>"
  },

description is expected to be text. I don't think there's any way to embed html in schema.org values?

jaygray0919 commented 3 years ago

target @Property/@WebPage/description

smrgeoinfo commented 3 years ago

@jaygray0919 ???? what???

jaygray0919 commented 3 years ago

sorry; should have been more expressive. For a given @Type (e.g. @CreativeWork) use about/@WebPage and decorate your HTML content there. If your parent @Type is not @CreativeWork use mainEntityOfPage/@CreativeWork (i.e. @WebPage) ....

smrgeoinfo commented 3 years ago

about property definition: "The subject matter of the content". In our science context, where the subject is a schema:Dataset, we'd have a triple 'Dataset' 'about' 'some CreativeWork'. I think the intention here is closer to schema:subjectOf, "A CreativeWork or Event about this Thing", with a triple like 'Dataset' 'subjectOf' 'CreativeWork', where 'CreativeWork' describes the dataset in some fashion. I can't find any subclasses of CreativeWork that have properties whose value is 'html', so this would have to be a link to the resource that actually contains the HTML-encoded content.

That said, I think the issue here is including html formatted text as a text value for a schema.org property, schema:description in this case. In the example above, the @value is just a text string so "description": "<p>"Winter ecology of larval krill: qu.....ack ice ... target="_blank"><em>(from ... LMG0205)</em></a></p>" would be valid. The problem is how to tell clients to interpret this as html and present the formatted result.

Suggestion: "description": "<p>"Winter ecology of larval krill: qu.....ack ice ... target="_blank"><em>(from ... LMG0205)</em></a></p>"^^rdf:HTML

Use an RDF datatype on the string value to indicate that it's HTML. Note the string content still has to be properly escaped or JSON parsers will gag on it.
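On the escaping point, a minimal Python sketch: letting a JSON serializer do the escaping avoids hand-editing the quotes inside the markup. The HTML string and the example.org URL here are made-up stand-ins (the real description in the issue is elided), and only the standard-library json module is used.

```python
import json

# Hypothetical markup standing in for the elided description in this issue.
html = '<p>"Winter ecology of larval krill" <a href="https://example.org/report" target="_blank"><em>(cruise report)</em></a></p>'

doc = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "test",
    "description": html,  # plain string; json.dumps escapes the embedded quotes
}

serialized = json.dumps(doc, indent=2)
print(serialized)

# Round trip: parsing the serialized record gives back the original markup.
round_tripped = json.loads(serialized)["description"]
```

The embedded double quotes come out as \" in the serialized record, which is what a JSON parser needs in order to read the string back unchanged.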

ashepherd commented 3 years ago

The HTML datatype is defined in the schema.org context at: https://schema.org/docs/jsonldcontext.json as the RDF datatype for HTML text.

smrgeoinfo commented 3 years ago

@ashepherd so you think we can ignore the error from the google validation tool? I'm wondering how many clients would not successfully parse the

"description": {
    "@type": "HTML",
    "@value": "<p....."
}

construct?

smrgeoinfo commented 3 years ago

Oops, JSON-LD doesn't support ^^rdf:HTML datatype declarations; that's a Turtle construct. Oh well.
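For comparison, the JSON-LD way to attach a datatype to a literal is a value object with "@value" and "@type", which is the shape the full.jsonld example already uses. A small sketch, assuming the full rdf:HTML IRI is spelled out rather than the "HTML" term from the schema.org context; the helper name typed_value is hypothetical.

```python
import json

# Full IRI of the rdf:HTML datatype.
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

def typed_value(lexical: str, datatype_iri: str) -> dict:
    """Build a JSON-LD value object, the counterpart of a Turtle typed literal."""
    return {"@value": lexical, "@type": datatype_iri}

# Hypothetical markup, not the description from the issue.
description = typed_value("<p>Some <em>formatted</em> text</p>", RDF_HTML)
print(json.dumps({"http://schema.org/description": description}, indent=2))
```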

ashepherd commented 3 years ago

IMO, we can safely ignore Google's validation tool in this case. I thought it was a useful exercise to demonstrate how you'd do it if all you had was an HTML description of something. But I don't feel strongly that it must remain.

fils commented 3 years ago

I agree we can ignore the tool on this part. However, I do think this example should remain. Though perhaps with guidance as to the implications of doing so.

While I believe it would be best if people used plain text for descriptions, many will use HTML markup since it is what they have. It may come from a CMS or another source and be the best description they have, and they may not wish (for their uses), or be able, to strip the HTML from the strings when making the JSON-LD record.

It does impose some issues I have encountered with downstream uses, where things need to be cleaned up for NLP, or with expressing the data into the HTML DOM via web components or plain JS. Those are important though. I'll try to add some examples of how I've dealt with this content to this issue when I get the time.
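As one possible shape for the NLP clean-up step, here is a minimal sketch using only Python's standard-library html.parser. It is an illustration, not the approach actually used downstream; the helper name html_to_text and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes and ignore all tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(markup: str) -> str:
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space keeps adjacent block elements apart; the trade-off
    # is that an inline tag in the middle of a word would also split that word.
    return " ".join(" ".join(parser.parts).split())

plain = html_to_text("<p>Winter ecology of <em>larval</em> krill &amp; ice</p><p>Second paragraph</p>")
print(plain)
```

A description that arrives as a plain string with no markup passes through unchanged, so the same call can be applied to every record.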

datadavev commented 3 years ago

This snippet gets no errors in the Google tool:

{
  "@context":"https://schema.org/",
  "@type":"Dataset",
  "name":"test",
  "description": "This is a description of the test. Here's some more words to make it long enough."
}

This is exactly the same content expressed in expanded form:

[
  {
    "@type": [
      "http://schema.org/Dataset"
    ],
    "http://schema.org/description": [
      {
        "@value": "This is a description of the test. Here's some more words to make it long enough."
      }
    ],
    "http://schema.org/name": [
      {
        "@value": "test"
      }
    ]
  }
]

It fails the Google tool. Hence, my impression is that the tool is not really useful for general validation purposes.
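Since the same description can show up as a plain string, a value object, or an expanded list of value objects, a client that wants to be tolerant can normalize all three. A rough sketch without a JSON-LD processor; the helper names and the list of property keys checked are assumptions for illustration, and a real client would more likely expand or compact the record with a JSON-LD library first.

```python
def literal_strings(value):
    """Return the string literals from a JSON-LD property value in any of the shapes above."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        inner = value.get("@value")
        return [inner] if isinstance(inner, str) else []
    if isinstance(value, list):
        found = []
        for item in value:
            found.extend(literal_strings(item))
        return found
    return []

def get_description(node: dict):
    """Look up the description under its compact key or its expanded IRI."""
    for key in ("description", "http://schema.org/description", "https://schema.org/description"):
        if key in node:
            return literal_strings(node[key])
    return []

text = "This is a description of the test. Here's some more words to make it long enough."
compact = {"@context": "https://schema.org/", "@type": "Dataset", "name": "test", "description": text}
typed = {"@type": "Dataset", "description": {"@type": "HTML", "@value": "<p>formatted</p>"}}
expanded = [{"@type": ["http://schema.org/Dataset"], "http://schema.org/description": [{"@value": text}]}]

print(get_description(compact) == get_description(expanded[0]))
```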

mbjones commented 2 years ago

I'm closing this issue as there seem to be no outstanding items, and we have addressed validation through other tickets, and settled the issues. If a specific issue is still outstanding, please feel free to reopen it with a description of what we need to handle.