bbc / simorgh

The BBC's Open Source Web Application. Contributions welcome! Used on some of our biggest websites, e.g.
https://www.bbc.com/thai

Investigation: how to implement AV structured data #4566

Closed lizcameron closed 4 years ago

lizcameron commented 4 years ago

Is your feature request related to a problem? Please describe. With the introduction of AV to the article page, we want to introduce video- and audio-specific schema.org data in an attempt to make the metadata of these pages richer. This involves including VideoObject and AudioObject schema for article pages.

Describe the solution you'd like An article page with a media block should contain schema.org metadata that specifies the audio and/or video objects present on the page. If possible, we would like to distinguish between audio object and video object.

See Jira ticket/ask @lizcameron for more examples of the correct implementation of video object structured data.

Video object detail:

| Name | Example | Where does this come from? |
| --- | --- | --- |
| type | VideoObject | |
| name | "This is the title of the video" | model.versions[0].title |
| description | "This is the description of the video." | API? |
| thumbnailUrl | https://ichef.bbci.co.uk/images/ic/$recipe/p0715z8q.jpg | |
| uploadDate | 2019-10-10T09:00:00+08:00 | |
| duration | PT2M21S | model.versions[0].durationISO8601 |
| contentUrl | | Player link/file location (based on player, we can only use the embed point) |
| embedUrl | https://www.bbc.co.uk/news/av/embed/p07t1hp1/50299906 | externalEmbedUrl |

Audio object detail:

| Name | Example | Where does this come from? |
| --- | --- | --- |
| type | AudioObject | |
| name | "This is the title of the audio" | model.versions[0].title |
| description | "This is the description of the audio." | API? |
| thumbnailUrl | https://ichef.bbci.co.uk/images/ic/$recipe/p0715z8q.jpg | |
| uploadDate | 2019-10-10T09:00:00+08:00 | |
| duration | PT2M21S | model.versions[0].durationISO8601 |
| contentUrl | | Player link/file location (based on player, we can only use the embed point) |
| embedUrl | https://www.bbc.co.uk/news/av/embed/p07t1hp1/50299906 | externalEmbedUrl |

Following investigation, create issues to implement schema.org for video and audio player.

We will also want to identify testing requirements and any technical issues that we'll need to overcome to be able to implement AV structured data as described within this issue.

Off the back of this investigation, a developer will sit with business and a tester to discuss an appropriate way forward.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Testing notes

Additional context Add any other context or screenshots about the feature request here.

rhenshaw56 commented 4 years ago

To achieve this, we would need to make slight modifications to the media player metadata helper.

Basically, a single metadata object for each piece of matching audio/video content on a page would need to be returned from that helper, assuming there are multiple audio/video contents.

For each metadata object returned, we also need to set a property of '@context': 'http://schema.org', to ensure that it is picked up as a valid schema.org schema (see more...), since it will be a single JSON-LD object.

Returning a valid AudioObject/VideoObject means we no longer have to set the @listContent property as we currently do, because the validation tool aggregates and distinguishes these objects itself.

In a nutshell, the change to the helper would be like the snippet below:

import { pathOr } from 'ramda';

// getThumbnailUri is assumed to be an existing helper already in scope.
const mediaPlayerMetadata = blocks => {
  const aresMediaBlocks = pathOr(null, ['model', 'blocks'], blocks);

  if (!aresMediaBlocks || aresMediaBlocks.length < 1) {
    return null;
  }

  const aresMetadataBlocks = aresMediaBlocks.filter(
    block => block.type === 'aresMediaMetadata',
  );

  // Only the first metadata block is used in this snippet.
  const metadataBlock = aresMetadataBlocks[0];

  const format = pathOr(null, ['model', 'format'], metadataBlock);
  const type = format === 'audio' ? 'AudioObject' : 'VideoObject';

  const metadata = {
    '@context': 'http://schema.org',
    '@type': type,
    name: pathOr(null, ['model', 'title'], metadataBlock),
    description: pathOr(null, ['model', 'synopses', 'short'], metadataBlock),
    // Array indices in a Ramda path are plain numbers (0, not [0]).
    duration: pathOr(
      null,
      ['model', 'versions', 0, 'duration'],
      metadataBlock,
    ),
    thumbnailUrl: getThumbnailUri(metadataBlock),
    uploadDate: pathOr(
      null,
      ['model', 'versions', 0, 'availableFrom'],
      metadataBlock,
    ),
  };

  return metadata;
};

Making this change and validating this page (http://localhost:7080/news/articles/c3wmq4d1y3wo) with the tool should give something that looks like this:

[Screenshot: Screen Shot 2019-11-13 at 20 53 30]

We also need to change most of the properties we're setting. As the next screenshot shows, warnings/errors appear when viewing the data for a VideoObject:

[Screenshot: Screen Shot 2019-11-13 at 21 18 36]

Unfortunately, the duration and uploadDate properties in the aresMediaMetadata data are given as numbers in seconds, whereas the structured data needs valid ISO 8601 values. For duration, a valid ISO 8601 value is already available in model.versions[0].durationISO8601.

For the additional properties that would need to be added, like contentUrl and embedUrl, we would need to pass the value of the embedSource generated here down to the Metadata component as a prop.
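A minimal sketch of the per-block approach described in this comment, under several assumptions: `pathOr` below is a simplified local stand-in for Ramda's function so that the example runs on its own, `toIsoDate`, `toMediaSchema` and `mediaPlayerMetadataList` are hypothetical names, and `availableFrom` is treated as a Unix timestamp in seconds, as stated above. `thumbnailUrl` is left out because `getThumbnailUri` is not shown in this thread.

```javascript
// Simplified stand-in for Ramda's pathOr so the sketch is self-contained;
// in simorgh the real pathOr from ramda would be used instead.
const pathOr = (fallback, path, obj) => {
  const value = path.reduce(
    (acc, key) => (acc == null ? undefined : acc[key]),
    obj,
  );
  return value == null ? fallback : value;
};

// Assumption: the timestamp is in seconds. If the data turns out to be in
// milliseconds, drop the `* 1000`.
const toIsoDate = seconds =>
  Number.isFinite(seconds) ? new Date(seconds * 1000).toISOString() : null;

// Hypothetical helper: builds one schema.org object from one metadata block.
const toMediaSchema = metadataBlock => {
  const format = pathOr(null, ['model', 'format'], metadataBlock);
  const version = ['model', 'versions', 0];

  return {
    '@context': 'http://schema.org',
    '@type': format === 'audio' ? 'AudioObject' : 'VideoObject',
    name: pathOr(null, ['model', 'title'], metadataBlock),
    description: pathOr(null, ['model', 'synopses', 'short'], metadataBlock),
    // Use the ISO 8601 duration that already exists in the data.
    duration: pathOr(null, [...version, 'durationISO8601'], metadataBlock),
    uploadDate: toIsoDate(
      pathOr(null, [...version, 'availableFrom'], metadataBlock),
    ),
  };
};

// Returns one object per aresMediaMetadata block rather than only the first.
const mediaPlayerMetadataList = blocks =>
  pathOr([], ['model', 'blocks'], blocks)
    .filter(block => block.type === 'aresMediaMetadata')
    .map(toMediaSchema);
```

Keeping `toMediaSchema` separate from the filtering step also goes some way towards breaking the helper into smaller, individually testable units.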

jamesdonoh commented 4 years ago

Thanks for the investigation @rhenshaw56. The approach and pseudocode look reasonable to me, though I think we'd want to break the mediaPlayerMetadata function down into smaller units during implementation.

One question - if we output separate AudioObjects/VideoObjects does the validator still understand that these are all part of the same Article? i.e. is a hierarchical relationship between them still preserved? Is this how CNN/others handle it?

rhenshaw56 commented 4 years ago

One question - if we output separate AudioObjects/VideoObjects does the validator still understand that these are all part of the same Article? i.e. is a hierarchical relationship between them still preserved? Is this how CNN/others handle it?

@jamesdonoh yes, a hierarchical relationship between them is still preserved and the validator understands how to output and distinguish a list of various AudioObjects/VideoObjects

simonsinclair commented 4 years ago

This is a good investigation, @rhenshaw56.

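Can I suggest we:

- Create a separate function that serves the purpose of decorating the video object with these properties. This would return a valid Schema that can be tested. An example of a similar function can be found here.
- Use Ramda path where the default is to return null/undefined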

@jamesdonoh Is this how CNN/others handle it?

CNN

<script type="application/ld+json" name="metaScript">{
   "@context":"https://schema.org",
   "@type":"VideoObject",
   "name":"CNN asks Zelensky about investigation claims",
   "description":"Ukrainian President Volodymyr Zelensky weighed in on claims that he was &lt;a href=&quot;http://www.cnn.com/2019/11/19/politics/volodymyr-zelensky-burisma-probe-intl/index.html&quot; target=&quot;_blank&quot;&gt;ready to announce an investigation into Burisma Holdings,&lt;/a&gt; a Ukrainian energy company linked to the son of former Vice President Joe Biden, following a phone call with President Donald Trump.",
   "thumbnailURL":"https://cdn.cnn.com/cnnnext/dam/assets/191001131044-02-zelensky-1001-large-169.jpg",
   "image":"https://cdn.cnn.com/cnnnext/dam/assets/191001131044-02-zelensky-1001-large-169.jpg",
   "duration":"PT1M15S",
   "uploadDate":"2019-11-19T14:16:31Z",
   "contentUrl":"https://edition.cnn.com/videos/politics/2019/11/19/volodymyr-zelensky-burisma-investigation-allegation-donald-trump-impeachment-hearings-pleitgen-liveshot-intl-ldn-vpx.cnn",
   "url":"https://edition.cnn.com/videos/politics/2019/11/19/volodymyr-zelensky-burisma-investigation-allegation-donald-trump-impeachment-hearings-pleitgen-liveshot-intl-ldn-vpx.cnn",
   "embedUrl":"https://fave.api.cnn.io/v1/fav/?video=politics/2019/11/19/volodymyr-zelensky-burisma-investigation-allegation-donald-trump-impeachment-hearings-pleitgen-liveshot-intl-ldn-vpx.cnn&customer=cnn&edition=international&env=prod"
}</script>

HarveyPeachey commented 4 years ago

Do we need to include the expires property as well? I know that on our mega av test article page we have a video that has an expiry date.

rhenshaw56 commented 4 years ago

Do we need to include the expires property as well?

@HarveyPeachey I think we would need @lizcameron's input on that

rhenshaw56 commented 4 years ago

Create a separate function that serves the purpose of decorating the video object with these properties. This would return a valid Schema that can be tested. An example of a similar function can be found here.

@simonsinclair sorry, I don't understand what this is doing, could you maybe put it in code in the context of the metadata usage.

Use Ramda path where the default is to return null/undefined

Also, regarding this: I believe we use R.pathOr extensively in simorgh, so it's fine as it is. We would prefer to force null values rather than undefined (or a mixture of both), and undefined is what R.path returns if the specified property is not on the object.
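One possible reading of the decorator suggestion quoted above, as a rough sketch only: a small pure function that takes an already-built media schema object and returns a copy with `embedUrl` added. The name `withEmbedUrl` and its signature are hypothetical, and it assumes the embed URL (the `embedSource` value mentioned earlier in the thread) is passed in by the caller.

```javascript
// Hypothetical decorator: returns a copy of a media schema object with
// embedUrl added when an embed source is available. The input object is
// never mutated, which keeps the function easy to test in isolation.
const withEmbedUrl = (mediaSchema, embedSource) =>
  embedSource
    ? { ...mediaSchema, embedUrl: embedSource }
    : { ...mediaSchema };

// Example usage with placeholder values.
const video = {
  '@context': 'http://schema.org',
  '@type': 'VideoObject',
  name: 'Hello',
};

const decorated = withEmbedUrl(
  video,
  'https://www.bbc.co.uk/news/av/embed/p07t1hp1/50299906',
);
```

Other optional properties, such as expires, could be handled by similar small functions if they are confirmed as requirements.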

jamesdonoh commented 4 years ago

Thanks for the reviews @simonsinclair @HarveyPeachey - I think we can worry about low-level details like which library functions to use when we do the implementation.

@rhenshaw56 please chase up Harvey's comment about expires and add this and Simon's suggestion about a decorator function as notes on the implementation ticket. Am closing this for now as the investigation is done.

12 commented 4 years ago

May I suggest we do something like below to avoid having multiple <script type="application/ld+json"> tags on the page? I think this would look a lot tidier, and I don't think it would require too much manipulation of the current metadata helper.

<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@graph":
    [
        {
            "@type": "VideoObject",
            "description": "foo",
            "name": "Hello",
            "thumbnailUrl": "http://foo.com/img.png",
            "uploadDate": "2019-08-08"
        },
        {
            "@type": "VideoObject",
            "description": "foo",
            "name": "Hello",
            "thumbnailUrl": "http://foo.com/img.png",
            "uploadDate": "2019-08-08"
        },
        {
            "@type": "AudioObject",
            "description": "foo",
            "name": "Hello",
            "thumbnailUrl": "http://foo.com/img.png",
            "uploadDate": "2019-08-08"
        }
    ]
}
</script>

This is valid markup according to Google's Structured Data Testing Tool

Source
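As a rough illustration of how the structure above might be produced, the sketch below wraps an array of media schema objects in a single JSON-LD document with `@graph`. The helper name `toJsonLdGraph` is hypothetical, and it assumes the individual objects are available together as a list at the point where the script tag is rendered.

```javascript
// Hypothetical helper: combines several media schema objects into a single
// JSON-LD document. The shared @context is hoisted to the top level and
// removed from each individual object.
const toJsonLdGraph = mediaSchemas => ({
  '@context': 'http://schema.org',
  '@graph': mediaSchemas.map(({ '@context': _context, ...rest }) => rest),
});

// Example usage with placeholder values.
const graph = toJsonLdGraph([
  { '@context': 'http://schema.org', '@type': 'VideoObject', name: 'Hello' },
  { '@context': 'http://schema.org', '@type': 'AudioObject', name: 'Hello' },
]);

// String that would go inside one <script type="application/ld+json"> tag.
const jsonLd = JSON.stringify(graph);
```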

rhenshaw56 commented 4 years ago

May I suggest we do something like below to avoid having multiple <script type="application/ld+json"> tags on the page? I think this would look a lot tidier, and I don't think it would require too much manipulation of the current metadata helper.

Thanks @12, this looks good, but it assumes that a list of audio and video metadata objects is passed into the Metadata component. That would be ideal if we had multiple pieces of audio/video content in one place, but that's not the case at the moment: the Metadata component used in the media player is rendered once for each piece of audio/video content on a page, each with its own aresMediaMetadata object. I haven't actually seen a case where we have a list of media metadata all in one place.