ThreeSixtyGiving / standard

The 360Giving data standard for UK philanthropic giving
http://www.threesixtygiving.org

What guidance or encouragement, if any, should we give about schema.org Dataset markup for publishers #259

Open robredpath opened 5 years ago

robredpath commented 5 years ago

In a recent Twitter conversation, someone suggested that we add schema.org Dataset markup to the dataset pages on GrantNav to improve discoverability. GrantNav has generally been used to demonstrate best practice in 360Giving data, but AFAIK we don't have any guidance or documentation on how to mark up pages that link to 360Giving data. I think we should at least form an initial view on how to make use of schema.org markup (if at all) before looking at changes to GrantNav.

One other thing to note is that Google (fairly) recently launched Dataset Search, which makes discoverability of datasets more of a mainstream thing.

I'm starting the conversation here; it might be more appropriate on the forum, but we can move it if so!

timgdavies commented 5 years ago

I would have thought the better place to do this would be on the Registry at http://data.threesixtygiving.org/ and that should be possible with some simple template changes.

The documentation for adding markup for Dataset Search isn't fantastically clear, but I had a quick go with the Structured Data Testing Tool, adding some markup to the first entry in the http://data.threesixtygiving.org/ table to get:

<table class="table table-bordered text-center">
  <thead>
    <tr>
      <th scope="col">Organisation</th>
      <th scope="col">Title</th>
      <th scope="col">Published</th>
      <th scope="col">Valid?</th>
      <th scope="col">Period</th>
      <th scope="col">Records</th>
      <th scope="col">Total value<sup>*</sup></th>
      <th scope="col">File</th>
      <th scope="col">Licence</th>
    </tr>
  </thead>
  <tbody>

        <tr typeof="dcat:Dataset">
          <td scope="rowgroup" rowspan="1">
            <div property="dc:title"><span style="display:none;">Grants made by </span> A B Charitable Trust</div>
            <div class="mt-4">

            </div>
          </td>

            <td property="dc:description">Open Programme grants awarded from 2015 until February 2018</td>
            <td style="white-space:nowrap;">Jun '18</td>
            <td>✓</td>
            <td>
              <div style="white-space:nowrap;">Jan '13</div>
              <div>to</div>
              <div style="white-space:nowrap;">Jan '18</div>
            </td>
            <td>381</td>
            <td>

                <div>£ 4,295,400</div>

            </td>
            <td>

                <div>
                  <a rel="dcat:distribution" href="http://abcharitabletrust.org.uk/data/abct-data-february-2018.xlsx">
                    <img property="dcat:mediaType" content="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"  src="../images/files/xlsx.png" width="70" height="70">
                  </a>
                </div>
                <div class="mt-4">63.1 kB</div>

            </td>
            <td><a rel="license" href="https://creativecommons.org/licenses/by/4.0/"><img src="../images/licences/cc_by.png" width="70" height="27"></a></td>

        </tr>
        </tbody>
        </table>

which I think, if you run it through that tool, gives enough information for something to be picked up by Google Dataset Search. More metadata could be added here, such as the period covered by the data and the publisher responsible for it, but the above is just a quick proof of concept. The template might need some conditional logic for declaring the media type, etc.

Note that I added some 'hidden' text (<span style="display:none;">Grants made by </span>) so that the Dataset Search result would be clearer when viewed outside the context of the 360Giving website.
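
As a rough illustration of the extra metadata and the conditional media-type logic mentioned above, here is a minimal Python sketch that builds a schema.org Dataset description as JSON-LD for the same registry entry. It is an assumption-laden sketch, not the registry's actual template: the function names, the `MEDIA_TYPES` lookup and the choice of JSON-LD rather than RDFa are all hypothetical, and the period `2013-01/2018-01` is simply the "Jan '13 to Jan '18" column written as an ISO 8601 interval. The property names (`name`, `description`, `license`, `publisher`, `distribution`, `temporalCoverage`, `contentUrl`, `encodingFormat`) are schema.org terms; which of them Dataset Search actually uses would need checking.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup from file extension to media type; extend as needed.
MEDIA_TYPES = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".csv": "text/csv",
    ".json": "application/json",
}


def media_type_for(url):
    """Return the media type for a download URL, or None if it is unknown."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return MEDIA_TYPES.get(suffix)


def dataset_jsonld(publisher, description, download_url, licence_url, period=None):
    """Build a schema.org Dataset description as a plain dict."""
    distribution = {"@type": "DataDownload", "contentUrl": download_url}
    media_type = media_type_for(download_url)
    if media_type:  # only declare a media type we actually recognise
        distribution["encodingFormat"] = media_type
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Same idea as the hidden "Grants made by" text in the table markup.
        "name": f"Grants made by {publisher}",
        "description": description,
        "license": licence_url,
        "publisher": {"@type": "Organization", "name": publisher},
        "distribution": [distribution],
    }
    if period:
        dataset["temporalCoverage"] = period  # ISO 8601 interval
    return dataset


example = dataset_jsonld(
    publisher="A B Charitable Trust",
    description="Open Programme grants awarded from 2015 until February 2018",
    download_url="http://abcharitabletrust.org.uk/data/abct-data-february-2018.xlsx",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    period="2013-01/2018-01",
)
print(json.dumps(example, indent=2))
```

The printed JSON could be embedded in a page inside a `<script type="application/ld+json">` element; unknown file extensions simply produce a distribution without `encodingFormat` rather than a guessed value.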

In terms of getting publishers to put the schema markup directly on their own websites (which would do no harm and would be good for discoverability), I think there are a few options:

(1) Just provide documentation on doing this with recommended markup properties;

(2) Provide a simple form-based tool where they can provide key variables about the dataset, and then return HTML to copy-and-paste into their websites with the required markup - and that links to their datasets for download;

(3) Develop a small 'widget' (JavaScript? Assuming Google's search crawler executes the JS before indexing) that they can embed on their websites to link to their data, and which takes data from the registry to display on their site.

Option (3) could also provide value-add features like linking to the different format versions of their data, automatically linking to GrantNav when their data is included in GrantNav, and updating to link to other sites that might include their data in future.
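
To make option (2) a little more concrete, here is a minimal Python sketch of the back end of such a form-based tool: it takes the key variables a publisher would enter and returns an HTML fragment to copy and paste. Everything here is hypothetical, including the function name `publisher_snippet` and the layout of the fragment; it simply reuses the RDFa terms from the proof of concept above (`dcat:Dataset`, `dc:title`, `dc:description`, `dcat:distribution`, `dcat:mediaType`, `rel="license"`) and has not been run through the testing tool.

```python
from html import escape


def publisher_snippet(publisher, description, download_url, media_type, licence_url):
    """Return a copy-and-paste HTML fragment with RDFa terms for one dataset."""
    # Escape every form value so that user input cannot break the markup.
    values = {
        "publisher": publisher,
        "description": description,
        "download_url": download_url,
        "media_type": media_type,
        "licence_url": licence_url,
    }
    v = {key: escape(value, quote=True) for key, value in values.items()}
    return (
        '<div typeof="dcat:Dataset">\n'
        f'  <p property="dc:title">Grants made by {v["publisher"]}</p>\n'
        f'  <p property="dc:description">{v["description"]}</p>\n'
        f'  <p><a rel="dcat:distribution" href="{v["download_url"]}">'
        f'<span property="dcat:mediaType" content="{v["media_type"]}">'
        "Download the data</span></a></p>\n"
        f'  <p><a rel="license" href="{v["licence_url"]}">Licence</a></p>\n'
        "</div>"
    )


print(
    publisher_snippet(
        publisher="A B Charitable Trust",
        description="Open Programme grants awarded from 2015 until February 2018",
        download_url="http://abcharitabletrust.org.uk/data/abct-data-february-2018.xlsx",
        media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        licence_url="https://creativecommons.org/licenses/by/4.0/",
    )
)
```

Whether a fragment like this survives being pasted into a given CMS is exactly the open question raised for all three options, so it would need testing against the systems grantmakers actually use.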

All options face the challenge of working around the CMSs most grantmakers will be running, and the extent to which those may strip out microdata or RDFa. Given that, I think adding markup to http://data.threesixtygiving.org/ would be the priority.