eprintsug / json-ld

Export plugin to transform selected metadata to json linked data to provide structured data for search engine indexing.
MIT License

Google Search Console: Missing field 'license' #2

Open photomedia opened 5 years ago

photomedia commented 5 years ago

Google Search Console has been processing our json-ld and recently pointed out that the dataset document type is missing a license field. A dataset can include multiple files in EPrints, with different licenses set on each, so we have to figure out the best way to include that information in the eprint-level json-ld.

photomedia commented 5 years ago

It's not obvious from the documentation, but I think this should work:

    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "for example: CSV",
            "contentUrl": "URL to the File to download",
            "license": "URL to the license file for this file"
        },
        {
            "@type": "DataDownload",
            "encodingFormat": "for example: XML",
            "contentUrl": "URL to the File to download",
            "license": "URL to the license file for this file"
        }
    ],
    "license": "URL to the general/default license for the repository"

photomedia commented 5 years ago

Here is how I ended up dealing with the license fields for our repo:

    # Add rights/license info (%jsonldata is the hash the plugin is already building)
    my $repo = $plugin->{session}->get_repository;

    foreach my $doc ( $eprint->get_all_documents() ) {

        # Default for all documents is the Spectrum Terms of Access
        my $license = "term_access";
        if ( $doc->exists_and_set("license") ) { $license = $doc->get_value("license"); }

        my $license_uri    = $repo->phrase("licenses_uri_$license");
        my $license_phrase = $repo->phrase("licenses_typename_$license");
        if ( $doc->exists_and_set("date_embargo") ) {
            $license_phrase .= " (" . $repo->phrase( "embargoed_until", embargo_date => $doc->value("date_embargo") ) . ")";
        }

        my %docrights;
        $docrights{'@type'}      = "DataDownload";
        $docrights{'contentUrl'} = $doc->get_url;

        # Google doesn't recognize license_phrase, so merge the license URI with
        # the license name and embargo information in the license field
        $docrights{'license'} = $license_uri . " " . $license_phrase;

        # Prefer the MIME type when it is set; otherwise use the document type
        $docrights{'encodingFormat'} = $doc->get_type;
        if ( $doc->exists_and_set("mime_type") ) { $docrights{'encodingFormat'} = $doc->value("mime_type"); }

        push @{ $jsonldata{distribution} }, \%docrights;
    }

    # Repository-wide default: the overall Terms of Access
    my %reporights;
    $reporights{'@type'} = "CreativeWork";
    $reporights{'name'}  = $repo->phrase("licenses_typename_term_access");
    $reporights{'url'}   = $repo->phrase("licenses_uri_term_access");

    push @{ $jsonldata{license} }, \%reporights;

This gives us a valid result for our repo. In our case, there is an overall "Terms of Access" for the repository, and this is what is listed as the "license" field for all items by default:

    "license": [
        {
            "name": "Spectrum Terms of Access",
            "url": "https://spectrum.library.concordia.ca/policies.html#TermsOfAccess",
            "@type": "CreativeWork"
        }
    ]

However, I also include the "distribution" field, which lists each file with its license information: the license URI, the license name, and any embargo information. For example:

    "distribution": [
        {
            "license": "http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons: Attribution-Share Alike 3.0",
            "@type": "DataDownload",
            "contentUrl": "https://spectrum.library.concordia.ca/7724/3/2004-08-03_BNQ_Final.xls",
            "encodingFormat": "application/vnd.ms-excel"
        },
        {
            "@type": "DataDownload",
            "encodingFormat": "application/pdf",
            "contentUrl": "https://spectrum.library.concordia.ca/7724/4/bnq_ListByPublisher.pdf",
            "license": "http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons: Attribution-Share Alike 3.0"
        }
    ]
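
One possible refinement, offered as a sketch rather than something tested against Search Console: schema.org allows `license` to be either a URL or a CreativeWork, so a per-file entry could reuse the same CreativeWork shape as the repository-level license instead of packing the URI, name, and embargo note into one string. The helper name `license_object` and its arguments are hypothetical, and the snippet uses only the core JSON::PP module so it runs outside EPrints.

```perl
use strict;
use warnings;
use JSON::PP;

# Build a schema.org license object from a URI, a display name and an
# optional embargo date. Function and argument names are hypothetical.
sub license_object {
    my ( $uri, $name, $embargo_date ) = @_;
    if ( defined $embargo_date && length $embargo_date ) {
        $name .= " (Embargoed until $embargo_date)";
    }
    my %license = (
        '@type' => 'CreativeWork',
        'name'  => $name,
        'url'   => $uri,
    );
    return \%license;
}

# One distribution entry, using the values from the example output above.
my %entry = (
    '@type'          => 'DataDownload',
    'contentUrl'     => 'https://spectrum.library.concordia.ca/7724/3/2004-08-03_BNQ_Final.xls',
    'encodingFormat' => 'application/vnd.ms-excel',
    'license'        => license_object(
        'http://creativecommons.org/licenses/by-sa/3.0/',
        'Creative Commons: Attribution-Share Alike 3.0',
    ),
);

# canonical() sorts the keys so the output is stable between runs.
print JSON::PP->new->canonical->pretty->encode( \%entry );
```

This keeps the `url` value a clean URL while still carrying the human-readable name and embargo note. Whether Google's validator treats a file-level CreativeWork the same way as the concatenated string would need to be checked.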

I also had to add these phrases for licenses:

    <!--TN 2019, October - adding license URLs and other phrases for JSON-LD -->
    <epp:phrase id="licenses_uri_cc_by_nd">http://creativecommons.org/licenses/by-nd/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nd_4">http://creativecommons.org/licenses/by-nd/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by">http://creativecommons.org/licenses/by/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_4">http://creativecommons.org/licenses/by/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc">http://creativecommons.org/licenses/by-nc/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc_4">http://creativecommons.org/licenses/by-nc/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc_nd">http://creativecommons.org/licenses/by-nc-nd/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc_nd_4">http://creativecommons.org/licenses/by-nc-nd/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc_sa">http://creativecommons.org/licenses/by-nc-sa/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_nc_sa_4">http://creativecommons.org/licenses/by-nc-sa/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_sa">http://creativecommons.org/licenses/by-sa/3.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_by_sa_4">http://creativecommons.org/licenses/by-sa/4.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_public_domain">http://creativecommons.org/licenses/publicdomain/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_gnu_gpl">http://creativecommons.org/licenses/GPL/2.0/</epp:phrase>
    <epp:phrase id="licenses_uri_cc_gnu_lgpl">http://creativecommons.org/licenses/LGPL/2.1/</epp:phrase>
    <epp:phrase id="licenses_uri_term_access">https://spectrum.library.concordia.ca/policies.html#TermsOfAccess</epp:phrase>
    <epp:phrase id="embargoed_until">Embargoed until <epc:pin name="embargo_date" /></epp:phrase>
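
The `licenses_uri_$license` lookup in the plugin code relies on every license id having a matching phrase, so a license id without one would not yield a usable URL. A minimal standalone sketch of a lookup that falls back to the repository default is below. The `%LICENSE_URI` table and `license_uri_for` are hypothetical names, and inside EPrints the values would come from the phrase file rather than a hard-coded hash.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of a few of the licenses_uri_* phrases.
my %LICENSE_URI = (
    cc_by_4     => 'http://creativecommons.org/licenses/by/4.0/',
    cc_by_sa    => 'http://creativecommons.org/licenses/by-sa/3.0/',
    term_access => 'https://spectrum.library.concordia.ca/policies.html#TermsOfAccess',
);

# Return the URI for a license id, or the Terms of Access URI when the id
# is undefined or has no entry in the table.
sub license_uri_for {
    my ($license_id) = @_;
    if ( defined $license_id && exists $LICENSE_URI{$license_id} ) {
        return $LICENSE_URI{$license_id};
    }
    return $LICENSE_URI{term_access};
}

print license_uri_for('cc_by_sa'),        "\n";    # known id
print license_uri_for('unknown_license'), "\n";    # falls back to the default
```

The same check could guard the phrase lookup in the plugin, so that a document with an unexpected license value still gets the default Terms of Access URI.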

What do you think? Is this structure (an overall default terms-of-access statement for the repository, plus additional file-level licenses) common enough to turn this solution into a pull request?