Conal-Tuohy / oceania

A Linked Open Data aggregator for cultural heritage metadata in New Zealand and Australia
http://oceania.digital/

Filter URIs that are utterly bogus #5

Open Conal-Tuohy opened 7 years ago

Conal-Tuohy commented 7 years ago

Some of DigitalNZ's URL values are utterly bogus and should be ignored, e.g. this JavaScript fragment is given as a thumbnail-url:

(function(w) {
                   w['_sv'] = {trackingCode: 'nEPXYcRQZrNZryvWXwDIQOQRWTSgbCEF'};
                   var s = document.createElement('script');
                   s.src = '//api.survicate.com/assets/survicate.js';
                   s.async = true;
                   var e = document.getElementsByTagName('script')[0];
                   e.parentNode.insertBefore(s, e);
                 })(window);

from http://oceania.digital/fuseki/oceania/data?graph=tag%3Aoceania.digital%2C2017%3A36555397

Conal-Tuohy commented 7 years ago

This free text statement is given as a URL:

Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. Details obtained from http://www.biomedcentral.com/about/license

                http://www.sherpa.ac.uk/romeo/issn/1471-2458/

from http://oceania.digital/fuseki/oceania/data?graph=tag%3Aoceania.digital%2C2017%3A35795197

Conal-Tuohy commented 7 years ago

The contents of The Portal to Texas History (digital content including images, text, and sound and video recordings) are made publicly available by the collection-holding partners for use in research, teaching, and private study. For the full terms of use, see https://texashistory.unt.edu/terms-of-use

from http://oceania.digital/fuseki/oceania/data?graph=tag%3Aoceania.digital%2C2017%3A35800503

Conal-Tuohy commented 7 years ago

These bogus values could be recognised in a pre-processing step, and the container elements decorated with an attribute containing a hash of the text. Then the RDF conversion could mint a URI based on the hash, and attach the text as a property of that resource.

Conal-Tuohy commented 7 years ago

... or they could just be encoded as data: URIs
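
A minimal sketch of the data: URI alternative, assuming the text is carried as percent-encoded `text/plain` in UTF-8 (the general `data:` form from RFC 2397). The function names are hypothetical, and base64 would be an equally valid encoding.

```python
from urllib.parse import quote, unquote

# Assumed media type prefix for the encoded text.
PREFIX = "data:text/plain;charset=UTF-8,"


def text_to_data_uri(value: str) -> str:
    """Encode arbitrary text as a percent-encoded data: URI."""
    # safe="" forces every reserved character, including "/", to be escaped.
    return PREFIX + quote(value, safe="")


def data_uri_to_text(uri: str) -> str:
    """Recover the original text from a URI produced by text_to_data_uri."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a data: URI produced by text_to_data_uri")
    return unquote(uri[len(PREFIX):])
```

Unlike a hash-based URI, this form keeps the original text recoverable from the URI itself, at the cost of very long URIs for values such as the licence statements quoted above.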