SusanBrown commented 6 years ago

We need HuViz to render proper labels for terms that are not part of our own ontology, but for which the ontology is referenced at the top of the data file.

I've asked whether we could include these in our data but we really can't as those entities don't exist within our dataset and so we have nothing to which we can attached properties

Possible Solution

Go to the external source and try to grab either rdfs:label or foaf:name for that entity. This should capture most cases as they are very widely used. If not then use the last string in the URI as a label, e.g. http://vocab.getty.edu/page/cona/700006364 would be "labelled" 700006364. Users will be able to click through to the web page for the entity so they can get more information by using the snippet/triple inspector box.

Summary of techniques for CWRC ontologies

prefix	Negotiable Content	Special Knowledge extension	Notes
`loc:`	`.rdf`	`.skos.nt`
`geonames:`	no	no	custom API
`schema:`	yes	`.ttl`	`rdfs:label` provides uninteresting values, skip
`gvp:`	no	`.ttl` sometimes	`.nt` download blocked by CORS
`dbpedia:`			accessible through LDF
`wikidata:`			direct access blocked by CORS policy

Use Statistics

prefix	# in CWRC ontology	in datasets
`loc:`	85
`geonames:`
`schema:`	41
`gvp:`	0
`dbpedia:`	370

How many times each prefix is used in the CWRC ontology itself ( http://sparql.cwrc.ca/ontology/cwrc.ttl ) pertains to name lookup in so far as how it might inform the pickers and display of associated nodes in the graph, if that is turned on.

Implemented now in HuViz

[x] loc: https://id.loc.gov/search/
[x] geonames:
[x] viaf http://viaf.org/viaf/data/
[x] dbpedia http://dbpedia.org/sparql
[x] gvp: http://vocab.getty.edu/queries (union list of artist names AND places)
[ ] wikidata: https://query.wikidata.org/
[ ] oclc:
~~schema:~~ irrelevant because there are no labels in that ontology

Generic Content-Negotation

Simple content-negotiation should be implemented which could do the following, in order:

tell server we Accept various semantic and structured formats and process them, such as:
[ ] .nt ie application/n-triples
[ ] .ttl ie text/turtle
[ ] .jsonld
[ ] .rdf
failing that, accept the html and extract
[ ] RDFa and microformats
[ ] title tag contents

Entities to discover and display the names of

[x] subjects
[x] objects
[ ] classes in Class Selector
[ ] predicates in Edges of the Selected Nodes

smurp commented 5 years ago

This looks like it might be the way to extract titles out of Getty:

http://vocab.getty.edu/doc/queries/#Smart_Resource_Title

smurp commented 5 years ago

\@prefix loc:

curl -v -H "Accept: application/turtle, application/rdf+xml, text/x-turtle, */*" http://id.loc.gov/authorities/subjects/sh85054037

< HTTP/1.1 303 SEE OTHER
< Date: Wed, 21 Nov 2018 23:39:28 GMT
< Content-Length: 0
< Connection: keep-alive
< Set-Cookie: __cfduid=d0c754fb2c9d210ad9cd9040e23d879171542843568; expires=Thu, 21-Nov-19 23:39:28 GMT; path=/; domain=.loc.gov; HttpOnly
< Location: http://id.loc.gov/authorities/subjects/sh85054037.rdf
< Vary: Accept
< X-URI: http://id.loc.gov/authorities/subjects/sh85054037
< X-PrefLabel: Geology
< X-Varnish: 4962956
< Age: 0
< Via: 1.1 varnish-v4
< Access-Control-Allow-Origin: *
< Server: cloudflare
< CF-RAY: 47d6feee96092d53-TXL

which yields a redirection to:

Location: http://id.loc.gov/authorities/subjects/sh85054037.rdf

Which in turn ought to be parsable to discover some sort of useful label.

use the *.skos.nt

http://id.loc.gov/authorities/subjects/sh85054037.skos.nt

It is great that this exists, but it is unclear what to include in the request Accepts: header to be informed of the existence of this file. Thus to use this technique for obtaining information from the LOC it would be necessary to exploit the special knowledge of the existence of the .skos.nt and that is inferior to a pure content negotiation approach.

Here is somebody else pointing out that content negotiation does not cough up *.skos.nt but that it is available.

https://listserv.loc.gov/cgi-bin/wa?A2=ID;95cabb2f.1506

They suggest a way to content negotiate skos.xml though:

curl -L -H 'Accept: application/skos+xml' http://id.loc.gov/authorities/subjects/sh00000011

use the X-PrefLabel header

Note the header line:

< X-PrefLabel: Geology

This appears to be equivalent to the skos:prefLabel value available in the .skos.nt file above except for the fact that it is missing any language indicator @en. The LOC does not appear to offer non-english labelling anyway, so no loss.

pros

no special knowledge required
very high performance: a useful response on the initial hit
no need for content negotiation

cons

no access to alternate language version, but those appear to be minimally available at LOC anyway
FATAL CON: It turns out that response headers are not visible on redirects, so this is a hard NO
here is the security justification: https://fetch.spec.whatwg.org/#atomic-http-redirect-handling

What was implemented?

Fetching the .skos.nt file.

smurp commented 5 years ago

Prefixes from http://sparql.cwrc.ca/ontology/cwrc.ttl

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://sparql.cwrc.ca/ontologies/cwrc#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix cidoc: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix eurovoc: <http://eurovoc.europa.eu/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geonames: <http://sws.geonames.org/> .
@prefix gvp: <http://vocab.getty.edu/ontology#> .
@prefix loc: <http://id.loc.gov/vocabulary/relators/> .
@prefix ii: <http://sparql.cwrc.ca/ontologies/ii#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sem: <http://semanticweb.cs.vu.nl/2009/11/sem/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix voaf: <http://purl.org/vocommons/voaf#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

smurp commented 5 years ago

PRIORITY

Susan emphasizes that effort to get geonames labels ought to be the priority because of upcoming conference needs.

DUE FOR DEC 9th

https://github.com/cwrc/HuViz/wiki/Upcoming-Events

smurp commented 5 years ago

@Prefix geonames:

Objectives

to translate geonameid to a preferred name
to have selected language versions available
to provide the name service to many users per day, eg at least 20
to provide the name service for many lookups per user per day, eg at least 500

Using the Official Geonames API

http://www.geonames.org/export/index.html

To look up 2657896 one would hit:

http://api.geonames.org/hierarchyJSON?formatted=true&geonameId=2657896&username=huviz
the path to the usable name would beobj.geonames.pop().name

Limits

the API requires a userid, with usage limits
30,000 credits (ie hits) per day
2,000 credits (ie hits) per hour

problems

using SPARQL one user could easily hit the 2,000/hr limit in 5 or 10 minutes (even less)
a class of 10 students could easily hit the 2,000/hr limit in 5 minutes of use
even 5 users tried to use SPARQL to see geonames the 30,000/day limit could easily be hit
one user being unlucky in their recursive searching could reach the 2,000 limit in under 2 minutes
the hourly and daily limits bring the geoname lookup service down for all users

low-impact remediations

local caching in each user's browser (we would want this anyway for performance, irrespective of prefix) #220 -- costs: trivial disk burden
use peer-to-peer LOD to share name caches among users #221
only look up names when the label for a node is actually asked for by the user (not preemptively) -- costs: performance, ugly name visibly being replaced by pretty name

high user-impact remediations

set up per-user geonames accounts (encourage user to set up account, present their userid at query-time)
govern usage: eg 5 names per-minute per-user

Set up alternate server, using SPARQL

The geonames data is available as an HDT ( http://www.rdfhdt.org/datasets/ ) file, making setting up a very high performance SPARQL endpoint relatively easy. It would be prudent to limit access to this database to HuViz users, otherwise it might be DDoSed by other geonames enthusiasts.

pros:

is standards-based
is high-performance
be part of a pattern of our hosting other LOD

cons:

we (who-ever that is: Guelph?, Nooron?) bears load and maintenance costs
depending on popularity might be a strategic complexity

Set up a custom server

Relevant names data could be extracted from the alternameNamesV2.zip file and housed in a custom database which could be used to host geonames and possibly other problematic name sources as they present themselves.

pros:

satisfies our users

cons:

is not terribly scalable
is relatively costly of effort
is not standards-based

@SusanBrown @antimony27 @wolfmaul

smurp commented 5 years ago

@Prefix schema:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

schema:appearance rdfs:subPropertyOf schema:workExample .

schema:firstAppearance rdfs:subPropertyOf schema:workExample .

schema:exampleOfWork schema:inverseOf schema:workExample .

schema:workExample a rdf:Property ;
    rdfs:label "workExample" ;
    dct:source <http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_bibex> ;
    schema:domainIncludes schema:CreativeWork ;
    schema:inverseOf schema:exampleOfWork ;
    schema:rangeIncludes schema:CreativeWork ;
    schema:sameAs <https://schema.org/workExample> ;
    rdfs:comment "Example/instance/realization/derivation of the concept of this creative work. eg. The paperback edition, first edition, or eBook." .

SusanBrown commented 5 years ago

Responding to @smurp

I am concerned that we'll be swamping any geonames account limit rather quickly and so propose we proceed with this use of an account-based lookup against the geonames api, but that we prioritize the in-browser cache (useful for many reasons) so each browser only every looks up a name once, later using their own cache for the name.

We'll need the browser-local quadstore for data editing too...

Yes, good to be thinking through these things.

I think we have a limited download of geonames (just the Cdn ones I think) stored in our servers for the CWRC lookups for this reason. Would it potentially work for this purpose too? If we added British placenames to the Cdn that would cover the bulk of existing data. I'm pinging @ilovan and @jefferya here.

The browser-caching and local quadstore sounds very good both for this and for the editing.

jefferya commented 5 years ago

Conceivable, yes. I think the local geonames endpoint contains all cities and countries given the size. Ideally,

I'd like to see the endpoint moved to VM dedicated to serving content instead of bolted on to the CWRC repository (and impacting CWRC Repo site performance). Geonames takes up 53MB or 94% of the Drupal database dump.

smurp commented 5 years ago

@Prefix `geonames:`

I have introduced a measure of protection of the system from enclobbering by the 2000/hr and/or 30,000/day lookup limits on Geoname lookups. The way it works is that a user must enter (on the Settings screen in the "Geonames Username" field) the username they wish to use for lookups. As soon as the name is entered then lookup of geonames is triggered. The username "huviz" may be entered but that is a shared resource which can be easily exhausted. The label for the settings field contains a link to the /login page at geonames.org at which one can trivially set up their own username. This approach should make it possible for any motivated user to get good service and for all of us to have access to light, shared service easily.

smurp commented 5 years ago

@SusanBrown Could you give feedback on the labelling and help to prioritize content negotation (ie generic) vs the prefix-specific techniques such as API implementations as needed for Getty, which -- for example -- will require connecting via SPARQL and performing getty-specific queries. What is the priority of this work vs other things? And which particular prefixs matter most to you?

SusanBrown commented 5 years ago

Geonames and LOC were the most prominent in the data by far. @antimony27 do you have any thoughts?

SusanBrown commented 5 years ago

I'm thinking in the longer term we'll definitely want dbpedia, wikidata, and Getty.

Do the W3C ontologies, e.g. org, owl, prov etc. not have a standard means of doing this that would make it possible to come up with a generic solution for them?

smurp commented 5 years ago

I haven't implemented the generic "content negotiation" method yet because it didn't work for the priority sources which required fairly custom approaches (GeoNames and LOC). So yes, the question is what's the priority sequence of Generic and the custom methods. I take from this that dealing with generic content-negotiation is a fine next step.

SusanBrown commented 5 years ago

Can you say at this point how many would be knocked off by generic content negotiation? I think we’ve knocked off the single most important ones at this point—just trying to weigh generic vs dbpedia/Getty and hoping for input from Kim as well.

On Dec 11, 2018, at 9:54 AM, Shawn Murphy notifications@github.com<mailto:notifications@github.com> wrote:

I haven't implemented the generic "content negotiation" method yet because it didn't work for the priority sources which required fairly custom approaches (GeoNames and LOC). So yes, the question is what's the priority sequence of Generic and the custom methods. I take from this that dealing with generic content-negotiation is a fine next step.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cwrc/HuViz/issues/180#issuecomment-446121762, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAhUoB6RimT-OPHvxKIfKDn6r0wSd730ks5u33LcgaJpZM4XWfw5.

smurp commented 5 years ago

The ideal way to approach this question of priority of implementation would be to know:

how many of these nameless entities we are dealing with and
what their distribution is across these prefixes and
then to do an analysis of the which of them are going to require which style of name discovery.

Only the last step needs to be done for the sake of implementation and hence is the only step that isn't busywork (which distracts us all from the creation of deliverables.)

I'd suggest we do the following:

implement content-negotiation
implement HTML title scraping
see how many names we're missing in datasets you're caring about
implement a framework for using SPARQL to look up names if that is needed by some prefixes (as I believe it will be, eg: gvp:, dbpedia: ie Getty, DBpedia )
populate the internal mapping between prefixes which can be looked up by SPARQL and the particular SPARQL configuration required by that endpoint. It turns out there is some unavoidable detail in this matter, completely generic it is not.
Then deal with remaining name sources and their custom APIs as impact dictates

This is a never-ending task which will gradually get easier as our generic techniques grow in power and the industry evolves toward broader adoption of best practices.

Note the new "Nameless" Set in the Set Picker.

smurp commented 5 years ago

Hmm. Interesting opportunity. The data we get back from Geonames.org consists of all the name, lat/long, jurisdiction type and other details of the whole hierarchy of places which the original named place is contained in. I'm seeing depths of 6 or so in this hierarchy, bottoming out with the very Earth itself. It would be a rather trivial matter to generate nodes and simple edges representing the geographical containership of these entities. The consequence would be named nodes popping onto the shelf (Berlin, Brandenburg, Germany, Europe, Earth) so that all the "geonodes" would end up connected in a skein of edges mirroring their real-world containership.

Cheap (an hour?)
cool
yielding triples that the triple inspector would properly show as mappable links into geonames.org.
[ ] Yes?

SusanBrown commented 5 years ago

So if I understand correctly this would both add more generalized (but not more particularized) nodes to the shelf and a bunch of more general nodes (such as Europe) that would allow people to do things with them that would grab the objects from lower in the hierarchy (e.g. Berlin, Italy would both be operated on by operating upon Europe). Is that right?

I think it’s probably worth doing but am nervous about 1) the addition of nodes to a graph that aren’t in the original dataset without the user’s knowledge/consent, and they might not grab all places if they come from another spatial vocabulary—and there are lots of other spatial vocabs out there that people will prefer because they reflect historical places 2) swamping larger graphs with even more nodes.

So, if my understanding is corrrect, how about we: a) make it configurable in settings and initially turned off b) wording on setting something like: “Auto-add larger Geonames groupings”

Unless this would be super fast to do, please break out as its own item, slap a rough time estimate on it, and put this in the Enhancements pile in Zenhub.

On Dec 11, 2018, at 11:56 AM, Shawn Murphy notifications@github.com<mailto:notifications@github.com> wrote:

Hmm. Interesting opportunity. The data we get back from Geoname consists of all the name, lat/long, jurisdiction type and other details of the whole hierarchy of places which the original named place is contained in. I'm seeing depths of 6 or so in this hierarchy, bottoming out with the very Earth itself. It would be a rather trivial matter to generate nodes and simple edges representing the geographical containership of these entities. The consequence would be named nodes popping onto the shelf (Berlin, Brandenburg, Germany, Europe, Earth) so that all the "geonodes" would end up connected in a skein of edges mirroring their real-world containership. Cheap, cool, yielding triples that the triple inspector would properly show as mappable links into geonames.orghttp://geonames.org. Yes?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cwrc/HuViz/issues/180#issuecomment-446160352, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAhUoBEEtQsB5NWNVdzZsemh8xp7q01oks5u349FgaJpZM4XWfw5.

smurp commented 5 years ago

You have described it correctly @SusanBrown

Exactly what I was thinking. I'll stick this behind a default off setting. My guess is that it is a quick one (an hour or less) so if it doesn't fall fast like that I'll stop and wrap an issue around it for later.

SusanBrown commented 5 years ago

Perfect. If you can do it that quickly it would be really interesting to see how it works. But if not let’s shelf for now.

On Dec 11, 2018, at 5:47 PM, Shawn Murphy notifications@github.com<mailto:notifications@github.com> wrote:

You have described it correctly @SusanBrownhttps://github.com/SusanBrown

Exactly what I was thinking. I'll stick this behind a default off setting. My guess is that it is a quick one (an hour or less) so if it doesn't fall fast like that I'll stop and wrap an issue around it for later.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cwrc/HuViz/issues/180#issuecomment-446274328, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAhUoNPC5sFQhyPD9pvJ26tXRwVDlYMzks5u3-G7gaJpZM4XWfw5.

smurp commented 5 years ago

smurp/huviz@0dbe1a and smurp/huviz@abc0ed0 released to alpha.

Look in settings for

GeoNames Greedily
GeoNames Deeply
GeoNames Limit.

Deeply acquired GeoNames nodes are schema:Place instances

antimony27 commented 5 years ago

I can get the Greedily working - that shows the population, if I'm right? My screen is too small to actually allow me to click on Deeply. What does it do?

On Thu, Dec 13, 2018 at 3:30 PM Shawn Murphy notifications@github.com wrote:

smurp/huviz@0dbe1a and smurp/huviz@abc0ed0 https://github.com/smurp/huviz/commit/abc0ed0 released to alpha.

Look in settings for GeoNames Greedily and GeoNames Deeply settings. There is also a setting now for GeoNames Limit

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cwrc/HuViz/issues/180#issuecomment-446987836, or mute the thread https://github.com/notifications/unsubscribe-auth/AK9kRVaEZFSNZYD-vcI4OcHP1lcuWI_Oks5u4mRtgaJpZM4XWfw5 .

-- Kim Martin Michael Ridley Postdoctoral Fellow in Digital Humanities Co-Founder, The MakerBus Collaborative Secretary, Canadian Society for Digital Humanities College of Arts University of Guelph MacKinnon Building Rm 1001 Phone: (519) 824-4120 ex. 58245 Twitter: @antimony27

SusanBrown commented 5 years ago

For documentation:

Greedily grabs all properties of the place Deeply situates it in the geonames hierarchy

On Dec 13, 2018, at 9:40 AM, antimony27 notifications@github.com<mailto:notifications@github.com> wrote:

I can get the Greedily working - that shows the population, if I'm right? My screen is too small to actually allow me to click on Deeply. What does it do?

On Thu, Dec 13, 2018 at 3:30 PM Shawn Murphy notifications@github.com<mailto:notifications@github.com> wrote:

smurp/huviz@0dbe1a and smurp/huviz@abc0ed0 https://github.com/smurp/huviz/commit/abc0ed0 released to alpha.

Look in settings for GeoNames Greedily and GeoNames Deeply settings. There is also a setting now for GeoNames Limit

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cwrc/HuViz/issues/180#issuecomment-446987836, or mute the thread https://github.com/notifications/unsubscribe-auth/AK9kRVaEZFSNZYD-vcI4OcHP1lcuWI_Oks5u4mRtgaJpZM4XWfw5 .

-- Kim Martin Michael Ridley Postdoctoral Fellow in Digital Humanities Co-Founder, The MakerBus Collaborative Secretary, Canadian Society for Digital Humanities College of Arts University of Guelph MacKinnon Building Rm 1001 Phone: (519) 824-4120 ex. 58245 Twitter: @antimony27

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/cwrc/HuViz/issues/180#issuecomment-446991226, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAhUoE4uVKvp9Off8TG-y66jzHDlZzLEks5u4mbDgaJpZM4XWfw5.

smurp commented 5 years ago

@wolf Could you look at the CSS for the Settings to deal with Susan's inability to scroll down to the bottom on a smaller screen?

smurp commented 5 years ago

@SusanBrown

Deeply means that HuViz should create synthetic nodes (ie which are coming from Geonames, not the dataset) for all the containing geographic entities which Geonames reports to us when we look up a Geoname. When one looks up a geoname programmatically one gets about 6 successively general geonames records which contextualize the sought geoname – these all end up bottoming out at the level of The Earth itself. So the impact in HuViz is that there are all these nicely connected nodes which show us that The Bronx is in several levels of New York (city, ???, state), then the USA then North America, then The Earth.
Greedily means that for each level in Geonames a best effort is made to treat as much metadata as possible as triples for: population, other names, lat and long, etc. I think I've turned down the greediness here and suppressed the lat/long for the moment. Press HERE for resumption of lat/long service..... :-)

smurp commented 5 years ago

It appears that http://schema.org response perfectly to pure content negotiation. In other words, when told to return turtle it happily does so:

curl -L -H "Accept: text/turtle" https://schema.org/worksFor

@prefix schema: <http://schema.org/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix eli: <http://data.europa.eu/eli/ontology#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix snomed: <http://purl.bioontology.org/ontology/SNOMEDCT/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix xsd1: <hhttp://www.w3.org/2001/XMLSchema#> .

schema:worksFor a rdf:Property ;
    rdfs:label "worksFor" ;
    schema:domainIncludes schema:Person ;
    schema:rangeIncludes schema:Organization ;
    rdfs:comment "Organizations that the person works for." .

That is great, but notice that the value of rdfs:label is "worksFor" which is what HuViz would have displayed for this predicate anyway. In other words, if the scope of our ambition at the moment is just to find prettier names, schema.org appears to not have any to give us. Surfing around the site reveals no place in the automatically generated content where a pretty name is offered which looks any different than the id itself.

Bottom line: there appears to be no point in hitting schema.org for names. Though if and when we want to spelunk around schema.org for ontological context it looks like it will be easy to do so.

smurp commented 5 years ago

Hmm. Not sure why this query is not working

PREFIX gvp: <http://vocab.getty.edu/page/cona/> 
select * {
  gvp:700006364 ?pred ?obj}

http://vocab.getty.edu/queries

smurp commented 5 years ago

@susan @antimony27 OK, it looks like the current set of prefixes for the CWRC ontology does not include http://vocab.getty.edu/ so I have put work on that front on hold, pending review.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://sparql.cwrc.ca/ontologies/cwrc#> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix eurovoc: <http://eurovoc.europa.eu/> .
@prefix geonames: <http://sws.geonames.org/> .
@prefix loc: <http://id.loc.gov/vocabulary/relators/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix voaf: <http://purl.org/vocommons/voaf#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

antimony27 commented 5 years ago

@smurp I imagine this is likely because we use definitions from the Getty vocabulary in our own definitions of terms, but do not use predicates, so we don't need to include the prefixes. @SusanBrown can you confirm?

smurp commented 5 years ago

https://twitter.com/RubenVerborgh/status/811506068257984512

Hmm. Federated LDF access to WikiData, DBPedia and VIAF.

smurp commented 5 years ago

AFAICT direct access to wikidata.org appears to be blocked by CORS

https://phabricator.wikimedia.org/T46994

I am investigating to discover whether the linked data fragments server mentioned above is so restricted. If it more open minded then it would address WikiData, DBPedia and VIAF requirements.

smurp commented 5 years ago

DBPedia and VIAF name lookup now working, based on Linked Data Fragments.

https://github.com/smurp/huviz/commit/baf6057017c215e92d15d90d4881d4514c2dac4c

smurp commented 5 years ago

See a demo of name lookup at:

http://alpha.huviz.dev.nooron.com/#load+/data/name_lookup_demo.ttl+with+/data/owl_mini.ttl

It is demonstrating:

loc:
viaf:
dbpedia:

smurp commented 5 years ago

Consider using LOV APIs if LOV is useful enough.

https://lov.linkeddata.es/dataset/lov/api

It might be just the thing for looking up ontological terms for the class and predicate pickers.....

smurp commented 5 years ago

WikiData

WikiData has CORS policies which block direct access by user's browsers. Therefore it is looking like access to WikiData is going to be tricky without implementing one of three expedients:

A proxy system on our the server side so our user's queries get routed through our server using a web proxy
We could mirror WikiData on the CWRC SPARQL system
We could deploy a Linked-Data Fragments server using HDT snaps of WikiData (and other things)

OCLC

I have not yet found addressable versions of OCLC content, meaning that for access to it we'd be limited to:

Mirror OCLC on CWRC SPARQL
Deploy an LDF server

Linked Open Vocabularies

This has been added as one of our SPARQL endpoints and as a name lookup source. It contains 600+ LOD ontologies so should be a great source for a huge amount of stuff. Further, they accept suggestions, so this might be a very important pathway going forward. This will be more impressive in HuViz once I've got Class and Predicate names being updated properly!

To see LOV in action look on http://alpha.huviz.dev.nooron.com/ under SPARQL – Public Endpoints – Linked Open Vocabularies. Be sure to examine the Graphs menu (now sorted and with labels) to see the 600+ ontologies. By the way, this suggest the use-case of *Show All or equivalent when you don't know what to search for to get started.

cwrc / HuViz

labels for entities external to the ontology #180

Possible Solution

Summary of techniques for CWRC ontologies

Use Statistics

Implemented now in HuViz

Generic Content-Negotation

Entities to discover and display the names of

\@prefix loc:

use the *.skos.nt

use the X-PrefLabel header

pros

cons

What was implemented?

Prefixes from http://sparql.cwrc.ca/ontology/cwrc.ttl

PRIORITY

DUE FOR DEC 9th

Objectives

Using the Official Geonames API

Limits

problems

low-impact remediations

high user-impact remediations

Set up alternate server, using SPARQL

pros:

cons:

Set up a custom server

pros:

cons:

@Prefix `geonames:`

WikiData

OCLC

Linked Open Vocabularies