SusanBrown opened this issue:
This looks like it might be the way to extract titles out of LOC:
curl -v -H "Accept: application/turtle, application/rdf+xml, text/x-turtle, */*" http://id.loc.gov/authorities/subjects/sh85054037
< HTTP/1.1 303 SEE OTHER
< Date: Wed, 21 Nov 2018 23:39:28 GMT
< Content-Length: 0
< Connection: keep-alive
< Set-Cookie: __cfduid=d0c754fb2c9d210ad9cd9040e23d879171542843568; expires=Thu, 21-Nov-19 23:39:28 GMT; path=/; domain=.loc.gov; HttpOnly
< Location: http://id.loc.gov/authorities/subjects/sh85054037.rdf
< Vary: Accept
< X-URI: http://id.loc.gov/authorities/subjects/sh85054037
< X-PrefLabel: Geology
< X-Varnish: 4962956
< Age: 0
< Via: 1.1 varnish-v4
< Access-Control-Allow-Origin: *
< Server: cloudflare
< CF-RAY: 47d6feee96092d53-TXL
which yields a redirection to:
Location: http://id.loc.gov/authorities/subjects/sh85054037.rdf
which in turn ought to be parsable to discover some sort of useful label. There is also:
http://id.loc.gov/authorities/subjects/sh85054037.skos.nt
It is great that this exists, but it is unclear what to include in the request Accept: header to be informed of this file's existence. Thus, to use this technique for obtaining information from the LOC, one would have to exploit special knowledge of the existence of .skos.nt, and that is inferior to a pure content-negotiation approach.
Here is somebody else pointing out that content negotiation does not cough up *.skos.nt even though it is available:
https://listserv.loc.gov/cgi-bin/wa?A2=ID;95cabb2f.1506
They suggest a way to content negotiate skos.xml though:
curl -L -H 'Accept: application/skos+xml' http://id.loc.gov/authorities/subjects/sh00000011
Note the header line:
< X-PrefLabel: Geology
This appears to be equivalent to the skos:prefLabel value available in the .skos.nt file above, except that it is missing the language indicator @en. The LOC does not appear to offer non-English labelling anyway, so no loss.
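So in principle the label can be had from the headers alone, without parsing any RDF at all (a sketch, assuming the server answers HEAD requests the same way it answers GET):
curl -sI -H "Accept: application/rdf+xml" "http://id.loc.gov/authorities/subjects/sh85054037" | grep -i '^x-preflabel:'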
Fetching the .skos.nt file directly also works.
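A sketch of pulling the label out of it (the quoted triple is the one the X-PrefLabel header above implies the file contains):
curl -s "http://id.loc.gov/authorities/subjects/sh85054037.skos.nt" | grep prefLabel
# <http://id.loc.gov/authorities/subjects/sh85054037> <http://www.w3.org/2004/02/skos/core#prefLabel> "Geology"@en .
(The prefix list that follows appears to be the full set in use in the CWRC data, i.e. the label sources this issue needs to cover.)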
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://sparql.cwrc.ca/ontologies/cwrc#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix cidoc: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix eurovoc: <http://eurovoc.europa.eu/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geonames: <http://sws.geonames.org/> .
@prefix gvp: <http://vocab.getty.edu/ontology#> .
@prefix loc: <http://id.loc.gov/vocabulary/relators/> .
@prefix ii: <http://sparql.cwrc.ca/ontologies/ii#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sem: <http://semanticweb.cs.vu.nl/2009/11/sem/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix voaf: <http://purl.org/vocommons/voaf#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
Susan emphasizes that effort to get geonames
labels ought to be the priority because of upcoming conference needs.
@Prefix geonames:
GeoNames offers web services (see http://www.geonames.org/export/index.html) for resolving a geonameid to a preferred name. To look up 2657896 one would hit the JSON lookup service and read obj.geonames.pop().name out of the response.
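A sketch of that lookup from the command line (the hierarchyJSON endpoint and the shared demo username are my assumptions; substitute a real account):
curl -s "http://api.geonames.org/hierarchyJSON?geonameId=2657896&username=demo" | jq -r '.geonames[-1].name'
The response's geonames array runs from the Earth down to the sought place, which is why popping the last element yields the name we want.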
The geonames data is available as an HDT ( http://www.rdfhdt.org/datasets/ ) file, making setting up a very high performance SPARQL endpoint relatively easy. It would be prudent to limit access to this database to HuViz users, otherwise it might be DDoSed by other geonames enthusiasts.
Relevant names data could be extracted from the alternateNamesV2.zip file and housed in a custom database, which could be used to host geonames and possibly other problematic name sources as they present themselves.
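A sketch of that extraction step (column positions per the GeoNames readme: alternateNameId, geonameid, isolanguage, name, ...; the English-only filter is just an example):
unzip -p alternateNamesV2.zip alternateNamesV2.txt \
  | awk -F'\t' '$3 == "en" { print $2 "\t" $4 }' > en_names.tsv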
@SusanBrown @antimony27 @wolfmaul
@Prefix schema:
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
schema:appearance rdfs:subPropertyOf schema:workExample .
schema:firstAppearance rdfs:subPropertyOf schema:workExample .
schema:exampleOfWork schema:inverseOf schema:workExample .
schema:workExample a rdf:Property ;
rdfs:label "workExample" ;
dct:source <http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_bibex> ;
schema:domainIncludes schema:CreativeWork ;
schema:inverseOf schema:exampleOfWork ;
schema:rangeIncludes schema:CreativeWork ;
schema:sameAs <https://schema.org/workExample> ;
rdfs:comment "Example/instance/realization/derivation of the concept of this creative work. eg. The paperback edition, first edition, or eBook." .
Responding to @smurp
I am concerned that we'll swamp any geonames account limit rather quickly, so I propose we proceed with this account-based lookup against the geonames API, but prioritize the in-browser cache (useful for many reasons) so each browser only ever looks up a name once, afterwards using its own cache for the name.
We'll need the browser-local quadstore for data editing too...
Yes, good to be thinking through these things.
I think we have a limited download of geonames (just the Cdn ones I think) stored in our servers for the CWRC lookups for this reason. Would it potentially work for this purpose too? If we added British placenames to the Cdn that would cover the bulk of existing data. I'm pinging @ilovan and @jefferya here.
The browser-caching and local quadstore sounds very good both for this and for the editing.
Conceivably, yes. I think the local geonames endpoint contains all cities and countries, given its size. Ideally, I'd like to see the endpoint moved to a VM dedicated to serving content instead of bolted onto the CWRC repository (and impacting CWRC Repo site performance). Geonames takes up 53MB, or 94%, of the Drupal database dump.
geonames:
I have introduced a measure of protection against clobbering the 2000/hr and/or 30,000/day limits on GeoNames lookups. It works like this: a user must enter (on the Settings screen, in the "Geonames Username" field) the username they wish to use for lookups. As soon as the name is entered, lookup of geonames is triggered. The username "huviz" may be entered, but that is a shared resource which can easily be exhausted. The label for the settings field contains a link to the /login page at geonames.org, where one can trivially set up one's own username. This approach should make it possible for any motivated user to get good service, while all of us retain easy access to light, shared service.
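Concretely, once a username is entered, each lookup can run against it, something like (the getJSON service name is an assumption on my part):
curl "http://api.geonames.org/getJSON?geonameId=2657896&username=huviz"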
@SusanBrown Could you give feedback on the labelling and help prioritize content negotiation (i.e. generic) vs the prefix-specific techniques, such as the API implementations needed for Getty, which, for example, will require connecting via SPARQL and performing Getty-specific queries. What is the priority of this work vs other things? And which particular prefixes matter most to you?
Geonames and LOC were the most prominent in the data by far. @antimony27 do you have any thoughts?
I'm thinking in the longer term we'll definitely want dbpedia, wikidata, and Getty.
Do the W3C ontologies, e.g. org, owl, prov etc. not have a standard means of doing this that would make it possible to come up with a generic solution for them?
I haven't implemented the generic "content negotiation" method yet because it didn't work for the priority sources which required fairly custom approaches (GeoNames and LOC). So yes, the question is what's the priority sequence of Generic and the custom methods. I take from this that dealing with generic content-negotiation is a fine next step.
Can you say at this point how many would be knocked off by generic content negotiation? I think we’ve knocked off the single most important ones at this point—just trying to weigh generic vs dbpedia/Getty and hoping for input from Kim as well.
The ideal way to approach this question of priority of implementation would be to know:
Only the last step needs to be done for the sake of implementation, and hence it is the only step that isn't busywork (which distracts us all from the creation of deliverables).
I'd suggest we do the following: take on the prefix-specific sources gvp: and dbpedia: (i.e. Getty and DBpedia) next. This is a never-ending task which will gradually get easier as our generic techniques grow in power and the industry evolves toward broader adoption of best practices.
Note the new "Nameless" Set in the Set Picker.
Hmm. Interesting opportunity. The data we get back from Geonames.org consists of the name, lat/long, jurisdiction type and other details for the whole hierarchy of places in which the original named place is contained. I'm seeing depths of 6 or so in this hierarchy, bottoming out with the very Earth itself. It would be a rather trivial matter to generate nodes and simple edges representing the geographical containership of these entities. The consequence would be named nodes popping onto the shelf (Berlin, Brandenburg, Germany, Europe, Earth) so that all the "geonodes" would end up connected in a skein of edges mirroring their real-world containership.
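For instance, a rough sketch of how those containment edges might be minted from the hierarchy response (gn:parentFeature and the jq pipeline are illustrative assumptions, not shipped code; 2950159 is Berlin):
curl -s "http://api.geonames.org/hierarchyJSON?geonameId=2950159&username=demo" \
  | jq -r '.geonames as $g | range(1; $g|length) as $i |
      "<http://sws.geonames.org/\($g[$i].geonameId)/> <http://www.geonames.org/ontology#parentFeature> <http://sws.geonames.org/\($g[$i-1].geonameId)/> ."'
Each adjacent pair in the hierarchy (Earth, Europe, Germany, Brandenburg, Berlin) becomes one triple linking a place to its container.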
So if I understand correctly, this would add more generalized (but not more particularized) nodes to the shelf, including general nodes (such as Europe) that would allow people to do things with them that grab the objects from lower in the hierarchy (e.g. Berlin and Italy would both be operated on by operating upon Europe). Is that right?
I think it's probably worth doing, but am nervous about 1) the addition of nodes that aren't in the original dataset without the user's knowledge/consent, plus the fact that they might not grab all places if those come from another spatial vocabulary (there are lots of other spatial vocabs out there that people will prefer because they reflect historical places), and 2) swamping larger graphs with even more nodes.
So, if my understanding is correct, how about we: a) make it configurable in settings and initially turned off, b) word the setting something like: "Auto-add larger Geonames groupings".
Unless this would be super fast to do, please break out as its own item, slap a rough time estimate on it, and put this in the Enhancements pile in Zenhub.
You have described it correctly @SusanBrown
Exactly what I was thinking. I'll stick this behind a default off setting. My guess is that it is a quick one (an hour or less) so if it doesn't fall fast like that I'll stop and wrap an issue around it for later.
Perfect. If you can do it that quickly it would be really interesting to see how it works. But if not, let's shelve it for now.
smurp/huviz@0dbe1a and smurp/huviz@abc0ed0 released to alpha.
Look in settings for the GeoNames Greedily and GeoNames Deeply settings. There is also a setting now for GeoNames Limit. Deeply acquired GeoNames nodes are schema:Place instances.
I can get the Greedily working - that shows the population, if I'm right? My screen is too small to actually allow me to click on Deeply. What does it do?
For documentation: Greedily grabs all properties of the place; Deeply situates it in the GeoNames hierarchy.
@wolf Could you look at the CSS for the Settings to deal with Kim's inability to scroll down to the bottom on a smaller screen?
@SusanBrown
Deeply means that HuViz should create synthetic nodes (i.e. coming from Geonames, not the dataset) for all the containing geographic entities which Geonames reports to us when we look up a geoname. When one looks up a geoname programmatically, one gets about 6 successively more general geonames records which contextualize the sought geoname – these all bottom out at the level of the Earth itself. So the impact in HuViz is that there are all these nicely connected nodes which show us that The Bronx is in several levels of New York (city, ???, state), then the USA, then North America, then the Earth.
Greedily means that for each level in Geonames a best effort is made to treat as much metadata as possible as triples: population, other names, lat and long, etc. I think I've turned down the greediness here and suppressed the lat/long for the moment. Press HERE for resumption of lat/long service..... :-)

It appears that http://schema.org responds perfectly to pure content negotiation. In other words, when told to return turtle it happily does so:
curl -L -H "Accept: text/turtle" https://schema.org/worksFor
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix eli: <http://data.europa.eu/eli/ontology#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix snomed: <http://purl.bioontology.org/ontology/SNOMEDCT/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix xsd1: <http://www.w3.org/2001/XMLSchema#> .
schema:worksFor a rdf:Property ;
rdfs:label "worksFor" ;
schema:domainIncludes schema:Person ;
schema:rangeIncludes schema:Organization ;
rdfs:comment "Organizations that the person works for." .
That is great, but notice that the value of rdfs:label is "worksFor", which is what HuViz would have displayed for this predicate anyway. In other words, if the scope of our ambition at the moment is just to find prettier names, schema.org appears to have none to give us. Surfing around the site reveals no place in the automatically generated content where a pretty name is offered that looks any different from the id itself.
Bottom line: there appears to be no point in hitting schema.org for names. Though if and when we want to spelunk around schema.org for ontological context, it looks like it will be easy to do so.
Hmm. Not sure why this query is not working:
PREFIX gvp: <http://vocab.getty.edu/page/cona/>
select * {
gvp:700006364 ?pred ?obj}
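One guess at the culprit: http://vocab.getty.edu/page/cona/ is the namespace of the HTML pages, whereas the resources themselves live under http://vocab.getty.edu/cona/. A sketch of a corrected query against their endpoint (the sparql.json path and the cona: prefix are my best guesses, not verified):
curl -G "http://vocab.getty.edu/sparql.json" \
  --data-urlencode 'query=PREFIX cona: <http://vocab.getty.edu/cona/> SELECT * WHERE { cona:700006364 ?pred ?obj }'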
@susan @antimony27 OK, it looks like the current set of prefixes for the CWRC ontology does not include http://vocab.getty.edu/ so I have put work on that front on hold, pending review.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://sparql.cwrc.ca/ontologies/cwrc#> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix eurovoc: <http://eurovoc.europa.eu/> .
@prefix geonames: <http://sws.geonames.org/> .
@prefix loc: <http://id.loc.gov/vocabulary/relators/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix voaf: <http://purl.org/vocommons/voaf#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@smurp I imagine this is likely because we use definitions from the Getty vocabulary in our own definitions of terms, but do not use predicates, so we don't need to include the prefixes. @SusanBrown can you confirm?
https://twitter.com/RubenVerborgh/status/811506068257984512
Hmm. Federated LDF access to WikiData, DBPedia and VIAF.
AFAICT direct access to wikidata.org appears to be blocked by CORS. I am investigating whether the linked data fragments server mentioned above is similarly restricted. If it is more open-minded, then it would address the WikiData, DBPedia and VIAF requirements.
DBPedia and VIAF name lookup now working, based on Linked Data Fragments.
https://github.com/smurp/huviz/commit/baf6057017c215e92d15d90d4881d4514c2dac4c
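For reference, the shape of a Triple Pattern Fragments request (the fragments.dbpedia.org endpoint and the subject/predicate query parameters follow the TPF convention; treat this as a sketch):
curl -H "Accept: text/turtle" "https://fragments.dbpedia.org/2016-04/en?subject=http%3A%2F%2Fdbpedia.org%2Fresource%2FBerlin&predicate=http%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label"
This asks the server for every rdfs:label triple about dbpedia:Berlin and negotiates Turtle back, all from the browser-friendly side of CORS.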
See a demo of name lookup at:
http://alpha.huviz.dev.nooron.com/#load+/data/name_lookup_demo.ttl+with+/data/owl_mini.ttl
It is demonstrating:
Consider using LOV APIs if LOV is useful enough.
https://lov.linkeddata.es/dataset/lov/api
It might be just the thing for looking up ontological terms for the class and predicate pickers.
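For example, a term search can be hit directly (the v2 path and parameters are taken from my reading of their docs, so treat as a sketch):
curl "https://lov.linkeddata.es/dataset/lov/api/v2/term/search?q=Person&type=class"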
WikiData has CORS policies which block direct access from users' browsers. Therefore access to WikiData is going to be tricky without implementing one of three expedients:
I have not yet found addressable versions of OCLC content, meaning that for access to it we'd be limited to:
This has been added as one of our SPARQL endpoints and as a name lookup source. It contains 600+ LOD ontologies so should be a great source for a huge amount of stuff. Further, they accept suggestions, so this might be a very important pathway going forward. This will be more impressive in HuViz once I've got Class and Predicate names being updated properly!
To see LOV in action look on http://alpha.huviz.dev.nooron.com/ under SPARQL – Public Endpoints – Linked Open Vocabularies. Be sure to examine the Graphs menu (now sorted and with labels) to see the 600+ ontologies. By the way, this suggests the use-case of a *Show All* or equivalent for when you don't know what to search for to get started.
We need HuViz to render proper labels for terms that are not part of our own ontology, but for which the ontology is referenced at the top of the data file.
I've asked whether we could include these in our data but we really can't, as those entities don't exist within our dataset and so we have nothing to which we can attach properties.
Possible Solution
Go to the external source and try to grab either rdfs:label or foaf:name for that entity. This should capture most cases as they are very widely used. If not then use the last string in the URI as a label, e.g. http://vocab.getty.edu/page/cona/700006364 would be "labelled" 700006364. Users will be able to click through to the web page for the entity so they can get more information by using the snippet/triple inspector box.
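A sketch of that fallback chain as a shell helper (label_for is a hypothetical name, and the grep is a crude stand-in for real RDF parsing):
label_for() {
  uri="$1"
  # Try content negotiation for N-Triples and look for rdfs:label or foaf:name.
  label=$(curl -sL -H "Accept: application/n-triples" "$uri" \
    | grep -E 'rdf-schema#label|foaf/0\.1/name' \
    | grep -oE '"[^"]*"' | head -n 1 | tr -d '"')
  # Otherwise fall back to the last segment of the URI.
  echo "${label:-${uri##*/}}"
}

label_for "http://vocab.getty.edu/page/cona/700006364"   # prints 700006364 if no label is found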
Summary of techniques for CWRC ontologies
- loc: .rdf and .skos.nt downloads
- geonames:
- schema: .ttl via content negotiation, but rdfs:label provides uninteresting values, so skip
- gvp: .ttl sometimes; .nt download blocked by CORS
- dbpedia:
- wikidata:
Use Statistics
- loc:
- geonames:
- schema:
- gvp:
- dbpedia:
Implemented now in HuViz
- loc: https://id.loc.gov/search/
- geonames:
- viaf: http://viaf.org/viaf/data/
- dbpedia: http://dbpedia.org/sparql
- gvp: http://vocab.getty.edu/queries (union list of artist names AND places)
- wikidata: https://query.wikidata.org/
- oclc: irrelevant because there are no labels in that ontology
- schema:
Generic Content-Negotiation
Simple content-negotiation should be implemented which could, in order, Accept various semantic and structured formats and process them, such as:
- .nt, i.e. application/n-triples
- .ttl, i.e. text/turtle
- .jsonld, i.e. application/ld+json
- .rdf, i.e. application/rdf+xml
- HTML title tag contents, as a last resort
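A minimal sketch of that negotiation from the command line (the q-values and their ordering are illustrative):
curl -sL -H "Accept: application/n-triples, text/turtle;q=0.9, application/ld+json;q=0.8, application/rdf+xml;q=0.7, text/html;q=0.1" "http://id.loc.gov/authorities/subjects/sh85054037"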
Entities to discover and display the names of