Open VladimirAlexiev opened 7 years ago
Looking at other entities (http://data.americanartcollaborative.org/page/cbm/object/197, http://data.americanartcollaborative.org/page/cbm/object/197/group_title, even http://data.americanartcollaborative.org/page/cbm/exhibition/11), I don't see a doc with `foaf:primaryTopic`.
So I guess there are some leftover triples in http://data.crystalbridges.org/exhibition/10, left over from an abandoned design.
Nevertheless, the Branding issue remains.
We are not doing anything with the exhibitions or bibliography data right now. We need to clean up the old models/triples.
What do exhibitions have to do with this issue?
The semantic data is sometimes recorded against straight (branded) URLs, eg
and other times against AAC-ified URLs, eg
This is extremely confusing and makes the task of validation through http://review.americanartcollaborative.org very hard. @workergnome, how do you deal with this?
Permanent URLs should be well-designed and follow the same policy.
The last comment is related to but not the same as https://github.com/american-art/PUAM/issues/30
I've been treating the URLs as opaque, and starting from a `select ?id where {?id a crm:E22_Man-Made_Object}` to get my initial list.
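As a rough illustration of that starting query, here is a minimal Python sketch that only builds the SPARQL-protocol GET URL for it. The `crm:` namespace and the helper name `build_query_url` are assumptions made for the example, not something stated in this thread; the endpoint is the shared one mentioned elsewhere in the discussion.

```python
from urllib.parse import urlencode

# Assumed namespace for the crm: prefix; check it against the actual data.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# The query from the comment above, with the prefix spelled out.
QUERY = f"""PREFIX crm: <{CRM}>
SELECT ?id WHERE {{ ?id a crm:E22_Man-Made_Object }}"""


def build_query_url(endpoint: str, query: str = QUERY) -> str:
    """Return a SPARQL-protocol GET URL for the given endpoint and query."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Prints the URL only; nothing is fetched.
    print(build_query_url("http://data.americanartcollaborative.org/sparql"))
```

The returned `?id` values are then used as-is, without slicing them into parts.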
I didn't define URLs, both since I figured that those patterns should be defined by the museums and I'm not sure what the limitations of Karma are. Again, I am ambivalent: I believe the URLs should be opaque, so I've been treating them like that.
(which is not to say that they're meaningless—I agree completely that there are problems with the URLs chosen, and entities that are the same should share URLs.)
URLs should be opaque in SPARQL, no doubt about it (eg no slicing of URLs should ever be needed).
But by not defining them, you've allowed students to make bad mistakes.
"entities that are the same should share URLs"
Right: having a separate title type for each instance of first name is crazy.
And also: entities that are different must have different URLs.
See my comments in https://github.com/american-art/aac_mappings/issues/48: I agree these are problems, but I'm not sure I was (or am) the right person to specify URL patterns.
@caknoblock @workergnome About the "DNS/redirect" issue that's so heavily discussed right now:
Currently http://data.crystalbridges.org/object/108 redirects to http://data.americanartcollaborative.org/cbm/object/108 (you can see this with `curl -Iv http://data.crystalbridges.org/object/108`).
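For anyone who prefers to script that check, a minimal Python sketch of following the redirects by hand might look like this. The function names `head` and `trace_redirects` are made up for the example, and it assumes plain HTTP (no TLS), as in the URLs above.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECTS = (301, 302, 303, 307, 308)


def head(url, accept="application/rdf+xml"):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None). Assumes a plain http:// URL.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"Accept": accept})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Follow redirects manually and return [(url, status), ...].

    Stops at the first non-redirect response, when a URL repeats (a loop),
    or after max_hops requests.
    """
    hops, seen = [], set()
    while url not in seen and len(hops) < max_hops:
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECTS or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

Calling `trace_redirects("http://data.crystalbridges.org/object/108")` would list each hop with its status code, which is the same information `curl -Iv` shows one response at a time.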
This makes it difficult for museums to deploy, since they need to mess with a web server.
My basic idea is as follows. I'm not even sure that playing with Apache proxy will be needed:
data.crystalbridges.org -> 54.69.252.89
Registering a DNS record is much easier than deploying a web server.
The request then carries only the path /object/108, but there is also Host: data.crystalbridges.org, so Pubby knows the full URL.
The datasets go under conf:dataset:
@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#> .

<> a conf:Configuration;
  conf:webBase <server_base_uri>;
  conf:dataset
    [conf:datasetBase <http://data.crystalbridges.org/>; conf:sparqlEndpoint <http://data.crystalbridges.org/sparql>],
    [conf:datasetBase <http://data.autry.edu/>; conf:sparqlEndpoint <http://data.autry.edu/sparql>].
(or we could put the same http://data.americanartcollaborative.org/sparql in all `conf:sparqlEndpoint`, I don't think that'll make any difference)
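To make the idea concrete, here is a small Python sketch of what "knowing the full URL" from the Host header would buy: the branded resource URI is rebuilt from host and path, and the dataset is picked by its `conf:datasetBase`. This is only an illustration of the proposal, not Pubby's actual code (Pubby is a Java web app), and the names `resource_uri` and `select_dataset` are hypothetical.

```python
# Dataset bases taken from the configuration above.
DATASET_BASES = [
    "http://data.crystalbridges.org/",
    "http://data.autry.edu/",
]


def resource_uri(host: str, path: str) -> str:
    """Rebuild the branded resource URI from the Host header and request path."""
    return f"http://{host.lower()}/{path.lstrip('/')}"


def select_dataset(uri: str, bases=DATASET_BASES):
    """Return the dataset base that the URI falls under, or None if there is none."""
    for base in bases:
        if uri.startswith(base):
            return base
    return None


if __name__ == "__main__":
    uri = resource_uri("data.crystalbridges.org", "/object/108")
    print(uri, "->", select_dataset(uri))
```

With this kind of lookup, `/object/108` on two different hostnames would resolve to two different datasets.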
That specific configuration won't work out of the box because Pubby does not consider the hostname when it constructs the request URI AFAICT, so if `object/40` is present in both datasets it will only return data from one. ISI's version switches on a reponame to finesse the dataset/redirection, so it knows `cbm/object/40` should be queried as `<http://data.crystalbridges.org/object/40>`. @VladimirAlexiev's configuration will work with it, but will still require URL rewriting/proxying.
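The reponame switch can be pictured with a short Python sketch. It is an assumption-laden illustration, not ISI's implementation: only the `cbm` mapping comes from this thread, the `autry` reponame is hypothetical, and the `/page/` and `/data/` URL forms are deliberately ignored.

```python
AAC_BASE = "http://data.americanartcollaborative.org/"

REPO_TO_BRANDED = {
    "cbm": "http://data.crystalbridges.org/",
    "autry": "http://data.autry.edu/",  # hypothetical reponame, for illustration only
}


def aac_to_branded(url: str) -> str:
    """Map an AAC-ified URL (reponame as first path segment) to the branded URL."""
    if not url.startswith(AAC_BASE):
        raise ValueError(f"not an AAC URL: {url}")
    repo, _, rest = url[len(AAC_BASE):].partition("/")
    if repo not in REPO_TO_BRANDED:
        raise ValueError(f"unknown reponame: {repo}")
    return REPO_TO_BRANDED[repo] + rest


def branded_to_aac(url: str) -> str:
    """Map a branded URL to the AAC-ified URL by inserting the reponame."""
    for repo, base in REPO_TO_BRANDED.items():
        if url.startswith(base):
            return f"{AAC_BASE}{repo}/{url[len(base):]}"
    raise ValueError(f"no reponame registered for: {url}")
```

Because the reponame disambiguates the dataset, `cbm/object/40` and a same-numbered object in another repo never collide, which is the property plain Pubby lacks.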
My proposal would be to use ISI's pubby version with a similar configuration, but move URL rewriting and redirection up into the ISI instance. Without ISI's pubby, you would need 14 instances (pubby is that naive). Likewise, if we can't avoid having some Apache/nginx instance doing routing, it may as well be owned and maintained in the hosting environment.
Adding a "Thar be dragons" to the apache config header would be optional.
(EFC)
I had some time to work up a spec configuration for Apache in https://github.com/ColbyMuseum/aac-url-rewrite. It's pretty lightweight, using an apache module for just this use case: a simple inbound hostname to proxy destination mapping from a text file.
Hostnames would have to be added there and in the Pubby instance's configuration after an institution registers the DNS of their branded hostname, but otherwise deployment and custom configuration is minimal.
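The hostname-to-destination lookup itself is simple enough to sketch in Python. The map layout below (one "hostname destination" pair per line, `#` comment lines) and the sample destination are assumptions for illustration; the actual file format and module are whatever the linked repo uses.

```python
SAMPLE_MAP = """\
# inbound hostname          proxy destination (hypothetical entry)
data.crystalbridges.org     http://data.americanartcollaborative.org/cbm
"""


def parse_host_map(text: str) -> dict:
    """Parse 'hostname destination' lines, skipping blank lines and # comments."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'hostname destination', got: {raw!r}")
        host, dest = parts
        mapping[host.lower()] = dest
    return mapping


def proxy_target(mapping: dict, host: str, path: str):
    """Return the URL a request should be proxied to, or None for unknown hosts."""
    dest = mapping.get(host.lower())
    if dest is None:
        return None
    return dest.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    hosts = parse_host_map(SAMPLE_MAP)
    print(proxy_target(hosts, "data.crystalbridges.org", "/object/108"))
```

Adding an institution then means one new line in the map file plus the matching entry in the Pubby configuration.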
hi @cbutcosk great work!
In that repo you mention "Each institution still needs a `multiURIMapping` entry in the pubby configuration." Googled this and figured out it's ISI's addition to pubby:
`conf:multiURIMapping` is in `conf:dataset` and Pubby supports multiple datasets, so it's a matter of passing the full request URL to it. And I think this is what your work does.
@caknoblock @cbutcosk @workergnome What's the status of this issue? Tested http://data.crystalbridges.org/object/108 and the URL (in the address bar) is still rewritten to non-branded. (This test value is from https://github.com/american-art/PUAM/issues/30)
When you download http://data.crystalbridges.org/exhibition/10 or http://data.americanartcollaborative.org/data/cbm/exhibition/10, you get this:
Paraphrasing, this says: there's a business entity at data.crystalbridges.org, which is described by a document at data.americanartcollaborative.org.
Small problems:
- [...] `a foaf:Document` to the doc
- [...] `rdfs:label`, or remove it altogether
The bigger problem is that the server redirects the business URL to the document URL. If you trace `curl -ILH accept:application/rdf+xml http://data.crystalbridges.org/exhibition/10` or the more visual traceback in #3, you'll see the redirects, finishing with a sort of loop at the document URL. So the server treats the two URLs as the same thing.
It's also a Branding issue:
[...] `proxy_http` module (`ProxyRequests`, `ProxyPass`, `ProxyPassReverse`) to fix this. Eg see https://github.com/AKSW/Sparqlify#configuration