american-art / semantic-hosting

Issues about http://data.americanartcollaborative.org/

emit triples only against Branded URLs #4

Open VladimirAlexiev opened 7 years ago

VladimirAlexiev commented 7 years ago

When you download http://data.crystalbridges.org/exhibition/10 or http://data.americanartcollaborative.org/data/cbm/exhibition/10, you get this:

<http://data.crystalbridges.org/exhibition/10> rdf:type crm:E5_Event ;
      crm:P1_is_identified_by <http://data.crystalbridges.org/exhibition/10/appellation> .

<http://data.americanartcollaborative.org/data/cbm/exhibition/10>
      rdfs:label "RDF description of " ;
      foaf:primaryTopic <http://data.crystalbridges.org/exhibition/10> .

Paraphrasing, this says: there's a business entity at data.crystalbridges.org, which is described by a document at data.americanartcollaborative.org.

Small problems:

The bigger problem is that the server redirects the business URL to the document URL. If you run curl -ILH accept:application/rdf+xml http://data.crystalbridges.org/exhibition/10, or look at the more visual traceback in #3, you'll see the redirects, finishing with a sort of loop at the document URL. So the server treats the two URLs as the same thing.
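For anyone who wants to check the redirect chain from a script rather than by reading curl output, here is a minimal sketch. It follows redirects by hand and flags the first URL that repeats. The names trace_redirects and http_head are my own, the http_head helper only handles plain http:// URLs, and I have not run it against the live server.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher takes a URL and returns (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_head(url: str, accept: str = "application/rdf+xml") -> Tuple[int, Optional[str]]:
    """Send one HEAD request without following redirects (http:// only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", path, headers={"Accept": accept})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch: Fetch = http_head,
                    max_hops: int = 10) -> Tuple[List[str], bool]:
    """Return (URLs visited in order, True if a redirect pointed back at a visited URL)."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            return chain, False
        target = urljoin(chain[-1], location)  # Location may be relative
        if target in chain:
            return chain + [target], True
        chain.append(target)
    return chain, False
```

Calling trace_redirects("http://data.crystalbridges.org/exhibition/10") would be the scripted counterpart of the curl command above; passing a fake fetch function makes it usable offline.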

It's also a Branding issue:

VladimirAlexiev commented 7 years ago

Looking at other entities (http://data.americanartcollaborative.org/page/cbm/object/197, http://data.americanartcollaborative.org/page/cbm/object/197/group_title, even http://data.americanartcollaborative.org/page/cbm/exhibition/11), I don't see a doc with foaf:primaryTopic. So I guess there are some leftover triples in http://data.crystalbridges.org/exhibition/10, left over from an abandoned design.

Nevertheless, the Branding issue remains.

caknoblock commented 7 years ago

We are not doing anything with the exhibitions or bibliography data right now. We need to clean up the old models/triples.

VladimirAlexiev commented 7 years ago

What do exhibitions have to do with this issue?

VladimirAlexiev commented 7 years ago

The semantic data is sometimes recorded against straight (branded) URLs, eg

and other times against AAC-ified URLs, eg

This is extremely confusing and makes the task of validation through http://review.americanartcollaborative.org very hard. @workergnome, how do you deal with this?
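To make the mix visible during validation, one option is to group subject URLs by hostname and flag any dataset that uses both kinds. A rough sketch follows; the function names are made up for illustration, the AAC hostname is the one used in this thread, and the subject list is assumed to come from whatever query the validation tool already runs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

# Aggregator hostname as it appears in this thread.
AAC_HOST = "data.americanartcollaborative.org"


def group_by_host(subjects: Iterable[str]) -> Dict[str, List[str]]:
    """Group subject URLs by hostname; the rest of each URL stays opaque."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in subjects:
        groups[urlsplit(uri).hostname or ""].append(uri)
    return dict(groups)


def mixes_branded_and_aac(subjects: Iterable[str]) -> bool:
    """True if subjects use the AAC host alongside at least one other host."""
    hosts = set(group_by_host(subjects))
    return AAC_HOST in hosts and len(hosts) > 1
```

This only inspects the hostname, so it does not contradict treating URLs as opaque in SPARQL; it is a reporting aid, not a fix.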

Permanent URLs should be well-designed and follow the same policy.

VladimirAlexiev commented 7 years ago

The last comment is related to but not the same as https://github.com/american-art/PUAM/issues/30

workergnome commented 7 years ago

I've been treating the URLs as opaque, and starting from a select ?id where {?id a crm:E22_Man-Made_Object} to get my initial list.

I didn't define URLs, both because I figured that those patterns should be defined by the museums and because I'm not sure what the limitations of Karma are. Again, I am ambivalent: I believe the URLs should be opaque, so I've been treating them like that.

(which is not to say that they're meaningless—I agree completely that there are problems with the URLs chosen, and entities that are the same should share URLs.)

VladimirAlexiev commented 7 years ago

URLs should be opaque in SPARQL, no doubt about it (eg no slicing of URLs should ever be needed).

But by not defining them, you've allowed students to make bad mistakes

entities that are the same should share URLs

  • Right: having separate title type for each instance of first name is crazy.

And also: entities that are different must have different URLs.
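Both rules can be checked mechanically once there is some key that says which records denote the same entity. The sketch below assumes such a key exists (for example a source-system identifier); that key, the function name and the sample pairs in the usage note are hypothetical, not something the mappings currently provide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def url_policy_violations(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Check (uri, entity_key) pairs against the two rules.

    Returns (split, merged):
      split  maps an entity key to its URLs when one entity has several URLs
      merged maps a URL to its entity keys when several entities share one URL
    """
    uris_by_key = defaultdict(set)
    keys_by_uri = defaultdict(set)
    for uri, key in pairs:
        uris_by_key[key].add(uri)
        keys_by_uri[uri].add(key)
    split = {k: sorted(v) for k, v in uris_by_key.items() if len(v) > 1}
    merged = {u: sorted(v) for u, v in keys_by_uri.items() if len(v) > 1}
    return split, merged
```

For example, feeding it the pairs ("http://example.org/type/1", "first-name") and ("http://example.org/type/2", "first-name") reports "first-name" as split across two URLs.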

workergnome commented 7 years ago

See my comments in https://github.com/american-art/aac_mappings/issues/48: I agree these are problems, but I'm not sure I was (or am) the right person to specify URL patterns.

VladimirAlexiev commented 7 years ago

@caknoblock @workergnome About the "DNS/redirect" issue that's so heavily discussed right now:

VladimirAlexiev commented 7 years ago

Currently http://data.crystalbridges.org/object/108 redirects to http://data.americanartcollaborative.org/cbm/object/108 (you can see this with curl -Iv http://data.crystalbridges.org/object/108). This makes it difficult for museums to deploy, since they need to mess with a web server.

My basic idea is as follows. I'm not even sure that playing with Apache proxy will be needed:

cbutcosk commented 7 years ago

That specific configuration won't work out of the box because Pubby does not consider the hostname when it constructs the request URI, AFAICT, so if object/40 is present in both datasets it will only return data from one. ISI's version switches on a reponame to finesse the dataset/redirection, so it knows cbm/object/40 should be queried as <http://data.crystalbridges.org/object/40>. @VladimirAlexiev's configuration will work with it, but will still require URL rewriting/proxying.
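To illustrate the reponame switch, here is a rough sketch of the lookup as I understand it. It is not ISI's code: the table and function name are invented for illustration, and only the cbm entry comes from this thread.

```python
from typing import Optional

# Hypothetical table: repo prefix -> branded base URI. Only "cbm" appears in this thread.
REPO_TO_BRANDED = {
    "cbm": "http://data.crystalbridges.org/",
}


def resource_uri(request_path: str) -> Optional[str]:
    """Map a repo-prefixed request path to the branded URI to query, or None."""
    repo, _, local = request_path.lstrip("/").partition("/")
    base = REPO_TO_BRANDED.get(repo)
    if base is None or not local:
        return None  # unknown repo, or no local part after the prefix
    return base + local
```

With the prefix, cbm/object/40 resolves to exactly one dataset; a bare object/40 has no prefix to switch on, which is the ambiguity described above.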

My proposal would be to use ISI's Pubby version with a similar configuration, but move URL rewriting and redirection up into the ISI instance. Without ISI's Pubby, you would need 14 instances (Pubby is that naive). Likewise, if we can't avoid having some Apache/nginx instance doing routing, it may as well be owned and maintained in the hosting environment.

Adding a "Thar be dragons" to the Apache config header would be optional.

(EFC)

cbutcosk commented 7 years ago

I had some time to work up a spec configuration for Apache in https://github.com/ColbyMuseum/aac-url-rewrite. It's pretty lightweight, using an Apache module for just this use case: a simple inbound hostname to proxy destination mapping from a text file.

Hostnames would have to be added there and in the Pubby instance's configuration after an institution registers the DNS of their branded hostname, but otherwise deployment and custom configuration are minimal.
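For readers who don't want to open the repo, the idea of a hostname map can be sketched like this. This is not the Apache configuration itself: the file format (one "hostname target" pair per line, with # comments) and the function names are assumptions for illustration, and the sample entry in the usage note mirrors the cbm redirect seen earlier in this thread.

```python
from typing import Dict, Optional


def parse_host_map(text: str) -> Dict[str, str]:
    """Parse 'hostname target-base' pairs, one per line; '#' starts a comment."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("expected 'hostname target', got: %r" % raw)
        mapping[parts[0].lower()] = parts[1]
    return mapping


def proxy_target(host_header: str, path: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the backend URL for an inbound Host header and path, or None."""
    host = host_header.split(":", 1)[0].lower()  # drop any port, ignore case
    base = mapping.get(host)
    if base is None:
        return None
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With a map line "data.crystalbridges.org http://data.americanartcollaborative.org/cbm", a request for /object/108 on the branded host would be proxied to http://data.americanartcollaborative.org/cbm/object/108 while the branded URL stays in the address bar.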

VladimirAlexiev commented 7 years ago

Hi @cbutcosk, great work! In that repo you mention "Each instituion still needs a multiURIMapping entry in the pubby configuration." I googled this and figured out it's ISI's addition to Pubby:

conf:multiURIMapping is in conf:dataset and Pubby supports multiple datasets, so it's a matter of passing the full request URL to it. And I think this is what your work does.

VladimirAlexiev commented 7 years ago

@caknoblock @cbutcosk @workergnome What's the status of this issue? I tested http://data.crystalbridges.org/object/108 and the URL (in the address bar) is still rewritten to the non-branded one. (This test value is from https://github.com/american-art/PUAM/issues/30)