Open namedgraph opened 3 years ago
Hi @namedgraph (nice user name, by the way), can you please elaborate on why you think the DBpedia Linked Data interface is broken? I consider the HTTPS URL of a resource to be just a special "generic document" that describes the non-information URI (NIR), aka the "resource ID". See the image below from Cool URIs. In other words, our resource IDs are non-HTTPS. HTTPS is just used as a (mandatory, though this might be discussed) security layer.
Linked Data is about self-describing resources.
If http://dbpedia.org/resource/Copenhagen is requested, RDF data with http://dbpedia.org/resource/Copenhagen in the subject position (and possibly additional resource descriptions) should be returned.
If https://dbpedia.org/resource/Copenhagen is requested, RDF data about https://dbpedia.org/resource/Copenhagen should be returned.
http://dbpedia.org/resource/Copenhagen and https://dbpedia.org/resource/Copenhagen are two distinct resources in RDF since their URIs differ.
As my examples show, when http:// is requested, the server redirects to https:// but then returns data about http:// anyway.
When https:// is requested, the data is still about http://.
See the email thread for more details.
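The "self-describing" expectation above can be sketched as a small check: does the requested URI appear, as an exact string, in the subject position of the returned data? This is only an illustration under stated assumptions. The helper names `subjects` and `is_self_describing` are hypothetical, the parser handles only simple one-triple-per-line N-Triples with IRI subjects, and `SAMPLE` is an invented response body, not actual DBpedia output.

```python
import re

# Matches a simple N-Triples line: <subject> <predicate> object .
# Blank-node subjects and multi-line literals are deliberately not handled.
_TRIPLE_RE = re.compile(r"^\s*<([^>]*)>\s+<[^>]*>\s+.+\.\s*$")


def subjects(ntriples):
    """Collect the subject IRIs found in simple N-Triples text."""
    found = set()
    for line in ntriples.splitlines():
        match = _TRIPLE_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def is_self_describing(requested_uri, ntriples):
    """True if the requested URI occurs as a subject, compared as an exact string."""
    return requested_uri in subjects(ntriples)


# Invented response body for illustration only.
SAMPLE = (
    "<http://dbpedia.org/resource/Copenhagen> "
    "<http://www.w3.org/2000/01/rdf-schema#label> "
    '"Copenhagen"@en .\n'
)

if __name__ == "__main__":
    print(is_self_describing("http://dbpedia.org/resource/Copenhagen", SAMPLE))
    print(is_self_describing("https://dbpedia.org/resource/Copenhagen", SAMPLE))
```

Because the comparison is plain string equality, data about the http:// URI does not count as describing the https:// URI, which is the mismatch reported in this issue.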
@namedgraph there seems to be still a lot of confusion here.
From an RDF perspective, https://dbpedia.org/resource/Berlin does not exist as a resource. It is only the URL of the generic document that delivers the description (of http://dbpedia.org/resource/Berlin). So far, we don't use https-based RDF resource identifiers, for the simple reason you mentioned (string identity in RDF). So again, http://dbpedia.org/resource/ is the RDF namespace and https://dbpedia.org/resource/ is not an RDF namespace (these https URIs should never occur in any kind of RDF data, and therefore should never be looked up directly by any Linked Data client!). To be clearer, let's look again at the Alice example from above, which translates to the following.
http://dbpedia.org/resource/Berlin ~ http://www.example.com/id/alice
https://dbpedia.org/resource/Berlin ~ http://www.example.com/doc/alice
https://dbpedia.org/data/Berlin.ttl ~ http://www.example.com/doc/alice.rdf
I see that this might not be so clear at first glance, since both namespaces look very similar and are not as explicitly different as in the Cool URIs example.
Based on your email conversation and this GitHub issue, I understood the following problems/requests. But in the end we need you to show what actual problems you have: which particular client breaks, and why.
http://dbpedia.org/resource/Berlin owl:sameAs https://dbpedia.org/resource/Berlin for all resources. My question here is: why does this help? For me this would only lead to other problems. You always need Linked Data clients and tools that support inference, and I am afraid that people would start to use two different identifiers for the same thing, which definitely makes a lot of things trickier and can break stuff. Just imagine you also needed to change the class identifiers of the DBpedia ontology to https: then owl:sameAs won't help, and you would need owl:equivalentClass, or would have to materialize all type statements both with and without https. And what happens to datatypes?
C: But when looking at the redirect chain, I think I identified an actual problem: a fallback to http, which does not make sense to me (?). @pkleef @kurzum maybe this is what actually breaks clients. I remember that if you download files with native Java from the databus/collections using the databus file identifiers, which use https, you can have a problem with redirects that point to non-https download locations (so the download URL is not https): https://github.com/dbpedia/dbpedia-databus-collection-downloader/commit/609102199ab4ebc3217ae05a71a08a3d8fd267e1
see https://github.com/dbpedia/extraction-framework/issues/722
http://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/resource/Berlin --[303]--> http://dbpedia.org/data/Berlin.ttl --[303]--> https://dbpedia.org/data/Berlin.ttl
Fix option 1: https not enforced
http://dbpedia.org/resource/Berlin --[303]--> http://dbpedia.org/data/Berlin.ttl
Fix option 2: https enforced
http://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/data/Berlin.ttl
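To make the "fallback to http" concrete, here is a minimal sketch that scans a redirect chain for hops going from https back to http. The function name `scheme_downgrades` is hypothetical, and the three chains are copied from the comment above rather than fetched live, so this only illustrates the check itself.

```python
from urllib.parse import urlsplit


def scheme_downgrades(chain):
    """Return (from_url, to_url) pairs where a redirect goes from https to http."""
    return [
        (src, dst)
        for src, dst in zip(chain, chain[1:])
        if urlsplit(src).scheme == "https" and urlsplit(dst).scheme == "http"
    ]


# Redirect chains as written in this thread (not fetched live).
REPORTED = [
    "http://dbpedia.org/resource/Berlin",
    "https://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/data/Berlin.ttl",
    "https://dbpedia.org/data/Berlin.ttl",
]
FIX_OPTION_1 = [
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/data/Berlin.ttl",
]
FIX_OPTION_2 = [
    "http://dbpedia.org/resource/Berlin",
    "https://dbpedia.org/resource/Berlin",
    "https://dbpedia.org/data/Berlin.ttl",
]

if __name__ == "__main__":
    for name, chain in [
        ("reported", REPORTED),
        ("fix option 1", FIX_OPTION_1),
        ("fix option 2", FIX_OPTION_2),
    ]:
        print(name, scheme_downgrades(chain))
```

Only the reported chain contains a downgrade (the second hop); both fix options come out clean under this check.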
So essentially DBpedia's http:// identifiers are canonical, and https:// should not be used and only occurs behind the scenes during the redirects?
We also have encountered variations on this issue.
Browsers increasingly look deep into a web transaction.
If the browser detects an http:// resource it might get flagged (or blocked).
This was true when using SPARQLer (recently upgraded to https://).
However, we've seen instances of http:// endpoints in SPARQL queries fail when fetched using http://
I think the easiest way to encounter this issue is just to grab the URL from the browser's address bar, which after the redirects is the https:// URL, and then use it somewhere else, like in a Linked Data browser.
You can rationalize that "this is not the canonical URL", but people just expect it to work.
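As a client-side workaround for the copy-and-paste case, a tool could map an https:// document URL back to the http:// identifier before using it as an RDF term. The sketch below is an assumption about how such a mapping might look, not something DBpedia provides: the function name `to_canonical_id` is hypothetical, and it only rewrites URLs on dbpedia.org under /resource/.

```python
from urllib.parse import urlsplit, urlunsplit


def to_canonical_id(url):
    """Map an https://dbpedia.org/resource/... URL to its http:// identifier.

    Any other URL is returned unchanged.
    """
    parts = urlsplit(url)
    if (
        parts.scheme == "https"
        and parts.netloc == "dbpedia.org"
        and parts.path.startswith("/resource/")
    ):
        # Swap only the scheme; host, path, query and fragment stay as given.
        return urlunsplit(
            ("http", parts.netloc, parts.path, parts.query, parts.fragment)
        )
    return url


if __name__ == "__main__":
    print(to_canonical_id("https://dbpedia.org/resource/Berlin"))
```

This keeps http:// as the only identifier in the data, at the cost of every client having to know the convention.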
I agree that the Linked Data and Semantic Web practices and standards are quite old, not easy to understand, and not always super user friendly. IMO they were not designed to be consumed by humans, or for use cases like your copy-and-paste browser usage. DBpedia has existed since 2007, and the feature you request has a lot of pitfalls: it can break a lot of things, or make the identifiers even more confusing or just wrong in the future. If you copy the URL from the browser you get the ID of the HTML page, not of the entity. Sorry, but that is a semantic difference that has been in place for a very long time; it is not DAU friendly, though, I totally see that. If a project starts from scratch now, it can just go with HTTPS-only identifiers, and then all this trouble is not an issue.
I tried it with Wikidata, and what you request also seems not to work there, neither via SPARQL nor via Linked Data. Also, GitHub has a separate "raw" namespace to download files and distinguishes between a file's content and the HTML presentation of the file.
To move forward, I split the issue into the "actual" bug I discovered (https://github.com/dbpedia/extraction-framework/issues/722) and your feature request.
I wouldn't blame the Semantic Web for this, as RDF doesn't really care about http:// or https:// :)
I would attribute this to legacy conventions/technical debt. As you mentioned, the issue would be solved by making https:// canonical.
@JJ-Author another problem with http:// as canonical URIs is that they cannot be requested from a secure page.
Issue validity
Live data on dbpedia.org.
Error Description
There is an http:// / https:// mismatch between requested URIs and the URIs in the data.
Details
Originally reported here: https://sourceforge.net/p/dbpedia/mailman/message/37362683/
The server forces https:// URLs:
But the returned RDF data contains http:// URIs:
Another example, this time requesting https://: