dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Linked Data enforces HTTPS but has redirects downgrading to HTTP #722

Open JJ-Author opened 2 years ago

JJ-Author commented 2 years ago

Where did the problem occur (e.g. dbpedia.org/sparql, lookup, spotlight)?

Please give the full URL, if possible

Linked Data interface of dbpedia.org ( http://dbpedia.org/resource/.. )

Problem description

When looking at the redirect chain, I identified a problem: Linked Data enforces HTTPS but then falls back to HTTP, which does not make sense to me and can break downloading data. (I remember that if you download files with vanilla Java, e.g. from the Databus/collections, where the Databus file identifiers use HTTPS, you run into problems with files that point to non-HTTPS download locations, i.e. the original download URL is not HTTPS. A fix for this was applied in dbpedia/dbpedia-databus-collection-downloader@6091021.)

http://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/resource/Berlin --[303]--> http://dbpedia.org/data/Berlin.ttl --[303]--> https://dbpedia.org/data/Berlin.ttl
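
The downgrade in the chain above can be spotted mechanically. A minimal Python sketch, assuming the hops are already available as a list of URLs; the helper name `find_downgrades` is made up for illustration and is not part of any DBpedia tooling:

```python
from urllib.parse import urlsplit


def find_downgrades(chain):
    """Return (from_url, to_url) pairs where a hop goes from https to http."""
    downgrades = []
    # Compare each URL with the one it redirects to.
    for src, dst in zip(chain, chain[1:]):
        if urlsplit(src).scheme == "https" and urlsplit(dst).scheme == "http":
            downgrades.append((src, dst))
    return downgrades


# Redirect chain as reported in this issue.
chain = [
    "http://dbpedia.org/resource/Berlin",
    "https://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/data/Berlin.ttl",
    "https://dbpedia.org/data/Berlin.ttl",
]

for src, dst in find_downgrades(chain):
    print(f"downgrade: {src} -> {dst}")
# prints: downgrade: https://dbpedia.org/resource/Berlin -> http://dbpedia.org/data/Berlin.ttl
```

Only the second hop is flagged; the initial http to https upgrade and the final upgrade back to https are not downgrades.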

Expected behaviour

Fix option 1: HTTPS is not enforced when RDF MIME types are requested (for the HTML MIME type, enforcing HTTPS makes sense, and this is probably what broke it: a wrong Let's Encrypt auto-setting).

http://dbpedia.org/resource/Berlin --[303]--> http://dbpedia.org/data/Berlin.ttl

Fix option 2: HTTPS is enforced throughout.

http://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/resource/Berlin --[303]--> https://dbpedia.org/data/Berlin.ttl

Request/Reproduction

Give the link or request, so the problem can be reproduced. Ideally, this would be a unix curl command.
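
No command was given for this section. As a possible reproduction aid, here is a rough Python sketch that follows redirects by hand and records each hop, so the scheme of every `Location` target is visible. The function name `trace_redirects`, the `text/turtle` default for `Accept`, and the hop limit are assumptions made for illustration; they are not taken from the report.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url, accept="text/turtle", max_hops=10, timeout=10):
    """Follow redirects manually and return a list of (status, url) hops."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        # Pick the connection class matching the scheme of the current hop.
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path, headers={"Accept": accept})
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Calling `trace_redirects("http://dbpedia.org/resource/Berlin")` and printing each `(status, url)` pair should list the hops one per line; any hop whose URL switches from `https://` back to `http://` is the downgrade described above.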

danielbeeke commented 2 years ago

For a screenshot of the problem, see:

https://forum.dbpedia.org/t/fetch-https-dbpedia-org-resource-ambrose-request-on-https-url-gets-redirected-to-an-http-address-and-is-blocked-by-the-browser/1588/9?u=danielbeeke

JJ-Author commented 2 years ago

Also see this:

https://www.test-cors.org/#?client_method=GET&client_credentials=false&client_headers=Accept%3A%20application%2Fn-triples&server_url=http%3A%2F%2Fdbpedia.org%2Fresource%2FBerlin&server_enable=true&server_status=200&server_credentials=false&server_tabs=remote