bednee / cooluri

Git repository for TYPO3 extension CoolUri

fixes #66017 #55

jpmschuler closed this 7 years ago

jpmschuler commented 8 years ago

https://forge.typo3.org/issues/66017

Implemented a fix that uses URIs compliant with RFC 3986, section 4.2 (https://tools.ietf.org/html/rfc3986#section-4.2), like "//www.google.com", which omit the protocol instead of trying to detect it. Wherever http was already used, it is still used; wherever https was used, the link sticks to https. No PHP code is necessary; the scheme is resolved entirely on the client side.
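To illustrate the client-side behavior described above, here is a minimal sketch using Python's standard urllib.parse.urljoin, which follows the RFC 3986 resolution rules. The example.com page URLs and the resolve helper are hypothetical; the point is only that the same "//host/path" reference inherits whatever scheme the current page was loaded with.

```python
from urllib.parse import urljoin

# A protocol-relative ("network-path") reference: no scheme, starts with "//".
LINK = "//www.google.com/search"


def resolve(base_url: str, reference: str) -> str:
    """Resolve a reference against the current page URL (RFC 3986 rules)."""
    return urljoin(base_url, reference)


# The same href yields http or https depending on how the page was loaded.
for page in ("http://example.com/some/page", "https://example.com/some/page"):
    print(page, "->", resolve(page, LINK))
```

Browsers apply the same rule, which is why no server-side scheme detection is needed for links rendered in a page.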

h4de5 commented 7 years ago

It seems some clients cannot yet handle these RFC-compliant URIs. For example, Solr indexing is not able to follow redirects from CoolURI correctly and interprets them as relative URLs.

Solr indexer requests http://domain.com/?id=1234 => CoolURI sends "Location: //domain.com/foo/bar" => Solr indexer subsequently calls http://domain.com/domain.com/foo/bar => CoolURI delivers its 404 page => Solr indexes the 404 page.
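A small sketch can reproduce the difference seen in the logs. The first function resolves the Location value per RFC 3986; the second is a purely hypothetical model of a client that treats every Location value as a path on the current host. It is not taken from Solr's source code, it just happens to produce the same wrong URL as in the report.

```python
from urllib.parse import urljoin, urlsplit

REQUEST_URL = "http://domain.com/?id=1234"
LOCATION = "//domain.com/foo/bar"  # value sent by CoolURI in the example above


def resolve_rfc3986(request_url: str, location: str) -> str:
    # "//host/path" is a network-path reference: keep the request's scheme,
    # take host and path from the reference itself.
    return urljoin(request_url, location)


def resolve_naive(request_url: str, location: str) -> str:
    # HYPOTHETICAL broken client: treats the Location value as a path on the
    # current host, so the "domain.com" part ends up inside the path.
    parts = urlsplit(request_url)
    return f"{parts.scheme}://{parts.netloc}/{location.lstrip('/')}"


print(resolve_rfc3986(REQUEST_URL, LOCATION))  # http://domain.com/foo/bar
print(resolve_naive(REQUEST_URL, LOCATION))    # http://domain.com/domain.com/foo/bar
```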

see: https://forum.typo3.org/index.php?t=msg&th=215837&#msg_749437

jpmschuler commented 7 years ago

URIs of the form "//whatever", without scheme and colon, were already part of RFC 1808 from 1995. I don't think "not yet" applies to any relevant client.

If your Solr isn't able to follow the links correctly, it is probably merely a Solr configuration error rather than a missing feature.

h4de5 commented 7 years ago

Thanks for the hint. So far this is all I could gather from going through the Apache access logs and the Solr log files. At the moment I cannot rule out a misconfiguration, but when I disable CoolURI, indexing works fine (though it then shows "uncool" URLs in the search results).

jpmschuler commented 7 years ago

Of course, that makes sense; I don't doubt it. Your header analysis is correct IMHO, and this architecture change in CoolURI triggers exactly this redirect (Location header) behavior, which TYPO3 core doesn't need.

I just doubt that Solr isn't capable of this (or that there are other clients that aren't capable of it).

h4de5 commented 7 years ago

Just a brief follow-up: I am talking about the Solr TYPO3 extension (https://github.com/TYPO3-Solr/ext-solr). This extension uses file_get_contents with HTTP headers to load pages, and at least in my tests this function does not seem to handle these protocol-relative URLs correctly.

jpmschuler commented 7 years ago

Hm, file_get_contents indeed has problems (cURL doesn't, by the way; one could use cURL to resolve the URL or even to fetch the content).

Nevertheless, I investigated a bit and found the following:

So in fact you were right that it is a "quite new standard", although what is new is not the protocol-relative URI form itself but its use in the HTTP Location header. Sorry for the misunderstanding; I didn't know beforehand that the necessary change in RFC 7231 was so recent.
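RFC 7231 (section 7.1.2) allows the Location value to be a relative reference, which the client must resolve against the URL of the request that produced the redirect. The sketch below shows a client loop that does this. The FAKE_RESPONSES table is a hypothetical stand-in for a web server so the example runs offline; it is not a CoolURI or Solr API.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for a web server: request URL -> (status, Location or None).
FAKE_RESPONSES = {
    "http://domain.com/?id=1234": (301, "//domain.com/foo/bar"),
    "http://domain.com/foo/bar": (200, None),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, max_hops: int = 5) -> list[str]:
    """Return the chain of URLs visited, resolving each Location value
    against the URL that produced it, as RFC 7231 requires."""
    chain = [url]
    for _ in range(max_hops):
        status, location = FAKE_RESPONSES.get(url, (404, None))
        if status not in REDIRECT_CODES or location is None:
            return chain
        # Relative references (including "//host/path") are resolved here.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError("too many redirects")


print(follow_redirects("http://domain.com/?id=1234"))
```

A client written against the older RFC 2616 wording, which required an absolute URI in Location, may skip the resolution step and end up on the wrong URL.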

You could perhaps open issues asking Solr to use cURL, PHP to fix file_get_contents, and CoolURI to fix this issue differently.

The function CoolURI needs would have to take the target page's pages.url_scheme into account and fall back to the current request's $_SERVER['HTTPS'] if url_scheme is 0. Reintroducing "http://" would be a major problem, as stated in the original issue. I don't currently have the time for a pull request fixing this.
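The scheme selection suggested above could look roughly like the following Python sketch. It assumes that pages.url_scheme uses 0 for "no preference", 1 for http and 2 for https, and it models $_SERVER as a plain dictionary. The helper names are hypothetical and not part of CoolURI or TYPO3.

```python
from typing import Mapping

# ASSUMPTION: pages.url_scheme encodes 0 = no preference, 1 = http, 2 = https.
URL_SCHEME_HTTP = 1
URL_SCHEME_HTTPS = 2


def request_is_https(server: Mapping[str, str]) -> bool:
    """Follow PHP's convention: $_SERVER['HTTPS'] is non-empty for HTTPS
    requests (IIS may report the literal value 'off' for plain HTTP)."""
    value = server.get("HTTPS", "")
    return value != "" and value.lower() != "off"


def choose_scheme(target_url_scheme: int, server: Mapping[str, str]) -> str:
    """Hypothetical helper: the target page's setting wins, else use the request's scheme."""
    if target_url_scheme == URL_SCHEME_HTTPS:
        return "https"
    if target_url_scheme == URL_SCHEME_HTTP:
        return "http"
    # url_scheme == 0: fall back to the scheme of the current request.
    return "https" if request_is_https(server) else "http"


def absolute_link(host: str, path: str, target_url_scheme: int,
                  server: Mapping[str, str]) -> str:
    """Build an absolute URL (e.g. for a Location header) with an explicit scheme."""
    return f"{choose_scheme(target_url_scheme, server)}://{host}{path}"


print(absolute_link("domain.com", "/foo/bar", 0, {"HTTPS": "on"}))
```

Note the limitation of the fallback: behind a proxy that terminates HTTPS, $_SERVER['HTTPS'] may be empty, so pages with url_scheme 0 would get http links.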

h4de5 commented 7 years ago

Seems like something from almost 3 years ago; the term "new" only really fits in an RFC context. ;-)

I have patched (XCLASSed) our Solr extension to use a cURL request, which works fine so far. I am also not expecting you to change anything in CoolURI. I am quite happy with this fix, as it solved a problem we had behind a reverse proxy/load balancer that terminated all HTTPS requests, leaving $_SERVER['HTTPS'] empty.

I am just trying to help the next guy who wants to figure out why Solr is not indexing correctly.

Thanks for your further investigations!