Open Flameeyes opened 4 years ago
I knew it was only a matter of time before this issue is opened when I saw the post linked in some Reddit drama :smiley:
It would be nice if we could just implement HTTP and RSS specifications properly. Unfortunately, many sites do not follow the HTTP specification correctly so we cannot trust that permanent redirect is actually meant to be permanent. This puts us in a weird place where we either get issues reported by owners of compliant sites or by users of uncompliant ones.
Another issue is that changing user entered data by a machine without user’s knowing is somewhat icky. If we allow that, a bug in selfoss or the site, or even a malicious actor in the middle can change it to an incorrect value permanently. (Notice the article you link even suggest checking the link over a month before changing it, which already deviates from the spec.)
For those reasons, for a long time, I had on my to-do list, implementing a UI that would inform user of encountered permanent redirects on the settings page of selfoss.
Thinking about it more, we could even change it automatically if the permanent redirect remains over a period of time as the article suggests, as long we include an interactive item in the feed that informs user about the change and allows to easily revert it if necessary.
Possibly relevant: https://github.com/simplepie/simplepie/pull/660
Edit: We do not actually use that file since we have a custom implementation.
The upstream File
class sets url
property to the effective URL and resets the permanent_url
property to the original URL if 301 is not the first redirect, does not support chains of permanent redirects, and is not aware 308 is permanent redirect too.
For selfoss, patch like the following should work:
diff --git a/src/helpers/SimplePieFileGuzzle.php b/src/helpers/SimplePieFileGuzzle.php
index 98b618b8..8e73b685 100644
--- a/src/helpers/SimplePieFileGuzzle.php
+++ b/src/helpers/SimplePieFileGuzzle.php
@@ -53,6 +53,7 @@ class SimplePieFileGuzzle extends \SimplePie_File {
});
$this->url = WebClient::getEffectiveUrl($url, $response);
+ $this->permanent_url = WebClient::getPermanentUrl($url, $response);
$this->body = (string) $response->getBody();
$this->status_code = $response->getStatusCode();
} catch (\GuzzleHttp\Exception\RequestException $e) {
diff --git a/src/helpers/WebClient.php b/src/helpers/WebClient.php
index 69e0b2b2..e6b5818a 100644
--- a/src/helpers/WebClient.php
+++ b/src/helpers/WebClient.php
@@ -127,4 +127,29 @@ class WebClient {
return $urlStack[count($urlStack) - 1];
}
+
+ /**
+ * Get permanent URL of the response.
+ * RedirectMiddleware will need to be enabled for this to work.
+ *
+ * @param string $url requested URL, to use as a fallback
+ * @param GuzzleHttp\Psr7\Response $response response to inspect
+ *
+ * @return string URL we should change the future requests to
+ */
+ public static function getPermanentUrl($url, GuzzleHttp\Psr7\Response $response) {
+ // Sequence of fetched URLs
+ $urlStack = array_merge([$url], $response->getHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER));
+ // Let’s consider the first URL a permanent redirect since it is our start point.
+ $statusStack = array_merge([301], $response->getHeader(\GuzzleHttp\RedirectMiddleware::STATUS_HISTORY_HEADER));
+
+ // Follow the chain util first non-permanent redirect, since non-permanent
+ // redirect could invalidate permanent ones later in the chain.
+ $index = 0;
+ while (($statusStack[$i] === 301 || $statusStack[$i] === 308) && $index < count($urlStack)) {
+ $index++;
+ }
+
+ return $urlStack[$index - 1];
+ }
}
Then “only” the UI part will remain (https://github.com/fossar/selfoss/issues/1120).
What if someone does the blackhat SEO trick of buying expired domains in order to accumulate the high 4+ PageRank from the domain name? Or if a hacker temporarily hijacks the domain name or if they add a phishing website inside an already existing website? What if it was done to a free remotely hosted website like Wordpress.com, wix, weekly or squarespace?
Then the permenant redirect will take the Selfoss user to an unintended website, if the domain was to get put into its initial use, later on.
Also, does Selfoss explicitly notify the user that a redirect has occurred? Where is the visual indicator for this?
Yes that’s why I am leaning towards requiring explicit confirmation, see https://github.com/fossar/selfoss/issues/1178#issuecomment-613455621.
Context: https://www.ctrl.blog/entry/permanent-redirects.html
It seems like there's a few Selfoss users fetching one of my blog's old feed addresses. And despite this responding with 308 Permanent Redirect for months, it seems like they don't learn the new address.