Closed geerlingguy closed 2 years ago
Searching around a bit, I also found the issue on Drupal.org: Option to force URLs to return HTTPS instead of HTTP.
And indeed, I have in my Drupal configuration (in a local.settings.php):
$settings['reverse_proxy'] = TRUE;
$settings['reverse_proxy_addresses'] = ['IP_OF_SERVER_HERE'];
But how does this work if I also want Drupal to detect all Cloudflare IP addresses as reverse_proxy_addresses? And is that affected by Nginx still fronting the requests?
Edit: Also, a note from the metatag issues: https://www.drupal.org/project/metatag/issues/2842049#comment-14260772
lol, of course I've written my own blog post on the topic... Configuring CloudFlare with Drupal 8 to protect the Pi Dramble.
I just updated my Drupal config to use:
// Reverse proxy - Cloudflare.
$settings['reverse_proxy'] = TRUE;
$settings['reverse_proxy_addresses'] = array($_SERVER['REMOTE_ADDR']);
$settings['reverse_proxy_header'] = 'HTTP_CF_CONNECTING_IP';
We'll see if this fixes anything.
Hmm... though reverse_proxy_header is deprecated / removed in Drupal 9. Grr.
Nick Craver mentions I could enable HSTS on my domain (see https://twitter.com/Nick_Craver/status/1501248041004716043 and the rest of that thread) and it might help resolve it too. Though it'd be nice to make sure all the settings are correct up and down the stack.
I might go with this sample code from this comment:
if (isset($_SERVER['HTTP_CF_CONNECTING_IP'])) {
  // If the CloudFlare header is contained in the X-Forwarded-For header, then
  // all IP addresses to the right of that entry are reverse-proxies, which are
  // additional to the value in $_SERVER['REMOTE_ADDR'].
  // E.g. <client> --- <CDN> --- <Varnish> --- <drupal>.
  $client = $_SERVER['HTTP_CF_CONNECTING_IP'];
  $ips = explode(', ', $_SERVER['HTTP_X_FORWARDED_FOR']);
  if ($keys = array_keys($ips, $client)) {
    $position = end($keys);
    $reverseProxies = array_slice($ips, $position + 1);
    $reverseProxies[] = $_SERVER['REMOTE_ADDR'];
    $settings['reverse_proxy'] = TRUE;
    $settings['reverse_proxy_addresses'] = $reverseProxies;
  }
}
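To make the slicing logic in that sample concrete, here is a minimal shell sketch of the same idea, run against made-up addresses. The function name `proxies_after_client` and the documentation-range IPs are assumptions for illustration only; it is not the code running on the site.

```shell
#!/bin/sh
# Hypothetical helper: given the client IP (as reported in CF-Connecting-IP),
# an X-Forwarded-For value, and the directly connecting address (REMOTE_ADDR),
# print one trusted-proxy address per line.
proxies_after_client() {
  client=$1
  xff=$2
  remote_addr=$3
  printf '%s\n' "$xff" | tr ',' '\n' | sed 's/^ *//; s/ *$//' |
    awk -v c="$client" -v r="$remote_addr" '
      { ips[NR] = $0; if ($0 == c) pos = NR }   # remember the last match
      END {
        if (!pos) exit                          # client not in the list: print nothing
        for (i = pos + 1; i <= NR; i++) print ips[i]
        print r                                 # REMOTE_ADDR is appended as the final proxy
      }'
}

# Example with made-up addresses: the client, then one extra proxy hop.
proxies_after_client 203.0.113.7 '203.0.113.7, 198.51.100.2' 192.0.2.10
```

As in the PHP sample, nothing is emitted when the client address does not appear in the forwarded list, so no proxies would be trusted in that case.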
Commit above has the code I added to live local.settings.php.
Provider of the curl output here, attempting to be helpful.
I'm guessing somehow Cloudflare is passing through http requests sometimes
Internally in Nginx, I have redirects from http to https though, so I'm also not sure how the http could ever get through to a rendered feed...
Given that a reader has seen a feed with plain http URLs in it, the redirect could be considered a good thing as it means attempts to read the full article page will succeed. Although the goal here is rightly not to publish those URLs to the feed in the first place.
As for how/why the feed has plain http URLs in it in the first place, I'm afraid I have no knowledge of Drupal, but in an attempt to rubber-duck towards a resolution:
(I'm not looking for answers here - I'm trying to find the right question to help you realise the cause)
As a curious observer, it seems there must be multiple Drupal instances at play, with varying senses of "self". How does a Drupal instance know its identity? Does it get the scheme the same way? Are the nodes all definitely configured identically?
If that's what this is already trying to address:
I just updated my Drupal config to use:
// Reverse proxy - Cloudflare.
$settings['reverse_proxy'] = TRUE;
$settings['reverse_proxy_addresses'] = array($_SERVER['REMOTE_ADDR']);
$settings['reverse_proxy_header'] = 'HTTP_CF_CONNECTING_IP';
We'll see if this fixes anything.
... I can report that your latest article ("Rate limiting requests per IP address in Nginx") appears twice in my feed.
Oh, and I've just noticed the significance of the wording of the issue title:
RSS feed showing duplicates if someone subscribed to http version
I have not subscribed to the http version. My feed reader is configured to fetch https://www.jeffgeerling.com/blog.xml - it's always the same request from me, but the links in the returned content vary.
@ChrisLawther - Okay, in that case I've fixed the title. I'm going to give it a couple days and we'll see if the next post hits the same issue. I don't have the time to dive into some CF requests on the server itself right now but that'll be the next step, to see where Drupal's getting its http requests from.
I was wondering whether anything in the response headers might point towards a misconfigured Drupal instance. If I compare an http-returning response with an https-returning one, the only differences that aren't simply time related are:
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=YiRbRdmXT3HOZiy%2FEAe6znimuIFEYJP7OIfa4vdCucCM6wPOnuKCHURPmtwvkWvImGqB02v9vpOU3LEAQwQSGP8wcVm5nWN8kaR54cEK1MlquGcFeSc1P6dm5jeIfOqcteW5TW7EUA%3D%3D"}],"group":"cf-nel","max_age":604800}
vs.
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=MMrO673IS6%2FHL2VJnchOCF4J2b1hsOyxpxAGn73Uv81w7qZ9wReORTUO%2BtNZUbbwhnOWG9uLgSXHzMMIq8GksbDDUYpmaPVsxVdXLHUbU%2BnSjYzOnmrB4QNCnjrzKkEqVrR3RHssmQ%3D%3D"}],"group":"cf-nel","max_age":604800}
And
cf-ray: 6e9dcdcd1b697747-LHR
vs.
cf-ray: 6e9dce404e6b72e8-LHR
... but they may both be CloudFlare internal details and nothing to do with the actual feed generation.
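The comparison described above can be repeated with a small sketch like this one. It assumes the two sets of response headers were saved to files first (for example with `curl -s -D headers-a.txt -o /dev/null <url>`). The file names, the `diff_headers` function, and the choice of which headers count as time related are assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Print a header dump with CRs stripped and time-related headers removed,
# sorted so that header order does not show up as a difference.
stable_headers() {
  tr -d '\r' < "$1" | grep -Eiv '^(date|age|expires|last-modified):' | sort
}

# Show the headers that differ between two saved responses.
diff_headers() {
  a=$(mktemp)
  b=$(mktemp)
  stable_headers "$1" > "$a"
  stable_headers "$2" > "$b"
  diff "$a" "$b" || true   # diff exits 1 when the files differ; that is expected
  rm -f "$a" "$b"
}
```

Calling `diff_headers headers-a.txt headers-b.txt` would then print only the lines that differ, such as cf-ray and report-to in the case above.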
cf-ray: 6e9dcdcd1b697747-LHR
vs.
cf-ray: 6e9dce404e6b72e8-LHR
The CF-Ray header is like a request ID. The LHR part means you are connecting to the London data center (they use the nearest airport code), so in your case London Heathrow.
The report-to header is for bot/spam protection and network logging (this is the part that applies to CF).
From https://developers.cloudflare.com/fundamentals/get-started/http-request-headers/
The CF-ray header is a hashed value that encodes information about the data center and the visitor’s request.
And report-to reference https://support.cloudflare.com/hc/en-us/articles/360050691831-Understanding-Network-Error-Logging
Someone else reported the issue today, too:
This is from Miniflux.
Nick Craver mentions I could enable HSTS on my domain (see https://twitter.com/Nick_Craver/status/1501248041004716043 and the rest of that thread) and it might help resolve it too.
@geerlingguy I think most of the RSS consumers don't respect HSTS.
I saw in your RSS feed that the link tag points to the http variant. Drupal sets it here: https://github.com/drupal/drupal/blob/515d10367bbe5cc158153a90e7960f92c2862745/core/modules/views/templates/views-view-rss.html.twig#L24 and that link variable is populated by Url::fromRoute('<front>')->setAbsolute()->toString() here: https://github.com/drupal/drupal/blob/515d10367bbe5cc158153a90e7960f92c2862745/core/modules/views/views.theme.inc#L888
Can you try a search-replace in your DB with something like phpMyAdmin, replacing http://www.jeffgeerling.com/ with https://www.jeffgeerling.com/?
@PH4NTOMiki - The problem is that Drupal generates URLs on the fly, and the protocol is determined by how Drupal sees the request come in. I'm pretty sure some requests from Cloudflare are being seen as non-https, for some reason or another. When I wasn't using Cloudflare I never had that issue, because I only had one proxy in front of Drupal (Nginx), and I could easily detect when the proxy was being used. Some requests from Cloudflare seem to bypass the proxy logic I added a few comments earlier, and I'm guessing that's when Drupal generates a non-https feed, which then also gets cached by Cloudflare.
Do you have Always Use HTTPS enabled in the Cloudflare dashboard? https://developers.cloudflare.com/ssl/edge-certificates/additional-options/always-use-https/#encrypt-all-visitor-traffic
@PH4NTOMiki - Yes.
Do you have fastcgi_param HTTPS on; in your nginx config?
@PH4NTOMiki - I didn't, though I just forced it to on in /etc/nginx/fastcgi_params, restarted Nginx, and cleared caches on Cloudflare...
21:25:42 ~
$ curl https://www.jeffgeerling.com/blog.xml 2>/dev/null | grep "guid isPermaLink" | head -1
<guid isPermaLink="false">3191 at https://www.jeffgeerling.com</guid>
21:25:43 ~
$ curl https://www.jeffgeerling.com/blog.xml\?asa 2>/dev/null | grep "guid isPermaLink" | head -1
<guid isPermaLink="false">3191 at https://www.jeffgeerling.com</guid>
We'll see if that fix holds!
Fingers crossed, I'll make a script to test some URLs, hopefully they all come up as https. Will report back
I tested multiple routes and every one came back as https.
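The test script itself was not posted; a check along these lines might look like the following minimal sketch. The function names `count_http_guids` and `check_feeds` are hypothetical, and the sketch assumes each guid sits on a single line, as in the curl output earlier in the thread.

```shell
#!/bin/sh
# Count <guid> lines in a feed (read from stdin) that contain a plain http:// URL.
# "https://" does not match the pattern, so only insecure links are counted.
count_http_guids() {
  grep '<guid' | grep -c 'http://' || true   # grep -c prints 0 but exits 1 on no match
}

# Fetch each URL passed as an argument and report how many guids came back as http.
check_feeds() {
  for url in "$@"; do
    n=$(curl -s "$url" | count_http_guids)
    echo "$url: $n http guid(s)"
  done
}
```

For example, `check_feeds https://www.jeffgeerling.com/blog.xml "https://www.jeffgeerling.com/blog.xml?asa"` would check both the plain feed URL and the cache-busting variant used above; a count of 0 for every URL is the desired result.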
Sounds like I owe @PH4NTOMiki a beer!
I'll close this for now—if anyone sees the duplicates again, please let me know!
It looks like one more bit of fallout from the #141 DDoS attacks and mitigations is a broken-for-some-users RSS feed:
Basically, there are cases where the URL returned in the guid has an http, and others where it's https. I'm not exactly sure how this is happening through Cloudflare (it never happened before), but I'm guessing somehow Cloudflare is passing through http requests sometimes (even though I have "Full (strict)" enabled), and those are getting cached with the wrong guids. Internally in Nginx, I have redirects from http to https though, so I'm also not sure how the http could ever get through to a rendered feed...