backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

Fixed: Cached anonymous page and "background fetch" cause site rendering issues for LiteSpeed server #5525

Closed argiepiano closed 2 years ago

argiepiano commented 2 years ago

Description of the bug

When using "background fetch" (see admin/config/development/performance) of stale anonymous cached pages, Backdrop (or the LiteSpeed server?) seems to send headers and content twice.

I've been able to reproduce in a LiteSpeed production server with different sites, including a clean Backdrop install. Have NOT been able to reproduce on a local Apache one.

This issue causes duplication of content and a lot of problems when rendering the page (see attached image), among them also Warning: session_set_save_handler(): Cannot change save handler when headers already sent in backdrop_session_initialize() (line 245 of /home/tenutowe/public_html/backdrop/core/includes/session.inc).

Steps To Reproduce

These are very specific steps. Use two browsers, or one browser window on "normal" mode and another on "incognito" . Again, this is an issue in LiteSpeed server as far as I know.

  1. In "browser A", log in to your site, navigate to admin/config/development/performance and be sure "Cache pages for anonymous users" and "Use background fetch for cached pages" are on
  2. Turn OFF "Compress cached pages." This will allow you to actually see the duplicated content (otherwise Backdrop sends the "doubled page" as a gzip, which is not displayed).
  3. Make the expiration of cache pages to 1 minute (so that you don't have to wait too long to reproduce the error 😄 )
  4. Save
  5. Navigate to admin/config/development/logging and select "All messages" in "Error messages to display"

On a different browser (browser B):

  1. Clear the memory cache of the browser. This is done to assure the browser actually sends a request, rather than rendering the page from memory. This is done in different ways in different browsers. In Chrome, open the Developer Tools, and then Control Click the refresh button - select Empty Cache and Hard Reload
  2. Navigate to the home page of your site
  3. Click Log in
  4. Log in as a user. DO NOT clear the Backdrop caches!
  5. Navigate to Home
  6. Clear the BROWSER memory cache one more time (see 1)
  7. Sit patiently for one minute to wait for the cached anonymous pages to go stale (I put a timer!)
  8. After one minute (or longer), click Log out

Upon logout you will be redirected to the Home page. This version of the Home page is retrieved from a stale entry in the cache_page table, which caches anonymous pages. You should see duplication of elements and the Warning I printed above.

You can verify that the response came from the Backdrop cache by opening the Developer Tools of the browser, going to "Network", selecting the document, and clicking "Header" to see the response headers received. They should indicate a cached page on the Response header x-backdrop-cache: HIT (if you see MISS, you are not seeing a cached page).

Actual behavior

See above

Expected behavior

The page should load without any issues.

Additional information

Screen Shot 2022-03-02 at 8 13 04 AM

ANOTHER SNAPSHOT, FROM A CLEAN INSTALL:

Screen Shot 2022-03-02 at 2 39 58 PM
argiepiano commented 2 years ago

A quick update: I'm not able to reproduce this issue in a clean install (despite what I said earlier).

So this may actually be an interaction with an installed module.

indigoxela commented 2 years ago

Is turning on the error display essential for the steps, or does the duplication happen regardless?

Also, a list of installed contrib modules might help to further investigate.

If that happens only on specific pages - which blocks does that layout contain?

yorkshire-pudding commented 2 years ago

I was able to reproduce on my test site (which is hosted on the same host, using LiteSpeed Web Server, as my live site) following the steps exactly. Here are installed and enabled modules:

Current | Antibot 1.x-1.2.0Up to date Current | Coffee 1.x-2.2.1Up to date Current | Contact Forms 1.x-1.3Up to date Current | Disable login errors 1.x-1.0.0Up to date Current | Entity Plus 1.x-1.0.15Up to date Current | Font Awesome 1.x-1.2.5Up to dateIncludes: Font Awesome, Font Awesome Field Current | IndexNow 1.x-1.0.2Up to date Current | Insert 1.x-1.0.1Up to date Current | Matomo Analytics 1.x-2.12.0Up to date Current | Menu attributes 1.x-1.0.0Up to date Current | Prepopulate 1.x-2.1.0Up to date Current | Responsive Favicons 1.x-1.1.1Up to date Current | Search API 1.x-1.0.05Up to dateIncludes: Search API, Search Views Current | Security Review 1.x-1.3.1Up to date Current | Tawk.to 1.x-1.0.1Up to date Current | User picture field 1.x-1.0.0-rc2Up to date Current | Views Bootstrap 1.x-1.2.3Up to date Current | Views Conditional 1.x-1.3.1Up to date Current | XML sitemap 1.x-1.0.6Up to date Includes: XML sitemap, XML sitemap custom, XML sitemap engines, XML sitemap menu, XML sitemap node, XML sitemap taxonomy Themes Current | Bootstrap Lite 1.x-1.4.2Up to date Layouts Current | Boxton Hero1.x-2.0.1Up to date Includes: Boxton Hero, Geary Hero, Harris Hero, Hero layout, Moscone Flipped Hero, Moscone Hero, Rolph Hero, Simmons Hero, Sutro Hero, Taylor Flipped Hero, Taylor Hero
olafgrabienski commented 2 years ago

Must be a better way of getting modules

The list on the page admin/reports/updates is much shorter. No Core modules, and it doesn't show every sub module.

argiepiano commented 2 years ago

@yorkshire-pudding thanks so much for testing. In a way, it's a good thing that you are able to reproduce - it tells me it's not just my site or my hosting company.

@indigoxela: since the bug is throwing an error, I wanted to see if anyone testing would get the same errors. Turning on the error display doesn't make a difference in the bug. But if you turn off error display, you have to go to the watchdog log to see the errors.

In a nutshell, this is what I THINK is happening: when an anonymous user visits a "stale" cached page (in the cache_page table), and when background fetch is on, the server sends both the stale page AND the newly generated page in the same request. Supposedly, (I believe) background fetching should be sending only the stale page right away, and in the background, cache a new version of the page. This was done to speed up responses when there are long processes involved in generating a page (e.g. when the site runs Cron before the response).

This Backdrop addition was created a few years ago. Apparently it works fine in Apache. This may be a LiteSpeed bug, but I don't know. More time and research is needed... which I don't have right now. So, for now, I have opted for turning background fetching off.

[EDIT] This is the list of modules in my site: Address Field 1.x-1.3.1 Backup and Migrate 1.x-1.0.20 Configurable Block Style 1.x-1.0.6 Content Access 1.x-1.2.1 Devel 1.x-1.8.1 Entity Plus 1.x-1.0.15 Entity Reference 1.x-1.7.0 Entity Tokens 1.x-2.0.0-rc1 Feeds 1.x-1.1.2 FlexSlider 1.x-2.0.1 Font Awesome 1.x-1.2.5 HMS Field 1.x-1.0.0 HTML5 upload 1.x-1.1.2 Job Scheduler 1.x-1.1.4 Mail System 1.x-3.0.6 Mime Mail 1.x-1.1.4 PhotoSwipe 1.x-2.1.3 Profile 1.x-1.1.0 SMTP Authentication Support 1.x-1.3.4 Special Menu Items 1.x-2.1.2 Telephone 1.x-1.0.1 Views Aggregator Plus 1.x-1.0.4 Views Data Export 1.x-3.2.0 Webform 1.x-4.24.2 WebForm 2 PDF 1.x-4.1.0 Webform Calculator 1.x-2.1.0 Webform Views Select 1.x-1.0.1

argiepiano commented 2 years ago

@indigoxela , it turns out you are right: turning error display IS actually essential to reproduce the error! If you disable all error/warning message display, the error doesn't happen.

So, it looks like the issue happens because PHP throws a warning, and that somehow triggers both the cached page and the new fetched page to be sent together. Those warnings are never logged into watchdog - so, you never know that they happened.

I have also now been able to reproduce this behavior in a clean install. (Yesterday I said I hadn't - but I had error message display off, which is why they did not happen).

It'd be interesting to see if this is also the case with Apache servers. Anyone care to test? What you'd need to do:

indigoxela commented 2 years ago

I'm not able to reproduce with Apache. I think, I followed the steps exactly:

cache-settings

And error display turned on, of course.

Very odd.

klonos commented 2 years ago

It'd be interesting to see if this is also the case with Apache servers. Anyone care to test?

@argiepiano this is easy to test with our demo sandboxes, since they run on Apache (v2.4.38 currently). Here's what I did:

  1. Spun up a demo via https://backdropcms.org/demo
  2. Navigated to /admin/config/development/performance and:
    • made sure "Cache pages for anonymous users" is on
    • disabled "Compress cached pages"
  3. Navigated to /admin/config/development/logging, and set all messages to be displayed.
  4. Opened the home page in an incognito window
  5. Waited 15min, then refreshed
  6. Cleared caches, waited another 15min, refreshed again.

No errors, nothing logged in the dblog. So this seems to be limited to the LightSpeed server it seems.

klonos commented 2 years ago

Since we seem to have pretty much narrowed this issue down to a specific web server issue, would a temporary solution be to disable background fetch if LiteSpeed server is detected? (with a @todo added to further investigate this in the future)

FTR, the background fetch feature serves the purpose of addressing a very specific, low-impact annoyance, which is a slow page load in low-traffic sites when the site has not been visited by anyone in a while. This annoying thing only happens to the first site visitor that comes to the site after a while, and IMO it is not worth serving "broken pages" on account of that.

yorkshire-pudding commented 2 years ago

The problem only manifests with that enabled and errors set to display. As most production sites should have errors disabled from display, please can we only have background fetch disabled if:

webServer = LiteSpeed && errorMessageDisplay != 'None'
klonos commented 2 years ago

@yorkshire-pudding yes 👍🏼 ...I meant to introduce a backdrop_is_litespeed() function to be used in #5524 and elsewhere anyway 😉

indigoxela commented 2 years ago

I still wonder, what makes LiteSpeed choke on messages + background fetch.

I don't see an obvious reason for that.

yorkshire-pudding commented 2 years ago

I agree - my host said it should be a drop-in replacement, so there must be a very subtle difference.

argiepiano commented 2 years ago

I still wonder, what makes LiteSpeed choke on messages + background fetch.

I don't see an obvious reason for that.

I spent a bit of time on Saturday trying to figure that out. It's very difficult to debug, unless you have LiteSpeed installed in your local. For example, watchdog doesn't have any effect at that point (probably not bootstrapped yet to that level). Same thing with dpm.

I did find out a couple of things in the duplication: the first elements are sent from cache (I manually modified it to test), while the duplication is generated on the fly. It seems like the error we saw in my original post always happens in LiteSpeed, but it affects the response only when you have the error display on.

All of what follow is something I can't confirm or verify (since this is so hard to debug), but it looks like, when the error happens and the error display is on, Backdrop decides to generate the page and send it again, including headers (despite the fact that it had already sent it from cache). This may actually be "expected" behavior when you have an error (sending a newly generated page instead of using the cache) - but I haven't verified that. So, there may be a bug after all, except that in Apache that error is never generated, and this situation is never encountered.

Again all of this is speculation and it may not be accurate. One thing that may help is to find out a way to debug without outputting stuff to the browser. I thought of writing a little function that saved stuff to a file instead, but I have found myself very short on time with a few deadlines these days. Any thoughts on this?

argiepiano commented 2 years ago

Another piece of weird info: the errors that you see in my screenshot in my original post: they are themed by Backdrop, but they are never logged into the dblog/watchdog

quicksketch commented 2 years ago

The main location of Backdrop's background fetch cache code is in _backdrop_bootstrap_page_cache(). The central concept is that when a cached page expires, Backdrop serves one stale copy back to the browser, but keeps the PHP process open to generate a fresh cache for the next request.

This section of code in particular is responsible for closing the connection between the web server and PHP. Which is intended to let PHP continue processing while the web server returns the cached page to the browser:


      // Close the connection immediately if we are about to serve an expired
      // cache entry. This allows for background processing without holding up
      // the browser.
      if ($cache_expired) {
        header('Connection: close');
      }

      // End the request and send the response to the browser. This call has
      // error suppression on it as some PHP versions may require it (PHP 5.3)
      // but other PHP versions throw a warning (PHP 5.5).
      @ob_end_flush();

      // Flushing for PHP-FPM based installations.
      if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
      }
      // Flushing for Apache mod_php installations.
      else {
        flush();
      }

I'm guessing what's happening here is that LiteSpeed needs a specialized way of having PHP return HTML and close the connection to the web server. If the connection fails to be closed, you would end up with the same page rendered twice (one the old cached copy, and a second fresh copy). I did some searching and it looks like Symfony ran into the same problem recently: https://github.com/symfony/symfony/issues/42293. There apparently is a very poorly documented function litespeed_finish_request() that serves the same purpose for LiteSpeed servers.

Separately, I'm not sure this really counts as a bug report, since we only officially support Apache and nginx (see Requirements). If we add this (add the corresponding backdrop_is_litespeed() function), we may be indicating official support.

quicksketch commented 2 years ago

I don't have a LiteSpeed server set up to test, but I think this PR should solve the fundamental problem: https://github.com/backdrop/backdrop/pull/3981.

argiepiano commented 2 years ago

I've reviewed and tested on my LiteSpeed server, and this patch solves the issue. LGTM 👍🏽 Thanks, @quicksketch!

I'm not sure how to mark this issue if it's not considered a bug. Feature request? Task?

indigoxela commented 2 years ago

I'm not sure this really counts as a bug report, since we only officially support Apache and nginx ... we may be indicating official support.

Fair enough. Then it's (sort of) a new feature. I think, we should support LiteSpeed, as its usage seems to increase.

yorkshire-pudding commented 2 years ago

I've tested on my LiteSpeed Server and it fixes the issue. I agree that we should support LiteSpeed going forward; my host, Kualo, has moved to it and more and more are using it. If we want to badge Backdrop as being suitable for shared hosting, then we should probably support this as more shared hosts implement it. There is an open source version (Open LiteSpeed).

quicksketch commented 2 years ago

Thanks @argiepiano and @yorkshire-pudding! Could either (or both) of you take a look at the other LiteSpeed issues we have open?

I merged https://github.com/backdrop/backdrop/pull/3981 into 1.x and 1.21.x. "Bug" or "Feature", I don't think there's any harm in getting this into the next bugfix release.

yorkshire-pudding commented 2 years ago

Thanks @quicksketch - I've looked at all those: I've tested #5524 and #5150 and both work. I've added comments to #3584 and #3386 to confirm that I do not experience these issues on LiteSpeed. I think they are/were limited just to OpenLiteSpeed. It may be that these are now sorted, but I'm not sure how to test that as setting up a test server outside lando is a bit beyond me.

quicksketch commented 2 years ago

Thank you @yorkshire-pudding! I didn't realize there would be a difference between LiteSpeed and OpenLiteSpeed. I appreciate your checking on each of them.