angular / universal

Server-side rendering and Prerendering for Angular

proposal: Engine output caching #795

Closed · Toxicable closed 7 years ago

Toxicable commented 7 years ago

Introducing this API would provide a performance benefit to devs using any of the Engines it is implemented in, by avoiding re-rendering output that has already been produced.

Proposed API

Add an additional option to the Engine config:

outputCaching: OutputCachingConfig

interface OutputCachingConfig {
  pathsOrPremade: ('all' | 'allIgnoreParamRoutes') | string[];
  expirationDuration?: number;
  maxCacheSize?: number;
}

pathsOrPremade: the paths to cache, either an explicit list of routes or one of the premade options ('all' or 'allIgnoreParamRoutes').

expirationDuration?: how long a cached entry is kept before being re-rendered (in minutes, per the example below).

maxCacheSize?: the maximum number of entries the cache will hold.

Example usage with Express Engine

ngExpressEngine({
  bootstrap: ServerAppModule,
  outputCaching: {
    pathsOrPremade: [ '', 'login', 'home' ],
    expirationDuration: 60, // 1 hr
    maxCacheSize: 10
  }
});

Persistence

While the initial version is intended to work with in-memory caching, we could easily make a pluggable storage backend for the cache, with file-system or DB implementations.
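For illustration, a minimal sketch of what such a pluggable store could look like; the CacheStore interface and the class below are hypothetical, not part of the proposal:

// Hypothetical pluggable cache store; names are illustrative only.
interface CacheEntry {
  html: string;
  expiry: number; // epoch milliseconds
}

interface CacheStore {
  get(path: string): Promise<CacheEntry | undefined>;
  set(path: string, entry: CacheEntry): Promise<void>;
  delete(path: string): Promise<void>;
}

// Default in-memory implementation; a file-system or DB store would
// implement the same interface.
class InMemoryCacheStore implements CacheStore {
  private entries = new Map<string, CacheEntry>();

  async get(path: string): Promise<CacheEntry | undefined> {
    const entry = this.entries.get(path);
    if (entry && entry.expiry < Date.now()) {
      this.entries.delete(path); // drop expired entries lazily
      return undefined;
    }
    return entry;
  }

  async set(path: string, entry: CacheEntry): Promise<void> {
    this.entries.set(path, entry);
  }

  async delete(path: string): Promise<void> {
    this.entries.delete(path);
  }
}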

m98 commented 7 years ago

Thanks :+1: It could be even more helpful if we could set the expiration time specifically for each route. For example, maybe the homepage should expire after 10 minutes, but a news content page could expire after a week.
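For illustration, one hypothetical way the proposed config could express that (the routeExpirations field is not part of the proposal, just a sketch):

// Hypothetical per-route expiry overrides on top of the proposed config.
interface OutputCachingConfigWithRoutes {
  pathsOrPremade: ('all' | 'allIgnoreParamRoutes') | string[];
  expirationDuration?: number;                   // blanket default, in minutes
  routeExpirations?: { [path: string]: number }; // per-route overrides, in minutes
  maxCacheSize?: number;
}

// e.g. the homepage expires after 10 minutes, news content after a week
const config: OutputCachingConfigWithRoutes = {
  pathsOrPremade: ['', 'news/:id'],
  expirationDuration: 60,
  routeExpirations: { '': 10, 'news/:id': 7 * 24 * 60 },
};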

Toxicable commented 7 years ago

@m98 Setting the expiry time for every route might be a bit tedious. We'll see what people would prefer and go from there; for now, a blanket timeout should suffice.

doggy8088 commented 7 years ago

We have a site whose webpack build takes 50 seconds to finish. We implemented SSR using Angular Universal and ran a load test on it: it can't handle even 100 concurrent user requests; the CPU stays pegged at 100% until requests time out. We really need the Engine output caching feature. 👍

m98 commented 7 years ago

@doggy8088, the first time I started the project on the server, the CPU stayed near the top (around 90% to 100%). But we decided to use Varnish to cache the server-rendered HTML for a defined time, and it works fine for another project we had. Hope that helps you until Universal itself has an output caching feature.

doggy8088 commented 7 years ago

@m98 How do you cache all the pages? Do you run a spider that fetches all the URLs on your website?

nicky-lenaers commented 7 years ago

Sounds great! Should there be an accompanying feature for busting the cache as well?

Toxicable commented 7 years ago

@nicky-lenaers I'm not sure what you mean; this is a purely server-side cache, so any invalidation will generally be done on a timer. But if you have a situation where that won't do, I'd like to hear it.

nicky-lenaers commented 7 years ago

@Toxicable I see. Correct me if I'm dead wrong here, but I can imagine a dynamic page (content driven by a CMS) getting cached with an expiration of 1 week on its route. Before the week is over, someone updates some content, but the cache hasn't expired yet at that point, so users will hit the cached page instead of the updated page. So wouldn't it be nice to have an API to delete/update/refresh cached resources on demand, e.g. by a cache ID?

> While the initial version is intended to work with in-memory caching, we could easily make a pluggable storage backend for the cache, with file-system or DB implementations.

I'm not sure if this example applies to that, but I have had similar issues (like the one I just described) in the past with cached resources on disk. Just sharing thoughts here 😃

Toxicable commented 7 years ago

My initial thought for pages as dynamic as the ones you mention was to just exclude them from the cache, but on second thought there should still be a way to invalidate entries. I think providing an option where the server can supply a function that determines cache-key validity might work. Something like so:

function isCacheValid(cacheKey: string, cacheValue: string): boolean

That's just my first thought on the matter, feel free to suggest alternatives
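For illustration, a sketch of how such a hook could sit alongside the proposed config; every name here is hypothetical:

// Hypothetical user-supplied validity hook; not a real Universal API.
interface OutputCachingConfigWithValidation {
  pathsOrPremade: ('all' | 'allIgnoreParamRoutes') | string[];
  expirationDuration?: number;
  maxCacheSize?: number;
  // Return false to force a re-render even if the entry hasn't expired yet.
  isCacheValid?: (cacheKey: string, cacheValue: string) => boolean;
}

// Keys flagged as stale, e.g. by a CMS webhook handler elsewhere in the server.
const invalidated = new Set<string>();

const config: OutputCachingConfigWithValidation = {
  pathsOrPremade: 'all',
  // Set.delete returns true when the key was present, so a flagged key is
  // reported invalid exactly once and then cleared from the set.
  isCacheValid: (cacheKey) => !invalidated.delete(cacheKey),
};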

nicky-lenaers commented 7 years ago

Right. For future reference I'll speak of invalidating, since I believe that's the proper term. Anyway, just for inspiration on such an API, consider taking a look at what has been made for ngx-cache, especially this example. They seem to use a Map interface as a kind of store of cache refs for determining whether values are cached. Please share your thoughts on such an implementation.

theomathieubhvr commented 7 years ago

FYI this is how we did it on our website. It works well with 1500 concurrent users all the time (we have 16 nodes).

// Assumes an Express `server` with cookie-parser, and a connected `RedisClient`.
server.get('*', (req, res) => {
  // Logged-in users (JWT cookie present) always get a fresh render
  if (req.cookies.jsonwebtoken) {
    return res.render('../public/index.html', {
      req,
      res
    });
  }

  const redisKey = 'website:' + req.originalUrl;

  // Anonymous traffic is served from Redis when possible
  RedisClient.get(redisKey, (errRedis: any, resultRedis: string) => {
    if (resultRedis) {
      res.send(resultRedis);
      return;
    }

    // Cache miss: render, send, then cache successful responses for 5 minutes
    res.render('../public/index.html', {
      req,
      res
    }, (error, value) => {
      if (error) {
        return (req as any).next(error);
      }
      res.send(value);

      if (res.statusCode === 200) {
        RedisClient.set(redisKey, value, 'ex', 60 * 5);
      }
    });
  });
});

We also created a warmer so that our homepage is always cached (we created a "fake" server and call localhost from a different Node server).
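A warmer in that spirit can be as simple as a loop that re-requests the pages you always want hot. A minimal sketch, assuming the SSR server from the snippet above listens on localhost:4000 and caches entries for 5 minutes:

import * as http from 'http';

// Re-request the homepage often enough that its cache entry never expires.
setInterval(() => {
  http.get('http://localhost:4000/', (res) => {
    res.resume(); // drain the response; we only care about priming the cache
  }).on('error', (err) => console.error('warmer failed:', err.message));
}, 4 * 60 * 1000); // slightly below the 5-minute Redis TTL used above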

patrickmichalina commented 7 years ago

If anyone is curious about Varnish, it does work, but takes some effort.

Our solution involves Varnish and an entity-tag map of all the resources involved in the various XHR requests that made up the page (articles, videos, events, etc.). The page is rendered with a header like X-Cache-Tags: Article-1,Article-2,Video-1. When an item is changed in the CMS, Varnish is informed and busts that tag.
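Roughly, that setup has two halves: the SSR server emits the tag header, and the CMS tells Varnish to ban everything carrying a given tag. A sketch under the assumption that your VCL is configured to accept BAN requests matched against X-Cache-Tags; the host and helper names are illustrative:

import * as http from 'http';

// 1) During SSR, attach the tags of every entity the page used.
function sendWithTags(res: any, html: string, tags: string[]): void {
  res.setHeader('X-Cache-Tags', tags.join(','));
  res.send(html);
}

// 2) When an entity changes in the CMS, ban all cached pages carrying its tag.
//    Assumes VCL along the lines of:
//      if (req.method == "BAN") { ban("obj.http.X-Cache-Tags ~ " + req.http.X-Ban-Tag); }
function banTag(tag: string): void {
  const banReq = http.request({
    host: 'varnish.internal', // placeholder for your Varnish host
    port: 80,
    method: 'BAN',
    path: '/',
    headers: { 'X-Ban-Tag': tag },
  });
  banReq.on('error', (err) => console.error('ban failed:', err.message));
  banReq.end();
}

// e.g. banTag('Article-1') after Article 1 is edited in the CMS.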

m98 commented 7 years ago

@doggy8088, setting up Varnish is a separate process.

But as for how it works: you don't need to run a spider to catch all your pages. When a user wants to see one of your pages and the browser makes a request, Node runs Angular Universal and returns the rendered version of your app. Varnish caches that response (i.e. saves the rendered HTML), and subsequent requests are served the cached page from Varnish instead.

For removing cached entries, say when the CMS changes something on a page, you can delete a specific cache entry by its defined tag (you can add tags to cached responses).

Toxicable commented 7 years ago

After discussing this feature a bit more in depth, we've decided it's not something the Universal Engines should handle. Since there are many complexities in controlling cache validation, and it can differ across platforms, I would suggest using a cache provided by the platform you're using, like Varnish for example. We will still be doing a factory cache so that your app won't have to be recompiled, but we will not handle output caching.

intellix commented 7 years ago

Basic local cache example for anyone looking for a cheap trick:

// Assumes an Express `app`, `join` from 'path', and a DIST_FOLDER constant.
const cache = {};
app.get('*', (req, res) => {
  const url = req.url;
  const now = new Date();

  // Serve the cached copy while it's still fresh
  if (cache[url] && now < cache[url].expiry) {
    return res.send(cache[url].html);
  }

  res.render(join(DIST_FOLDER, 'browser', 'index.html'), { req }, (err, html) => {
    if (err) {
      return (req as any).next(err); // don't cache failed renders
    }

    // Expire in 1 month
    const expiry = new Date();
    expiry.setDate(expiry.getDate() + 30);

    cache[url] = { expiry, html };
    res.send(html);
  });
});

NgxDev commented 6 years ago

> it's not something the Universal Engines should handle. Since there are many complexities in controlling cache validation, and it can differ across platforms, I would suggest using a cache provided by the platform you're using, like Varnish for example.

Unfortunately, with a caching mechanism outside of the Universal/Angular world, I don't know how we could have parts of a page that are dynamic (bypassing the cache) while having the rest of the document cached. That sort of thing isn't entirely trivial in PHP either, for example, but it's quite possible there without too much hassle.

For example, right now I'm playing with HTTP caching through Nginx, which sits in front of my Node server that does SSR. But for each page (route), I have 2 cached versions with only one tiny difference between them:

  1. one version with login/signup buttons in the header
  2. another version with a generic user avatar (the dropdown menu would only be populated client-side since it's not visible unless the avatar is clicked anyway)

Why cache both versions of the output? Because a) I want Google to see those login/signup links, and b) I don't want the user to notice any difference between the server-rendered view and the view it's replaced with after the app bootstraps on the client. So ~99% of the page is the same, and just because of that tiny difference I need to have each page cached twice (based on a cookie that tells whether someone is logged in or not; it doesn't matter who). I couldn't think of any other solution. So if you have, say, 100k pages, that means 200k documents cached instead of 100k, because of one tiny difference. But if that isn't reason enough (although I think it should be), imagine that, in that first view rendered on the server, you also had to show info that differs from user to user, like the user's actual avatar, uploaded by the user, or the user name. That would mean you'd end up caching **(100k + 1) × number of users** documents.

I don't know if any outside caching mechanism could handle this "dynamic bits and pieces" scenario, since it's the Universal/Angular app that knows all those details. Right? Like: "Oh, this component (i.e. a nav header that differs from user to user) has to bypass the cache (its content needs to be computed on each request, since it depends on the logged-in user). Then we merge the result with the other parts, which were already cached since they're not dynamic and don't need re-rendering/re-computing."

I'd also like to say that I'm no expert, maybe rather a newb at this thing called "caching". But as far as I can remember, whenever I worked on a PHP project, for example, if some part of the output was meant to be dynamic and bypass the cache, I simply had to put that HTML block in a partial and include it like $this->renderDynamic('/some/partial.php'), and that would bypass the cache. Sure, the ins and outs of the implementation differ from framework to framework, but in my opinion this scenario should be a requirement when talking about high-traffic apps. We can't not use a caching mechanism, and I also think we shouldn't have tens of millions of documents cached instead of 10k or 100k, just because of a few tiny differences from one to another.

Is this scenario somehow possible, @Toxicable? Having dynamic parts within a document, instead of a completely different (but almost identical) cached version because of a few tiny differences? I sure hope it is; I hope there's a solution I've failed to see, either because I didn't think about it enough or because I haven't put enough time into research and stopped after a few reads, falsely believing it's not possible right now. If anybody has managed to achieve this, please share 😁

@patrickmichalina could this be achieved through Varnish? I mean, I guess Varnish is just something that sits between the Node server that does SSR and the server that serves the app to the client (like Nginx, for example). And even if Varnish itself could be told that a part of a document is dynamic (e.g. <nav-header>(.*)?</nav-header>), it would still need to reach out to our Node server to get the dynamic part. But our Node server has no notion of that; it would just render the entire document again and again, the very thing we're trying to avoid through caching. Right? Even if you could tell Varnish how to handle that, and set up routes in your Angular app to render each dynamic component individually just for Varnish (or any caching mechanism), I guess it would still be a very difficult configuration. And we'd have 2 requests instead of 1 per document before it's cached (because now there would be an additional request for something like /user-header). But I'm probably talking silly anyway, thinking of having routes for the sole purpose of rendering dynamic parts of the app just for a caching mechanism. There has to be a better way, a way that our app itself is aware of.

robert-king commented 5 years ago

@Toxicable I see you've closed this but I'll share my use case in case it's of interest to you or anyone else:

I have a CMS server where articles and pages are updated. I have API servers and Frontend Servers behind a traffic manager. Universal runs on the Frontend servers.

Typically you have triples like this:

(PageHtml, PageGUID, PageURL).

To keep URLs readable and clean, it was decided not to have the GUID in the URL but slugs that represent what the content is (e.g. category slugs, article slugs, etc.).

This creates a disconnect for caching. To solve it we need a URL <-> GUID bridge: from a URL we can get a GUID, and from a GUID we can get a URL (sometimes this can be a many-to-many relationship).

That way, in Universal we can check a GUID to see when its content was changed and decide to rebuild; or, in a backend system, we can change data associated with a GUID and then, using the bridge, find the URL and delete the HTML cached against that URL.

Since Angular Universal components know about the latest routing structure and their URL, and can also know about GUIDs through their HTTP requests, they're a good candidate for where to build the URL <-> GUID bridge.

In server.ts I have access to my Redis client and the HTML: res.render('index.html', { req }, (err, html) => {})

I could parse the HTML to get the GUIDs (e.g. from the transfer-state script). However, is there an easier way to pass the GUIDs from the Angular components out into the Express HTTP response handler?
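One possible approach, sketched under the assumption that the engine's render options accept per-request providers (ngExpressEngine exposes a providers array for this): share a mutable array between Express and the Angular app via an InjectionToken, and let components push GUIDs into it while rendering. The token name, Redis key scheme, and helper wiring below are illustrative:

import { InjectionToken } from '@angular/core';

// Token the app uses to report the GUIDs a page depended on.
export const PAGE_GUIDS = new InjectionToken<string[]>('PAGE_GUIDS');

// server.ts (assumes an Express `server` and a connected `RedisClient`)
server.get('*', (req, res) => {
  const pageGuids: string[] = [];

  res.render('index.html', {
    req,
    providers: [{ provide: PAGE_GUIDS, useValue: pageGuids }],
  }, (err: Error, html: string) => {
    if (err) {
      return (req as any).next(err);
    }
    res.send(html);

    // pageGuids was filled in during server rendering; record both
    // directions of the URL <-> GUID bridge.
    for (const guid of pageGuids) {
      RedisClient.sadd('guid->urls:' + guid, req.originalUrl);
      RedisClient.sadd('url->guids:' + req.originalUrl, guid);
    }
  });
});

// In a component or service (runs during SSR):
//   constructor(@Optional() @Inject(PAGE_GUIDS) private pageGuids: string[] | null) {}
//   this.pageGuids?.push(article.guid);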

angular-automatic-lock-bot[bot] commented 5 years ago

This issue has been automatically locked due to inactivity. Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.