galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Cacheability of index page #8117

Open hexylena opened 5 years ago

hexylena commented 5 years ago

The current index page cannot be cached because it embeds user information and a session CSRF token.

UseGalaxy.eu recently noticed that this could be a problem, due to a patch from a student that inflated the size of the /api/tools response. If it weren't for the user information embedded in the home page, we could cache this page heavily and not worry about re-implementing it. For us, the current home page comes in at:

$ du -h index.html
19M     index.html

While it is exacerbated to the point of frustration by this student patch, which we will rework, in general surely it would be better if we could cache this extremely important first-load page? MOST of the information on that page is generic enough (/api/configuration, /api/tools); it is just the user-specific parts that block caching.

cc @bgruening

dannon commented 5 years ago

Our goal is definitely to make the initial page load as small and fast as possible. Honestly I'd like to make it small enough that you don't want to bother caching it. I don't off the top of my head remember if there's a good reason for not fetching the toolbox asynchronously, or if it's just historical, but pushing that to a separate request (that could itself be cached) instead of being in the bootstrapped data would be a really great first step.

mvdbeek commented 5 years ago

Could this be a configuration issue? All the static assets come from the browser cache on our instance:

[Screenshot: Screen Shot 2019-06-07 at 17 07 40]

dannon commented 5 years ago

@mvdbeek I think it's that we bootstrap the entire toolbox and a ton of extra information about it into the variables filled into the mako template. Check usegalaxy.org's source and look for the options var, specifically the toolbox, etc.

So, it's nowhere near the 19MB insanity up there, but for example on main I do see 2.12MB just in the initial HTML document, which is too big. The toolbox is the vast majority of this.

martenson commented 5 years ago

The page with the embedded toolset has 214KB on Main.

edit: I did not see the edit above. I'm not trying to defend the architecture, which can be much improved, but the actual size of even a Main-sized request is reasonable.

dannon commented 5 years ago

@martenson I was referring to the raw response size (which is 2.12MB), not the gzip'd transfer amount.
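The gap between the two figures (2.12MB raw, 214KB transferred) comes from response compression. A minimal sketch of the nginx directives involved, assuming compression is done at the proxy rather than inside Galaxy; the values are illustrative and not taken from any instance mentioned in this thread:

```nginx
# http or server context. All values below are illustrative assumptions.
gzip on;
gzip_comp_level 5;       # trade-off between CPU time and compressed size
gzip_min_length 1024;    # skip responses smaller than 1 KiB
gzip_proxied any;        # also compress when the request arrived through a proxy (Via header)
# text/html is always compressed once gzip is on; other types must be listed.
gzip_types application/json application/javascript text/css;
```

Compression only reduces the transfer size. The browser still has to parse the full raw document, so a smaller bootstrapped payload helps either way.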

mvdbeek commented 5 years ago

Right, that would surely be a good idea. But on the deployer side I think a lot can be gained by adding a proxy cache (http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache). I've got:

        location / {
                # Proxy everything to Galaxy, with the "my_cache" proxy cache zone enabled.
                proxy_cache my_cache;
                add_header Accept-Ranges "bytes";
                proxy_cache_key "$scheme$proxy_host$request_uri $http_range";
                proxy_set_header Range $http_range;

                proxy_set_header Referer        "https://url";
                proxy_set_header Host           "url";
                proxy_pass_request_headers on;
                proxy_pass http://127.0.0.1:7777/;
                proxy_set_header X-URL-SCHEME https;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_set_header Host $host;
        }

        location /api/genomes {
                # Send a 5-minute public Cache-Control header and use the proxy cache.
                add_header Cache-Control "max-age=300, public";
                proxy_cache my_cache;
                proxy_pass http://127.0.0.1:7777/api/genomes;
                proxy_set_header Host $host;
                proxy_set_header X-URL-SCHEME https;
        }
        location /api/datatypes {
                # Send a 5-minute public Cache-Control header and use the proxy cache.
                add_header Cache-Control "max-age=300, public";
                proxy_cache my_cache;
                proxy_pass http://127.0.0.1:7777/api/datatypes;
                proxy_set_header Host $host;
                proxy_set_header X-URL-SCHEME https;
        }
        location /api/webhooks {
                # Send a 5-minute public Cache-Control header and use the proxy cache.
                add_header Cache-Control "max-age=300, public";
                proxy_cache my_cache;
                proxy_pass http://127.0.0.1:7777/api/webhooks;
                proxy_set_header Host $host;
                proxy_set_header X-URL-SCHEME https;
        }

Together with a CDN, that makes a huge difference.
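For anyone copying the config above: the `my_cache` zone has to be declared once with `proxy_cache_path` in the `http` context before `proxy_cache my_cache;` will load. A minimal sketch of that declaration, where the cache path, zone size, and timings are assumptions for illustration rather than the values used on the instance described here:

```nginx
# http context. The path, zone size, and limits are illustrative assumptions.
proxy_cache_path /var/cache/nginx/galaxy levels=1:2 keys_zone=my_cache:10m
                 max_size=1g inactive=60m use_temp_path=off;

server {
        location /api/datatypes {
                proxy_cache my_cache;
                # Cache 200 responses for 5 minutes when the backend sends
                # no caching headers of its own.
                proxy_cache_valid 200 5m;
                # Expose the cache result (MISS, HIT, EXPIRED, ...) for debugging.
                add_header X-Cache-Status $upstream_cache_status;
                proxy_pass http://127.0.0.1:7777/api/datatypes;
        }
}
```

The `X-Cache-Status` header is a quick way to check whether a route is really served from the cache. Note that nginx does not cache responses that carry a `Set-Cookie` header unless told to ignore it, so a route that sets cookies may still show `MISS` on every request.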

mvdbeek commented 5 years ago

So for things that barely ever change, like genomes, datatypes, and webhooks, we're not even hitting Galaxy anymore.

dannon commented 5 years ago

@mvdbeek Yeah, that's ideal. We should be able to push tool information there, too.
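A rough sketch of what that could look like for tool information, modelled on the blocks above. It assumes the `my_cache` zone is already declared and, importantly, that `/api/tools` returns the same body for every user, which may not hold on instances where tool visibility depends on who is logged in:

```nginx
# Hypothetical block; assumes /api/tools is identical for all users.
location /api/tools {
        add_header Cache-Control "max-age=300, public";
        proxy_cache my_cache;
        proxy_cache_valid 200 5m;                      # keep successful responses for 5 minutes
        proxy_cache_lock on;                           # only one request at a time populates a missing entry
        proxy_cache_use_stale error timeout updating;  # serve a stale copy during refresh or backend failure
        proxy_pass http://127.0.0.1:7777/api/tools;
        proxy_set_header Host $host;
        proxy_set_header X-URL-SCHEME https;
}
```

`proxy_cache_lock` matters most for a response that is slow to render, since it keeps a burst of simultaneous cache misses from all reaching Galaxy at once.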

martenson commented 5 years ago

@erasche the size of eu's index is an issue (~4MB transfer), but the substantial slowness seems to come from elsewhere: it takes ~5s to get a response for that request, which is too long to be just a bandwidth problem.

Also, given that you actually use webhooks, we should have a look at https://github.com/galaxyproject/galaxy/issues/5565, since the index alone loads them three times and does not cache them.

[Screenshot: Galaxy___Europe]

edit: I also got a 'backend unresponsive' error a few times when testing.

Also your live stats data are at least 5MB and noncacheable.

hexylena commented 5 years ago

Thanks for the discussion everyone! We'll definitely start caching webhooks and those API routes.

> live stats data are at least 5MB and

Ugh. Guess we can just remove that.

> 5s to get a response

It's also partially the patch, which added the tool help to /api/tools and caused the response size of that route to increase by 300% and take WAY longer to render. I've removed that patch but the changes aren't live yet. Removing that will make it bearable again.

martenson commented 5 years ago

> live stats data are at least 5MB and
>
> Ugh. Guess we can just remove that.

You could switch to a server-side rendered static image (https://grafana.com/docs/reference/sharing/).

hexylena commented 5 years ago

> Honestly I'd like to make it small enough that you don't want to bother caching it.

That sounds great @dannon. Then I think all we really have to do is remove the toolbox; it's tiny otherwise.