Open avdi opened 2 weeks ago
It seems like this plugin adds `?activitypub` and `/activitypub` rewrites as alternates for the `Accept` header. Is there any way to have it use those links everywhere in the generated feeds, instead of using the original post/author/comment permalinks? I think this would effectively route around the problem, since cache layers would then see all ActivityPub requests as being for distinct resources.
I'm also wondering if I can accomplish something similar in .htaccess. Has anyone else encountered this issue?
So, I guess this is essentially #580. I'm a little surprised that sending the `Vary` header isn't the default; you can't have content-negotiation and caching without `Vary: Accept`. And you can't do fediverse without caching.
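To make the `Vary: Accept` point concrete, here is a minimal Python sketch of a toy page cache, not the code of any real plugin or proxy. The class name `VaryAwareCache`, the `honor_vary` switch, and the `origin` function are all hypothetical and exist only for illustration. With `honor_vary=False` the cache keys on the URL alone, so whichever representation is requested first "wins"; with `honor_vary=True` the `Accept` value becomes part of the key.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shape for illustration: a response is (headers, body).
Response = Tuple[Dict[str, str], str]
Origin = Callable[[str, Dict[str, str]], Response]


class VaryAwareCache:
    """Toy cache that can key entries on the request headers named in Vary."""

    def __init__(self, honor_vary: bool = True) -> None:
        self.honor_vary = honor_vary
        self.vary_for_url: Dict[str, Tuple[str, ...]] = {}
        self.store: Dict[Tuple[str, Tuple[str, ...]], Response] = {}

    def _key(self, url: str, headers: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
        # Without Vary support the key is the URL alone.
        names = self.vary_for_url.get(url, ()) if self.honor_vary else ()
        return (url, tuple(headers.get(name, "") for name in names))

    def fetch(self, url: str, request_headers: Dict[str, str], origin: Origin) -> Response:
        headers = {k.lower(): v for k, v in request_headers.items()}
        key = self._key(url, headers)
        if key in self.store:
            return self.store[key]
        response = origin(url, headers)
        if self.honor_vary:
            # Remember which request headers the origin says the response varies on.
            vary = response[0].get("Vary", "")
            self.vary_for_url[url] = tuple(
                name.strip().lower() for name in vary.split(",") if name.strip()
            )
            key = self._key(url, headers)
        self.store[key] = response
        return response


def origin(url: str, headers: Dict[str, str]) -> Response:
    """Stand-in for WordPress doing content negotiation on the Accept header."""
    if "application/activity+json" in headers.get("accept", ""):
        return ({"Content-Type": "application/activity+json", "Vary": "Accept"}, "{}")
    return ({"Content-Type": "text/html", "Vary": "Accept"}, "<html></html>")
```

Fetching `/post` first with `Accept: application/activity+json` and then with `Accept: text/html` returns JSON both times from the cache built with `honor_vary=False`, and the correct representation each time from the one built with `honor_vary=True`.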
But as was discussed in #580, all the common WordPress internal page-caching plugins [bafflingly] don't honor `Vary` anyway. I'd really like to know who with a large follower count is using this in production ... and how??
I am currently traveling... I will answer all your questions when I arrive at the WCEU ☺️
You can try WP Super Cache or Cachify in the meantime. They should both support content-negotiation.
Hey, thanks a lot for the reply! I've used Super Cache before, but of course it's the one I didn't try yesterday 😅 (Also, when did it become an official Automattic product?? Somehow I didn't realize that...)
I tried it (successfully), and I also took a look at Cachify, and it looks like they both work to the degree of simply not caching requests for non-HTML content. Which is definitely a step up, but still leaves me scratching my head over what to do when the flock of fediverse seagulls descends with all their feed requests.
Given that none of the caching plugins will actually cache AP content separately, I'm strongly considering going back to my original caching solution and putting in a mod_rewrite rule to redirect anything with an `Accept` header containing `application/activity+json` to the `.../activitypub` variant path. And then excluding that pattern from caching. Curious if anyone has had any success with this approach.
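The decision such a rewrite rule would have to make can be sketched in Python. This is only an illustration of the logic, not a mod_rewrite configuration and not anything the plugin ships. The function names and the assumption that the variant is formed by appending `/activitypub` to the permalink are mine.

```python
from typing import Optional

AP_TYPE = "application/activity+json"
VARIANT_SUFFIX = "/activitypub"  # assumption: variant path = permalink + this suffix


def activitypub_redirect(path: str, accept: str) -> Optional[str]:
    """Return the variant path to redirect to, or None to serve the request as-is."""
    if AP_TYPE not in accept.lower():
        return None  # ordinary (HTML) request: leave it alone
    trimmed = path.rstrip("/")
    if trimmed.endswith(VARIANT_SUFFIX):
        return None  # already on the variant path; avoid a redirect loop
    return trimmed + VARIANT_SUFFIX


def is_cacheable(path: str) -> bool:
    """Exclude the variant paths from the page cache, as proposed above."""
    return not path.rstrip("/").endswith(VARIANT_SUFFIX)
```

The loop guard matters: once a request has been redirected to the variant path it still carries the same `Accept` header, so the rule must not match it a second time.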
Anyway @pfefferle safe travels, and I'll look forward to your elaboration!
@avdi for non-html requests you can use WP REST Cache plugin.
> @avdi for non-html requests you can use WP REST Cache plugin.
I did not know about this plugin, thank you!
Just adding to the list of "supported" caching plugins: Surge can also be set up to separately deal with different Accept headers.
I don't think it'll cache REST API responses, though. Guess you might be able to use WP REST Cache for those. (But you'll still want to also cache ActivityPub [well, and HTML] responses for `/author/<name>` or whatever you use and individual post URLs!)
Saying this as someone who messed around with this a little while back, although I must admit that I have personally switched to "microcaching" using NGINX's `fastcgi_cache`, both for HTML and AP and (certain) REST API responses, and that it seems to work well enough even without any (page) caching plugins.
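The idea behind microcaching can be sketched in a few lines of Python. This is a conceptual illustration only, not NGINX configuration; the `MicroCache` class, its one-second default TTL, and the `render` callback are hypothetical. Entries are kept for a very short time and keyed on both the URL and the `Accept` value, so a burst of identical requests reaches the backend once per variant per TTL window.

```python
import time
from typing import Callable, Dict, Tuple


class MicroCache:
    """Keep each (url, accept) response for a very short time."""

    def __init__(self, ttl_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self.entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, url: str, accept: str, render: Callable[[str, str], str]) -> str:
        key = (url, accept)  # HTML and ActivityPub variants are stored separately
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        body = render(url, accept)  # the expensive backend call
        self.entries[key] = (now, body)
        return body
```

Even a one-second TTL turns a thousand simultaneous requests for the same post into a single backend render, at the cost of responses being at most one TTL stale.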
Just circling back here because I got 18 boosts on a post and now my server is once again 100% pegged as it gets thousands and thousands of fediverse requests for the same post all at once. (P.S. how are there THIS many Mastodon servers?!?!)
How are most people handling this? Are you using customized Nginx or Varnish configs downstream to cache AP content? I do have Nginx and Varnish, but my host controls the configuration and it seems like with the headers AP is delivered with out-of-box, the cache layers are ignoring it.
I'd really love insight into how to scale AP with WordPress!
@avdi I added some more resources to: https://github.com/Automattic/wordpress-activitypub/wiki/Caching
The "I Stopped Mastodon DDoSing Me (I Think)" Article from @kevquirk is worth having a look!
> @avdi I added some more resources to: https://github.com/Automattic/wordpress-activitypub/wiki/Caching
> The "I Stopped Mastodon DDoSing Me (I Think)" Article from @kevquirk is worth having a look!
Thank you for the links! The links to the relevant PRs are really nice.
Two notes:
1) I got my hopes up, but the article from @kevquirk isn't relevant, unfortunately. It's about a non-Fediverse blog getting pummeled by requests for (HTML) pages when it was linked on a Mastodon account with many followers. So, it's a useful study in making regular HTML pages cacheable, but it's not applicable to WordPress serving AP with content-negotiation.
2) The Cache plugins listed are only patched so far as to not break ActivityPub content negotiation, but only by ignoring AP requests. So they won't address the thundering-herd-of-Mastodons problem.
I'm intrigued, however, by the last link, customizing Surge to serve different variants. This is the only in-WordPress solution I've seen so far.
> The Cache plugins listed are only patched so far as to not break ActivityPub content negotiation, but only by ignoring AP requests. So they won't address the thundering-herd-of-Mastodons problem.
Good point! I am currently experimenting a bit with Surge, let's see how that works out.
Quick summary
Hello,
I'm looking for some insights. This may not be a bug with the plugin per se, but I have not found any workarounds so far, and I'm curious what others have done.
I'm hosting a site on CloudWays, which provides Varnish caching. I have tried three different recommended caching plugins, using the CloudWays-recommended settings:
In every case, as long as the caching plugin is enabled, I see either the HTML version or JSON version of posts get locked into cache and "win" based on whichever one was requested first. NONE of these caches seem to respect content-negotiation. In fact, having skimmed through the code of these plugins, they seem to go out of their way to disable it by overriding the `Vary` header!
Unfortunately, with the fediverse being architected the way it is, effective caching is a must-have. With a thousand or so followers and without caching enabled, I see my 32 GB Vultr instance grind to a halt every time I post something, as I get inundated with feed requests.
I feel like surely someone else must have encountered this and come up with a solution.
Steps to reproduce
Accept: application/activity+json
Accept: text/html
(Content-Type: text/html)
Accept: text/html
Accept: application/activity+json
What you expected to happen
Get separately cached versions for HTML and ActivityPub JSON
What actually happened
Get whatever content-type version of the resource happened to get cached first
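For anyone who wants to check their own setup for this behavior, here is a rough Python sketch using only the standard library. The function names are hypothetical, and it assumes the server answers each request with a `Content-Type` that starts with the requested media type, which may not hold for every configuration. It requests the same URL with alternating `Accept` headers and reports every response whose `Content-Type` does not match.

```python
import urllib.request
from typing import Callable, List


def fetch_content_type(url: str, accept: str) -> str:
    """Request url with the given Accept header and return the Content-Type."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


def negotiation_failures(
    url: str, fetch: Callable[[str, str], str] = fetch_content_type
) -> List[str]:
    """Alternate Accept headers and list the responses with the wrong type."""
    failures = []
    sequence = ("application/activity+json", "text/html", "application/activity+json")
    for accept in sequence:
        got = fetch(url, accept)
        if not got.lower().startswith(accept):
            failures.append(f"Accept: {accept} -> Content-Type: {got}")
    return failures
```

Calling `negotiation_failures("https://example.com/some-post/")` with a real post URL in place of the placeholder returns an empty list when both variants are served correctly, and one line per mismatch when a cache has locked in the first representation.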
Impact
All
Available workarounds?
No but the platform is still usable
Logs or notes
No response