edwardspec / mediawiki-aws-s3

Extension:AWS allows MediaWiki to use Amazon S3 (instead of the local directory) to store images.
https://www.mediawiki.org/wiki/Extension:AWS
GNU General Public License v2.0

Parsing/rendering slows down when there are too many images on a page. #42

Closed MPThLee closed 3 years ago

MPThLee commented 3 years ago

If there are too many images (files) on a page, page rendering becomes very slow, and even Parsoid (the backend of VisualEditor) gives up on parsing it. Why are there so many images on a single page? Our wiki is a card game wiki, and we have a page that collects every released card in one place.

[image] The reason is that doGetFileStat() is called many times, and these calls block many other things.

[image] As a result, the real time goes over a minute. (Info: the tested page has ~700 images at a fixed resolution of 96px × 96px.)

Setting $wgAWSLocalCacheDirectory with a low $wgAWSLocalCacheMinSize value (1024) didn't help.

(How about adding a more aggressive local cache?)

edwardspec commented 3 years ago

It is indeed possible to make a cache for doGetFileStat(), and it would theoretically improve performance. It might not be enough to keep a page with 700 images from timing out while rendering, though, since any non-cached queries would still be sent sequentially (it's MediaWiki core that decides when to call doGetFileStat). But it should be an improvement.

That cache won't even have stale data (since we can invalidate it for some image A if/when that image is uploaded/deleted/renamed), except when there are 2+ webservers that don't share the same memcached server(s).
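
The idea described above (cache each stat result, and drop the entry whenever the file is uploaded, deleted, or renamed) can be sketched roughly as follows. This is only an illustration in Python, not the extension's actual PHP code: the StatCache class, its method names, and the fetch_stat callback are hypothetical, and the in-process dict stands in for a shared cache such as Memcached or Redis. With a process-local dict like this one, several webservers would each hold their own copy, which is exactly the stale-data case mentioned above.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# A stat result is a small dict (e.g. size, mtime); None means "no such file".
Stat = Optional[dict]


class StatCache:
    """Hypothetical cache in front of a slow, sequential stat lookup."""

    def __init__(
        self,
        fetch_stat: Callable[[str], Stat],
        ttl_seconds: float = 86400.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_stat = fetch_stat  # the expensive call (one S3 request)
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (expiry time, cached stat); "missing" results are cached too
        self._entries: Dict[str, Tuple[float, Stat]] = {}

    def get(self, path: str) -> Stat:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no request is sent
        stat = self._fetch_stat(path)  # cache miss: ask the backend
        self._entries[path] = (now + self._ttl, stat)
        return stat

    def invalidate(self, path: str) -> None:
        """Call after an upload, delete, or rename that touches this path."""
        self._entries.pop(path, None)
```

Because every write path calls invalidate(), a cached entry only disappears early when the file actually changed, which is why the TTL can be long. Misses still go to the backend one at a time, so the first render of a page with many images is not sped up by this alone.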

MPThLee commented 3 years ago

Well, I started using S3 for backup and CDN purposes, because images had become a bottleneck on the server side due to low IOPS at the time.

BTW, I'm using Redis, but that's OK, right? I don't know the internals of MediaWiki's cache system.

edwardspec commented 3 years ago

MediaWiki can indeed use Redis instead of Memcached. They do the same thing.

edwardspec commented 3 years ago

Please try the following version: git clone -b cache-GetFileStat --depth 1 https://github.com/edwardspec/mediawiki-aws-s3.git AWS

... and let me know if it improves the situation.

MPThLee commented 3 years ago

This performs better than the current master branch! Rendering time is reduced by 10~20 seconds. (Sometimes a render reaches 75 seconds, depending on network conditions.) [image]

PS. Can I set the cache TTL? 24 hours is OK, but I want the cache to live longer.

edwardspec commented 3 years ago

Will probably just set TTL to 7 days. Since the cache is properly invalidated, it is never stale and TTL doesn't need to be short.

Pages with ~700 images might also benefit from $wgResponsiveImages = false; (causes fewer thumbnails to be generated).

MPThLee commented 3 years ago

So, after switching to the cache-GetFileStat branch, the page takes almost 37 seconds to render as the cache gets more hits (in real-world usage).

The page I tested contains only low-resolution images (~700 of them), 96px square card icons, so sadly it won't benefit from $wgResponsiveImages = false;.

MPThLee commented 3 years ago

Hi, it seems this is related to a MediaWiki bug too. On every parse, it checks for 96px thumbnails of 96px images :( (This happens every time: it checks for thumbnails that are the same size as the original image.) However, it's still faster than without the cache, I guess because valid responses are still cached properly.

Related: https://phabricator.wikimedia.org/T280445

edwardspec commented 3 years ago

Merged. Thank you for suggesting this optimization.