MatmaRex / patchdemo

This repository has been moved to GitLab: https://gitlab.wikimedia.org/repos/ci-tools/patchdemo
MIT License

Chinese-language (zh) wikis take way too much disk space #554

Closed MatmaRex closed 1 year ago

MatmaRex commented 1 year ago

We're running out of disk space on / again. Apparently this is because Patch demo has been very helpful for testing changes to the interface of language variants ;)

There are a few wikis using the zh language code with databases around 150 MB, compared to the average of 30-35 MB:

root@patchdemo3:/var/lib/mysql# du -sh * | sort -n
...
14M patchdemo_23ecabceb7
14M patchdemo_3af60fecc0
14M patchdemo_53fcd69df9
14M patchdemo_6b0bae6bd4
14M patchdemo_6fb8116b28
14M patchdemo_80d266d75f
14M patchdemo_81e70bac92
14M patchdemo_bc77e3d16d
14M patchdemo_dd14c80fda
...
97M patchdemo_3035c979ca
98M patchdemo_f18e1e5ec5
106M patchdemo_1caf0f3eea
112M patchdemo_dcf3b89afb
116M patchdemo_b55fb16cd2
141M ibdata1
154M patchdemo_c08839db19
158M patchdemo_916be355b2
158M patchdemo_93cb3acc53
158M patchdemo_f1d917c9cc
159M patchdemo_b371c53547

They are huge because all of the data in l10n_cache is duplicated for each language variant:

MariaDB [patchdemo_f18e1e5ec5]> select lc_lang, lc_key, length(lc_value) from l10n_cache order by 3 desc limit 100;
+---------+-----------------------------------------------------------+------------------+
| lc_lang | lc_key                                                    | length(lc_value) |
+---------+-----------------------------------------------------------+------------------+
| en      | list                                                      |           749314 |
| zh      | list                                                      |           749314 |
| zh-cn   | list                                                      |           749314 |
| zh-hk   | list                                                      |           749314 |
| zh-mo   | list                                                      |           749314 |
| zh-my   | list                                                      |           749314 |
| zh-sg   | list                                                      |           749314 |
| zh-tw   | list                                                      |           749314 |
| zh-hans | list                                                      |           749314 |
| zh-hant | list                                                      |           749314 |
| zh      | deps                                                      |           204939 |
| zh-mo   | deps                                                      |           204939 |
| zh-my   | deps                                                      |           204741 |
| zh-hk   | deps                                                      |           179704 |
| zh-tw   | deps                                                      |           179704 |
| zh-hant | deps                                                      |           179704 |
| zh-sg   | deps                                                      |           179506 |
| zh-cn   | deps                                                      |           154271 |
| zh-hans | deps                                                      |           154271 |
| en      | deps                                                      |            50832 |
| zh      | specialPageAliases                                        |            28397 |
| zh-hk   | specialPageAliases                                        |            28397 |
| zh-mo   | specialPageAliases                                        |            28397 |
| zh-tw   | specialPageAliases                                        |            28397 |
| zh-hant | specialPageAliases                                        |            28397 |
| zh-cn   | specialPageAliases                                        |            27553 |
| zh-my   | specialPageAliases                                        |            27553 |
| zh-sg   | specialPageAliases                                        |            27553 |
| zh-hans | specialPageAliases                                        |            27553 |
| zh      | magicWords                                                |            18531 |
| zh-cn   | magicWords                                                |            18531 |
| zh-hk   | magicWords                                                |            18531 |
| zh-mo   | magicWords                                                |            18531 |
| zh-my   | magicWords                                                |            18531 |
| zh-sg   | magicWords                                                |            18531 |
| zh-tw   | magicWords                                                |            18531 |
| zh-hans | magicWords                                                |            18531 |
| zh-hant | magicWords                                                |            18531 |
| en      | magicWords                                                |            11594 |
| en      | specialPageAliases                                        |            11032 |
| zh-tw   | preload                                                   |             7667 |
| zh-hant | preload                                                   |             7648 |
| zh-hk   | preload                                                   |             7564 |
| zh-mo   | preload                                                   |             7564 |
| zh      | preload                                                   |             7557 |
| en      | preload                                                   |             6728 |
| zh-cn   | preload                                                   |             6691 |
| zh-my   | preload                                                   |             6691 |
| zh-sg   | preload                                                   |             6691 |
| zh-hans | preload                                                   |             6691 |
| zh      | messages:citethispage-content                             |             3744 |
| zh-cn   | messages:citethispage-content                             |             3744 |
| zh-my   | messages:citethispage-content                             |             3744 |
| zh-sg   | messages:citethispage-content                             |             3744 |
| zh-hans | messages:citethispage-content                             |             3744 |
| zh-hk   | messages:citethispage-content                             |             3575 |
| zh-mo   | messages:citethispage-content                             |             3575 |
| zh-tw   | messages:citethispage-content                             |             3575 |
| zh-hant | messages:citethispage-content                             |             3575 |
| en      | messages:citethispage-content                             |             3310 |
| en      | preloadedMessages                                         |             3166 |
| zh      | preloadedMessages                                         |             3166 |
| zh-cn   | preloadedMessages                                         |             3166 |
| zh-hk   | preloadedMessages                                         |             3166 |
| zh-mo   | preloadedMessages                                         |             3166 |
| zh-my   | preloadedMessages                                         |             3166 |
| zh-sg   | preloadedMessages                                         |             3166 |
| zh-tw   | preloadedMessages                                         |             3166 |
| zh-hans | preloadedMessages                                         |             3166 |
| zh-hant | preloadedMessages                                         |             3166 |
| en      | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh      | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-cn   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-hk   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-mo   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-my   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-sg   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-tw   | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-hans | messages:cite_references_link_many_format_backlink_labels |             2089 |
| zh-hant | messages:cite_references_link_many_format_backlink_labels |             2089 |
| en      | messages:default-skin-not-found                           |             1744 |
| zh-hk   | messages:default-skin-not-found                           |             1673 |
| zh-mo   | messages:default-skin-not-found                           |             1673 |
| zh-tw   | messages:default-skin-not-found                           |             1673 |
| zh-hant | messages:default-skin-not-found                           |             1673 |
| zh      | dateFormats                                               |             1646 |
| zh-hk   | dateFormats                                               |             1646 |
| zh-mo   | dateFormats                                               |             1646 |
| zh-tw   | dateFormats                                               |             1646 |
| zh-hant | dateFormats                                               |             1646 |
| zh      | messages:default-skin-not-found                           |             1638 |
| zh-cn   | messages:default-skin-not-found                           |             1638 |
| zh-my   | messages:default-skin-not-found                           |             1638 |
| zh-sg   | messages:default-skin-not-found                           |             1638 |
| zh-hans | messages:default-skin-not-found                           |             1638 |
| zh      | namespaceAliases                                          |             1521 |
| zh-hk   | namespaceAliases                                          |             1521 |
| zh-mo   | namespaceAliases                                          |             1521 |
| zh-tw   | namespaceAliases                                          |             1521 |
| zh-hant | namespaceAliases                                          |             1521 |
+---------+-----------------------------------------------------------+------------------+
100 rows in set (0.189 sec)

I'm not sure whether this is expected, or a bug in MediaWiki.
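
To see how much each language contributes overall, the per-row query above can be aggregated per language code. A minimal sketch, assuming the `mysql` client can connect without extra options (as in the root session above); the helper name `l10n_size_by_lang` is made up for illustration:

```shell
# Hypothetical helper: total cached bytes per language code in one wiki database.
# Assumes the mysql client can connect without extra credentials.
l10n_size_by_lang() {
    db="$1"
    # --batch gives tab-separated output, --skip-column-names drops the header row.
    mysql --batch --skip-column-names "$db" -e \
        "SELECT lc_lang, COUNT(*) AS entries, SUM(LENGTH(lc_value)) AS bytes
         FROM l10n_cache GROUP BY lc_lang ORDER BY bytes DESC;"
}
```

For example, `l10n_size_by_lang patchdemo_f18e1e5ec5` would print one line per language with its entry count and total bytes.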

MatmaRex commented 1 year ago

I cleared out some disk space using:

journalctl --vacuum-size=100M

(Unrelated to the databases, there was just 1.5 GB of logs in /var/log/journal.)

And:

root@patchdemo3:/var/lib/mysql# find * -name 'patchdemo_*' -exec mysql -e "truncate l10n_cache;" {} \;

(Gained about 4 GB, but this is temporary until someone visits the wikis. Maybe we should do this in a cron job though…)
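
The cron-job idea could look roughly like this. It is only a sketch: it lists the databases with `SHOW DATABASES` instead of `find`, the function name is made up, and it assumes passwordless `mysql` access as in the root session above.

```shell
# Hypothetical helper: truncate l10n_cache in every database named patchdemo_*.
# Assumes the mysql client can connect without extra credentials.
truncate_l10n_caches() {
    # List matching databases, one per line (\_ matches a literal underscore).
    mysql --batch --skip-column-names -e "SHOW DATABASES LIKE 'patchdemo\\_%'" |
    while read -r db; do
        # </dev/null keeps the inner client from consuming the database list.
        mysql "$db" -e "TRUNCATE TABLE l10n_cache;" </dev/null ||
            echo "failed to truncate l10n_cache in $db" >&2
    done
}
```

Calling `truncate_l10n_caches` from a script run by a daily cron entry would keep the caches from piling up, at the cost of each wiki rebuilding its cache the next time someone visits it.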

edg2s commented 1 year ago

I think long term we need a wiki expiration policy (aka auto-deleting) if we are to reduce the maintenance burden. Today it's a wiki using 5x the space, but tomorrow it could be 10x as many teams using the service.

MatmaRex commented 1 year ago

One blunt way to deal with this problem is to disable the use of the cache:

$wgLocalisationCacheConf['storeClass'] = 'LCStoreNull';

I just realized that I've been using this setting locally for who-knows-how-many years, and my local testing wiki feels fast. Maybe that's an option.