CanastaWiki / Canasta

MediaWiki Docker image for Canasta, an all-in-one MediaWiki stack for easy deployment and management of enterprise-ready MediaWiki on production environments.
https://www.canasta.wiki
MIT License
36 stars 27 forks source link

Add relevant wikiteq branch changes to master #388

Open yaronkoren opened 2 months ago

yaronkoren commented 2 months ago

There have been many changes to the wikiteq branch since it was created about seven months ago; and though there has been an effort at synchronizing the wikiteq and master branches by myself and others, there certainly remain differences between the two - which it would be best to avoid, if possible. I went through to find changes to the wikiteq branch that it could make sense to add to the master branch in one form or another, and found the following five. They range in size from simple one-line changes (#332, #375) to a massive change to 19 files (#371), which despite its name seems to do much more than move the location of the log files - actually, much of that patch seems to itself be an attempt at synchronization, so it's probably not as big as it appears.

It would be good to identify which of these changes it makes sense to copy over - and ideally then copy those over.

EDIT: I figured it would be good to put together, in one place, a checklist of all the tasks to be done. So, here it is - also with some improvements to the task descriptions over what is listed below:

To do:

Maybe:

jeffw16 commented 2 months ago

My thoughts:

yaronkoren commented 2 months ago

I looked more into that giant #371 patch, and it's actually a combination of a whole bunch of things:

  1. making log files into volumes (as advertised)
  2. making MediaWiki's cache directory into a volume
  3. addition of the --variableArrayIndex flag to getMediaWikiSettings.php, as part of some refactoring that I can't fully understand
  4. making SMW's config/ directory into a volume
  5. the addition of wiki farm support from the master branch
  6. the addition of various functionality from Taqasta, including: a. SQLite support b. addition of the waitelastic() function, which pauses setup until Elasticsearch starts running c. making the Widgets extension's compiled_templates/ directory into a volume d. adding some SMW maintenance scripts, like updateEntityCollation.php (this SMW stuff should really be added as a separate script to the maintenance-scripts/ directory, I think) e. adding some CirrusSearch maintenance scripts, like UpdateSearchIndexConfig.php (same) f. making sure that various directories are writeable, via the new function make_dir_writable() g. some code refactoring, like moving some (heavily modified) code into the new files functions.sh and run-maintenance-scripts.sh h. addition of the --memory-limit flag (with a value of 512M) when calling runJobs.php i. changing the call a2enmod expires to a2enmod expires remoteip j. making various messages during setup more verbose k. conversely, piping other messages to log files ("mwjobrunner_log", etc.) instead of printing them on the screen l. adding more PHP error logging (although some of it looks non-Taqasta) m. modified the Apache config file mediawiki.conf in a way I don't really understand n. some other code changes and cleanup that also look like they came from Taqasta

Clearly this PR was poorly named, and it probably should have been split into 10-20 PRs (!), which would have made it easier to evaluate each change separately.

Anyway, assuming that this outline is roughly correct, are there any thoughts on which parts should or should not be added to Canasta? (Apart from # 5, which is there already.) Personally, I don't strongly object to any of it, though there are parts that look more useful than others, and there are changes that it would definitely be good to see an explanation for.

jeffw16 commented 2 months ago

My thoughts:

  1. Still love the idea
  2. This part I don't understand as much. Why would this be necessary? It seems more of a liability to volume this out.
  3. I'm not sure what this does.
  4. That's a good idea, as long as this is a no-op for wikis which don't enable SMW.
  5. N/A
  6. a. SQLite is not for production usage b. We can add that c. Same thoughts as 2nd point d. Agreed they should be added, but to maintenance-scripts/ e. Same as d f. Let's add it g. Sure, we can see about adding it h. Good idea i. Not sure what this does but okay j. Sure k. Pipe to where? l. Sure m. Will look into it later n. N/A

yaronkoren commented 2 months ago

@tosfos - after our discussion, I think the only question I'm really curious about, relating to these patches, is this one: what are these mediawiki.conf changes from March about?

pastakhov commented 2 months ago
RewriteCond %{DOCUMENT_ROOT}/.maintenance -f
RewriteRule .* - [R=503]

This part is not needed for Canasta and can be removed. In Taqasta it returns the 503 error while maintenance/install.php is running (Taqasta can install mediawiki from scratch automatically). The user should not see the web installer at that time.


######## Overwrite log format to include X-Forwarded-For if it is provided ########
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 10.0.0.0/8
RemoteIPInternalProxy 172.16.0.0/12
RemoteIPInternalProxy 192.168.0.0/16

LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" docker

CustomLog "|/usr/bin/rotatelogs -c -f -l -p /rotatelogs-compress.sh -L /var/log/apache2/access_log.current /var/log/apache2/access_log_%Y%m%d 86400" docker

This config for the remoteip Apache module ("The module overrides the client IP address for the connection with the useragent IP address reported in the request header configured with the RemoteIPHeader directive.") This makes configuring Apache much simpler because you can use the client IP in the settings and don't worry about the headers. Aslo this config was changed to be able to use another header as the client IP, see https://github.com/CanastaWiki/Canasta/commit/b70c5b2b57c2f7575d074ea040f04fb422fc9ec3

For example how IPs of DOS attacker were banned in the Apache config: https://github.com/WikiWorks/Society-of-Exploration-Geophysicists/commit/7cf51ebb7362d04fc01fa1f2bb43ad42c261c848 we added APACHE_REMOTE_IP_HEADER=cf-connecting-ip environment variable to the web container, it overwrites the client IP with one provided by Cloudflare, so it is used in the log files and allows use in the .htaccess file as:

<RequireAll>
  Require all granted
  Require not ip 47.76.99.127
  Require not ip 47.76.209.138
</RequireAll>

This config works well in most cases, allows to write clear config and in case we need to use another header like cf-connecting-ip as the user IP, all we need is to define it in the APACHE_REMOTE_IP_HEADER variable.

yaronkoren commented 2 months ago

@pastakhov - thank you for that detailed explanation! Those last two changes do look helpful.

yaronkoren commented 1 month ago

By the way, I wonder if the ideal way to handle items 4 and 6c (turning directories in the SMW and Widgets extensions into volumes, so that they don't get wiped out during upgrades) is by adding to the YAML format (e.g. extensions.yaml) a parameter like "volumed directories:", so that each extension (or even skin) can declare its own volumed directories, rather than hardcoding these in Canasta. I do like the idea of making Canasta as "extension-neutral" as possible.

yaronkoren commented 1 month ago

I still think this could be a good idea, although maybe the parameter should be called something like "persistent directories", rather than "volumed directories", because the syntax should ideally not be Docker-specific.

yaronkoren commented 1 week ago

This was accidentally closed...

yaronkoren commented 1 week ago

Re-opening, again - this may have to be done quite a bit.