Open desb42 opened 5 years ago
Yeah, this could be difficult. First, I believe that the srcset attribute is lumped on to every [[File:]] image. So supporting srcset would probably affect the vast majority of images in a wiki.
As a rough guess, I'd say that 1.5x versions would scale out the image databases to about 2x (100 GB -> 200 GB). This is because a 10 x 10 image of 100 pixels would become a 15 x 15 image of 225 pixels. There are a few other variables (compression quality; originals not affected), but the end result would probably be around there.
I have a poor eye with regard to these details. How significant do you feel the difference is from a user-experience perspective? I usually focus on text, and the images just offer me some approximate data. I also tend to use XOWA mostly on Android nowadays, and the image quality there feels sufficient for the mobile viewport.
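As a quick check of the arithmetic, here is a minimal Python sketch. The function name is hypothetical, and it only counts pixels: compressed file size will not track pixel count exactly, which is one reason the overall estimate lands nearer 2x than 2.25x.

```python
def pixel_growth(width: int, height: int, scale: float) -> float:
    """Return the ratio of scaled pixel count to original pixel count."""
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)
    return (scaled_w * scaled_h) / (width * height)


# A 10 x 10 image (100 pixels) scaled by 1.5 becomes 15 x 15 (225 pixels).
print(pixel_growth(10, 10, 1.5))  # 2.25
```

Pixel count grows with the square of the linear scale, so any linear factor s costs roughly s * s in raw pixels for the thumbs it applies to.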
Regarding your questions, there is some fragmented documentation in the comments of http://xowa.org/home/wiki/Dev/Command-line/Dumps . I list more details below.
But I don't think this would be an easy fix. The main problem is that XOWA uses a composite-key in the HTML to fetch the data from the file-databases. Any change to include srcset would involve changing some of that composite-key logic which has a good deal of complexity to it.
At a high level, this is what happens.
So, the "easiest" thing would be
However, this is just pie-in-the-sky design. I'm sure there's a lot more complexity once the details are worked out.
* add ('simple.wikipedia.org' , 'wiki.mass_parse.exec'):
// aggregate the lnkis
add ('simple.wikipedia.org' , 'file.lnki_regy');
Takes all the rows in lnki_temp and aggregates them into lnki_regy. For example, A.png with width 300 may be used by 5 pages in lnki_temp; we only want 1 row of A.png with width 300 here.
// generate orig metadata for files in the current wiki (for example, for pages in en.wikipedia.org/wiki/File:*)
add ('simple.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
Generates a complete list of images. Primarily for redirects.
// generate all orig metadata for all lnkis
add ('simple.wikipedia.org' , 'file.orig_regy');
// generate list of files to download based on "orig_regy" and XOWA image code
add ('simple.wikipedia.org' , 'file.xfer_temp.thumb');
thumb, 300px, upright=1.5, 400x200px (which may not actually yield a 400x200px image depending on the original dimensions)
// aggregate list one more time
add ('simple.wikipedia.org' , 'file.xfer_regy');
// identify images that have already been downloaded
add ('simple.wikipedia.org' , 'file.xfer_regy_update');
// download images. This step may also take a long time, depending on how many images are needed
add ('simple.wikipedia.org' , 'file.fsdb_make') {
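To make two of the steps above more concrete, here is a minimal Python sketch of (a) collapsing per-page link rows into one row per distinct file and width, as described for lnki_regy, and (b) why a 400x200px argument may not yield a 400x200px thumb. The function names, the row layout, and the "fit inside the box, keep aspect ratio, never upscale" rule are assumptions for illustration; this is not XOWA's actual code or schema.

```python
from collections import Counter


def aggregate_lnkis(lnki_temp):
    """Collapse per-page rows into one row per distinct (file, width) pair."""
    counts = Counter((row["file"], row["width"]) for row in lnki_temp)
    return [
        {"file": name, "width": width, "count": n}
        for (name, width), n in sorted(counts.items())
    ]


def fit_within(orig_w, orig_h, max_w, max_h):
    """Scale to fit inside max_w x max_h, keeping aspect ratio, never upscaling."""
    scale = min(max_w / orig_w, max_h / orig_h, 1.0)
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))


# Hypothetical data: A.png at width 300 is used by 5 pages, B.png by 1 page.
rows = [{"file": "A.png", "width": 300, "page": p} for p in range(5)]
rows.append({"file": "B.png", "width": 200, "page": 9})
print(aggregate_lnkis(rows))

# A square 1000 x 1000 original asked for at 400x200px comes out 200 x 200.
print(fit_within(1000, 1000, 400, 200))
```

Under these assumptions the five A.png rows become a single row, and the height limit is what decides the final size of the square image.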
Thanks for the extensive answer
From a general user perspective, I would guess this to be a very minor issue. This is intended to be an offline system where space is a consideration (unlike wikicommons). I would not suggest that you put any more effort into this. I might have a play, though.
As an example, I compared the enwiki and XOWA http-server versions. The page I picked is a random example to try to make my point: en.wikipedia.org/wiki/Argentine_Declaration_of_Independence
You will see (possibly) that the enwiki version is sharper. On tracking down this difference, I note that enwiki uses the relatively new 'srcset' feature of the <img> tag to pick an 'appropriate' image for my screen. In my case that is the thumb image that is 1.5 times larger than the defined width and height. To put this another way, the thumbnail that is 1.5 times bigger is squeezed into the defined width and height.
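As a rough illustration of that selection, here is a Python sketch that picks a thumb by pixel density. It is a simplification under stated assumptions: browsers are free to choose among srcset candidates however they like, and the function names and file names below are hypothetical.

```python
def pick_srcset_candidate(candidates, device_pixel_ratio):
    """Return the URL with the smallest density >= the device pixel ratio.

    candidates maps density (1.0, 1.5, ...) to a URL. If no density is
    high enough, fall back to the largest one available.
    """
    densities = sorted(candidates)
    for density in densities:
        if density >= device_pixel_ratio:
            return candidates[density]
    return candidates[densities[-1]]


def scaled_thumb_size(width, height, density):
    """Pixel size of the thumb that is squeezed into a width x height slot."""
    return round(width * density), round(height * density)


# Hypothetical candidates for a thumb declared as 220 x 150.
thumbs = {1.0: "220px-Example.jpg", 1.5: "330px-Example.jpg", 2.0: "440px-Example.jpg"}
print(pick_srcset_candidate(thumbs, 1.5))
print(scaled_thumb_size(220, 150, 1.5))
```

With a device pixel ratio of 1.5, this sketch selects the 330px file, which is then displayed in the 220 x 150 slot.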
I realise that trying to keep all versions of these thumbnails offline is defeating the purpose.
I am not suggesting any change here
What I would like to do is experiment with generating the 1.5x-sized versions in place of the 'real' versions in my own builds (to see the overall impact on the size of the image database(s)). I am hoping you can give me some pointers into your code as to how to do this. For example, is it part of the file.page_regy step and/or the file.fsdb_make step?