gnosygnu / xowa

xowa offline wiki application

Image clarity #319

Open desb42 opened 5 years ago

desb42 commented 5 years ago

As an example, I compared the enwiki version and the xowa http-server version of the same page. The page I picked is a random example to try to make my point: en.wikipedia.org/wiki/Argentine_Declaration_of_Independence (image: image_diff)

You will see (possibly) that the enwiki version is sharper. On tracking down this difference, I note that enwiki uses the relatively new 'srcset' attribute of the <img> tag to pick an 'appropriate' image for my screen. In my case that is the thumb image that is 1.5 times larger than the defined width and height. To put this another way, the thumbnail that is 1.5 times bigger is squeezed into the defined width and height.
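To illustrate the srcset behaviour described above, here is a minimal Python sketch of how a browser might choose among density candidates. The actual choice is left to the browser, so the rule used here (smallest density that covers the device pixel ratio), the function names, and the example file names are all assumptions for illustration, not code from enwiki or XOWA.

```python
def parse_srcset(srcset):
    """Parse 'url 1.5x, url2 2x' into a list of (density, url) pairs.

    Simplification (assumption): URLs contain no commas or spaces.
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:
            continue
        url = tokens[0]
        density = 1.0  # a candidate without a descriptor counts as 1x
        if len(tokens) > 1 and tokens[1].endswith("x"):
            density = float(tokens[1][:-1])
        candidates.append((density, url))
    return candidates


def pick_srcset_candidate(src, srcset, device_pixel_ratio):
    """Return the URL a browser could plausibly pick (hypothetical rule)."""
    # The plain src attribute acts as the 1x candidate.
    candidates = [(1.0, src)] + parse_srcset(srcset)
    candidates.sort()
    for density, url in candidates:
        if density >= device_pixel_ratio:
            return url
    # No candidate is dense enough: fall back to the densest one.
    return candidates[-1][1]


# Hypothetical thumbnail names for an image displayed at 220px wide.
chosen = pick_srcset_candidate(
    "220px-Example.jpg",
    "330px-Example.jpg 1.5x, 440px-Example.jpg 2x",
    1.5,
)
print(chosen)
```

With a device pixel ratio of 1.5 the sketch selects the 330px thumbnail, which is then drawn into the 220px box; that is the "squeezed" effect described above.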

I realise that trying to keep all versions of these thumbnails offline is defeating the purpose.

I am not suggesting any change here.

What I would like to do is experiment with generating the 1.5 sized versions in place of the 'real' version in my own builds (to see the overall impact on the size of the image database(s)). I am hoping you can give me some pointers into your code as to how to do this. For example, is it part of the file.page_regy step and/or part of the file.fsdb_make step?

gnosygnu commented 5 years ago

Yeah, this could be difficult. First, I believe that the srcset attribute is added to every [[File:]] image. So supporting srcset would probably affect the vast majority of images in a wiki.

As a rough guess, I'd say that 1.5x versions would roughly double the size of the image databases (100 GB -> 200 GB). This is because a 10 x 10 image of 100 pixels would become a 15 x 15 image of 225 pixels. There are a few other variables (compression quality; originals not affected), but the end result would probably be around there.
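The arithmetic behind that guess can be sketched in a few lines of Python. The function name and the rounding of scaled dimensions are assumptions for illustration. The sketch only counts pixels; how many bytes each pixel costs depends on compression, so it does not predict the database size itself.

```python
def scaled_pixels(width, height, factor):
    """Return (new_width, new_height, pixel_count) after scaling both sides."""
    new_w = round(width * factor)   # rounding rule is an assumption
    new_h = round(height * factor)
    return new_w, new_h, new_w * new_h


w, h, pixels = scaled_pixels(10, 10, 1.5)
ratio = pixels / (10 * 10)
print(f"{w} x {h} = {pixels} pixels, {ratio}x the original pixel count")
```

The pixel count grows with the square of the scale factor (1.5 * 1.5 = 2.25), which is in the same range as the 2x estimate once compression and unaffected originals are taken into account.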

I have a poor eye with regard to these details. How significant do you feel the difference is from a user-experience standpoint? I usually focus on text, and the images just offer me some approximate data. I also tend to use XOWA mostly on Android nowadays, and the image quality there feels sufficient for the mobile viewport.

Regarding your questions, there is some fragmented documentation in the comments of http://xowa.org/home/wiki/Dev/Command-line/Dumps . I list more details below.

But I don't think this would be an easy fix. The main problem is that XOWA uses a composite-key in the HTML to fetch the data from the file-databases. Any change to include srcset would involve changing some of that composite-key logic which has a good deal of complexity to it.

At a high-level, this is what happens.

So, the "easiest" thing would be

However, this is just pie-in-the-sky design. I'm sure there's a lot more complexity once the details are worked out.


* add     ('simple.wikipedia.org' , 'wiki.mass_parse.exec'): 
// generate orig metadata for files in the current wiki (for example, for pages in en.wikipedia.org/wiki/File:*)
  add     ('simple.wikipedia.org' , 'file.page_regy')        {build_commons = 'n';}

Generates a complete list of images. Primarily for redirects.

// generate all orig metadata for all lnkis
add     ('simple.wikipedia.org' , 'file.orig_regy');
// generate list of files to download based on "orig_regy" and XOWA image code
add     ('simple.wikipedia.org' , 'file.xfer_temp.thumb');
// aggregate list one more time
add     ('simple.wikipedia.org' , 'file.xfer_regy');
// identify images that have already been downloaded
add     ('simple.wikipedia.org' , 'file.xfer_regy_update');
// download images. This step may also take a long time, depending on how many images are needed
add     ('simple.wikipedia.org' , 'file.fsdb_make') {

desb42 commented 5 years ago

Thanks for the extensive answer

From a general user perspective, I would guess this to be a very minor issue. This is intended to be an offline system where space is a consideration (unlike wikicommons). I would not suggest that you put any more effort into this. I might have a play though.