Open desb42 opened 5 years ago
Under circumstances I do not understand, an original image is downloaded and an attempt is made to convert it to 0x0 size.
Yeah, this looks like a bug.
[[File:A.png|0x0]]
This should be a simple change. I'll also check the databases on my side.
There were also 517 failed downloads - is there any way of telling which page(s) these came from?
Yup. Check xowa.file.make.sqlite3
and run the following SQL:
SELECT lnki_page_id FROM lnki_temp WHERE lnki_ttl = 'YOUR_IMAGE.PNG';
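If it helps to script the lookup, here is a minimal Python sketch using the standard sqlite3 module. It assumes only the `lnki_temp` schema implied by the query above (`lnki_page_id`, `lnki_ttl`); the demo table at the bottom is a stand-in for the real xowa.file.make.sqlite3, so substitute your own database path and file title:

```python
import sqlite3

def pages_linking_file(con, file_ttl):
    """Return the page ids in lnki_temp whose link title matches file_ttl."""
    rows = con.execute(
        "SELECT lnki_page_id FROM lnki_temp WHERE lnki_ttl = ?;",
        (file_ttl,),
    ).fetchall()
    return [r[0] for r in rows]

# Demo against an in-memory stand-in for xowa.file.make.sqlite3:
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lnki_temp (lnki_page_id INTEGER, lnki_ttl TEXT)")
con.executemany(
    "INSERT INTO lnki_temp VALUES (?, ?)",
    [(7, "A.png"), (42, "YOUR_IMAGE.PNG"), (99, "YOUR_IMAGE.PNG")],
)
print(pages_linking_file(con, "YOUR_IMAGE.PNG"))  # -> [42, 99]
```

Using a parameterized query (`?`) avoids quoting problems with file titles that contain apostrophes.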
So it looks like the above file (Women_working...) is the only instance of a no-op file:
SELECT * FROM lnki_temp WHERE lnki_w = 0 AND lnki_h = 0 AND lnki_upright = -1 AND lnki_time = -1 AND lnki_page = -1;
Furthermore, it exists on this page: https://de.wikipedia.org/w/index.php?title=D._Stempel which uses it as:
[[Datei:Women working a factory in David Stempel AG in 1918 during world war one.jpg|alternativtext=Frauen arbeiten in der David Stempel AG während des 1. Weltkrieges, 1918|mini|0x0px|Frauen arbeiten in der David Stempel AG während des 1. Weltkrieges, 1918]]
Apparently, MediaWiki ignores the 0x0 argument. I'm going to track down this code later. However, as the impact is pretty low, I'm bumping this down in priority.
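The fix presumably amounts to treating a zero width or height as "unspecified" rather than as a literal target size. A hedged sketch of that guard in Python - the function name, regex, and behavior here are illustrative, not MediaWiki's or XOWA's actual code:

```python
import re

def parse_size_arg(arg):
    """Parse a wikitext size argument such as '220px', '100x200px', or '0x0px'.

    Returns (width, height), using None for any dimension that is missing
    or zero -- mirroring the observed MediaWiki behavior of ignoring a
    0x0 argument instead of rendering a 0x0 thumb.
    """
    m = re.fullmatch(r"(\d+)?x?(\d+)?px", arg)
    if not m:
        return (None, None)
    w = int(m.group(1)) if m.group(1) else None
    h = int(m.group(2)) if m.group(2) else None
    if w == 0:
        w = None
    if h == 0:
        h = None
    return (w, h)

print(parse_size_arg("0x0px"))     # -> (None, None), i.e. size argument ignored
print(parse_size_arg("220px"))     # -> (220, None)
print(parse_size_arg("100x200px")) # -> (100, 200)
```

With a (None, None) result, the download/convert step would fall back to the default thumb width instead of requesting a 0x0 conversion.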
After manually updating a wiki, how does one download new images and locally create new image dumps to add to the last published image dump, e.g. Xowa_enwiki_2018-07file*.zip? I use Xowa (a simply amazing and awesome product) on a stand-alone network, so grabbing images while browsing is not an option.
how does one download new images and locally create new image dumps
There really is only one way and it is quite complicated. See: http://xowa.org/home/wiki/Dev/Command-line/Dumps . I can walk you through it, but it requires quite a bit of work (I think @desb42 has managed to get through one enwiki cycle on his own)
Ordinarily, I try to provide updated copies of English Wikipedia. But I've been late on my side, though I keep saying that a new update is just around the corner....
I use Xowa (a simply amazing and awesome product) on a stand-alone network
Also, just want to say, thanks for the compliment!
I am simply awed by the work you put into xowa. That you have time to do any data updates is amazing. Thanks for the link. It looks quite helpful, although it will take time for me to digest it. The process is a bit surprising to me; I would have guessed that the image links would simply be in one of the wikimedia dumps.
My plan was to build a 2019-04-01 en.wikipedia.org and en.wiktionary.org from wikimedia dumps, use download central to add in the 2018-07 images, and then figure out how to manually add the image changes between 2018-07 and 2019-04 (which seems more possible now with the link you provided).
Can I expect that most of the 2019-04-01 wiki will work and look good with the 2018-07 image dumps, or would I do better to stick with your 2018-07 wiki articles dump until I can get the manual image dump process working?
The process is a bit surprising to me; I would have guessed that the image links would simply be in one of the wikimedia dumps.
The image links could work, but it would download the original image whereas most articles use thumbs. Although that's useful in and of itself, this would easily use 400-500 GB. Moreover, you'd need a way to convert them into thumbs for the article.
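To give a feel for what "convert them into thumbs" involves: each original has to be scaled down to the width the article requests, preserving aspect ratio. A minimal sketch of that size calculation - the never-upscale and rounding rules here are illustrative assumptions, not necessarily exactly what MediaWiki or XOWA does:

```python
def thumb_dims(orig_w, orig_h, target_w):
    """Compute thumb dimensions for a requested width.

    Preserves the original aspect ratio and never upscales: if the
    requested width is at least the original width, the original
    dimensions are returned unchanged.
    """
    if target_w >= orig_w:
        return (orig_w, orig_h)
    return (target_w, max(1, round(orig_h * target_w / orig_w)))

# The 9775x5273 original mentioned later in this thread, as a 220px thumb:
print(thumb_dims(9775, 5273, 220))  # -> (220, 119)
```

The actual pixel resampling is then done by an external tool such as ImageMagick; the point is that the thumb sizes depend on per-article link widths, which is why the original-only image links in the dumps are not enough by themselves.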
My plan was to build a 2019-04-01 en.wikipedia.org and en.wiktionary.org from wikimedia dumps, use download central to add in the 2018-07 images, and then figure out how to manually add the image changes between 2018-07 and 2019-04 (which seems more possible now with the link you provided).
That's generally how I build my updates: take the base (2018-07) and add in the incrementals (everything up to 2019-04)
Can I expect that most of the 2019-04-01 wiki will work and look good with the 2018-07 image dumps, or would I do better to stick with your 2018-07 wiki articles dump until I can get the manual image dump process working?
2019-04 should look OK with 2018-07. About 5%-10% of images will be missing, but I don't think it will be that noticeable.
Having successfully done the first step of generating HTML from a XOWA build for dewiki (the 'wiki.mass_parse.exec' step, which took approx 18h), I decided to complete the process and download the thumbnails. The main step is 'file.fsdb_make'; together with a number of setup steps, the whole group took 9h 21m. Reviewing the console log, I came across the following sequence:
Under circumstances I do not understand, an original image is downloaded and an attempt is made (21 times) to convert it to 0x0 size.
There was one other similar sequence:
Luca_Carlevarijs_(Italian_-_Regatta_on_the_Grand_Canal_in_Honor_of_Frederick_IV,_King_of_Denmark_-_Google_Art_Project.jpg
In this case it is trying to convert to a size of 9568x5161 (the original is 9775x5273). At its original size this is 14 MB - not what I would call a small file.

There were also 517 failed downloads - is there any way of telling which page(s) these came from?