gnosygnu / xowa

xowa offline wiki application

HTML databases don't support Random Page and Categories. Also, Read-Only SQLite files are very slow #81

Closed Ope30 closed 7 years ago

Ope30 commented 8 years ago

Hey there, just wondering if you've removed the random article function? Because I no longer have it, did I miss something? Also, I noticed that loading pictures takes longer than it used to. I have zero idea why. Is there any way to give it more RAM or something? Would it also be possible to get instant search results?

gnosygnu commented 7 years ago

Hi!

Hey there, just wondering if you've removed the random article function? Because I no longer have it, did I miss something?

Hmm... It is still there. I checked the following:

Are you missing the "Random article" link? Or are you clicking it and nothing happens? If the latter, can you send me the log at en.wikipedia.org/wiki/Special:XowaSystemData?type=log_session ?

[Screenshot: wiki_random]

Also, I noticed that loading pictures takes longer than it used to. I have zero idea why. Is there any way to give it more RAM or something?

Nah, this really hasn't changed. Just a few questions:

If you can, try to send me the log. For example, I did the following:

20160901_035715.831 file.get: file=Earth_symbol.svg width=18 page=Earth
20160901_035715.847 file.get: file=The_Earth_seen_from_Apollo_17.jpg width=300 page=Earth
20160901_035715.853 file.get: file=Protoplanetary-disk.jpg width=220 page=Earth
20160901_035715.868 file.get: file=PhylogeneticTree,_Woese_1990.PNG width=220 page=Earth
20160901_035715.882 file.get: file=Earth2014shape_SouthAmerica_small.jpg width=220 page=Earth
20160901_035715.900 file.get: file=Earth-cutaway-schematic-english.svg width=220 page=Earth
20160901_035715.903 file.get: file=Tectonic_plates_(empty).svg width=220 page=Earth
20160901_035715.917 file.get: file=AYool_topography_15min.png width=220 page=Earth
20160901_035715.932 file.get: file=Earth_elevation_histogram_2.svg width=220 page=Earth
20160901_035715.935 file.get: file=ISS-40_Typhoon_Halong.jpg width=220 page=Earth
20160901_035715.951 file.get: file=MODIS_Map.jpg width=220 page=Earth
20160901_035715.965 file.get: file=Full_moon_partially_obscured_by_atmosphere.jpg width=220 page=Earth
20160901_035715.970 file.get: file=Structure_of_the_magnetosphere-en.svg width=220 page=Earth
20160901_035715.982 file.get: file=EpicEarth-Globespin(2016May29).gif width=220 page=Earth
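To judge whether image loading is actually slow, a log like the one above can be turned into per-image gaps. The following Python sketch is only an illustration and not part of XOWA: it assumes every line has the `timestamp file.get: file=... width=... page=...` shape shown above, and the names `parse_file_get` and `gaps_ms` are made up for this example.

```python
from datetime import datetime


def parse_file_get(line):
    """Parse one 'file.get' log line into (timestamp, file name); None if it does not match."""
    stamp, _, rest = line.strip().partition(" file.get: ")
    if not rest:
        return None
    # Timestamps look like 20160901_035715.831 (date, time, milliseconds).
    when = datetime.strptime(stamp, "%Y%m%d_%H%M%S.%f")
    fields = dict(part.split("=", 1) for part in rest.split(" ") if "=" in part)
    return when, fields.get("file")


def gaps_ms(lines):
    """Return (file name, milliseconds since the previous file.get entry) pairs."""
    entries = [entry for entry in map(parse_file_get, lines) if entry]
    return [
        (name, round((when - prev_when).total_seconds() * 1000))
        for (prev_when, _), (when, name) in zip(entries, entries[1:])
    ]
```

In the excerpt above, every entry follows the previous one by less than 20 ms. Much larger gaps in a user's log would suggest that the slowdown comes from file access.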

Also, just so you know, more RAM wouldn't do anything. An SSD would definitely help, but a regular hard drive should be fine.

Hope this helps. Thanks.

Ope30 commented 7 years ago

I fixed it, they all were put on read-only mode, I didn't know that would affect the wikis at all. My bad. Thanks though. Regarding the random article, I'm missing it, such as "Navigation" and "Interaction". As well as categories, I can't see them either. I checked if they even existed, they do, they're just not at the end of an article like always.

Here is the log regarding the random article: [changed to gist: https://gist.github.com/gnosygnu/c8f62d52a5d774fe55170727ee2886b2]

And here is the log regarding categories: [changed to gist: https://gist.github.com/gnosygnu/76d9a99946f8c60cc58bc2fd8a7a0460]

I hope these logs are the ones you need.

Edit: Nothing regarding the problems above, but when I click on the download icon to download several wikis, for example, commons.wikipedia (using the Import list function), it says that all dump servers are offline. Is that because they're currently "offline"? Or they're just outdated?

Edit 2: Does it matter whether you downloaded the wikis using an additional program? I mean I did not use XOWA at all, I used XOWA to get the links, then downloaded them using a download manager, IDM for example.

gnosygnu commented 7 years ago

I fixed it, they all were put on read-only mode, I didn't know that would affect the wikis at all. My bad. Thanks though.

Wow! Quite a catch. I tried now, and sqlite definitely is slower when the files are read-only.

Just some minor notes:

For now, I'm just going to say that XOWA requires read-write access. If someone requests a read-only environment (for example, DVD), then I'll probably have to send a ticket to the SQLite mailing list
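Since read-only files turned out to be the cause here, one quick check is to scan a wiki folder for database files that lack write permission. This is a rough Python sketch, not an XOWA feature. It assumes the databases use the `.xowa` extension, and both the function name `read_only_files` and the folder passed to it are hypothetical.

```python
import stat
from pathlib import Path


def read_only_files(wiki_dir, pattern="*.xowa"):
    """Return files in wiki_dir matching pattern whose owner write bit is cleared."""
    return sorted(
        path
        for path in Path(wiki_dir).glob(pattern)
        # On Windows the read-only attribute is reflected in this same bit.
        if path.is_file() and not (path.stat().st_mode & stat.S_IWUSR)
    )
```

Calling `read_only_files(some_wiki_folder)` returns an empty list when every database is writable. Any path it does return is a candidate for clearing the read-only flag.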

Regarding the random article, I'm missing it, such as "Navigation" and "Interaction".

Ah, I see the problem here. This is an issue with HTML dumps. Both my tests had the Wikitext dumps available.

Let me try to get a fix in for this weekend or next. In the meantime, you can work around it by adding a custom menu option. I'll post the instructions in the next comment.

As well as categories, I can't see them either. I checked if they even existed, they do, they're just not at the end of an article like always.

Unfortunately, this is a known issue for the HTML dump. I'll try to add a Category section for the 2016-09 English Wikipedia dump.

Edit: Nothing regarding the problems above, but when I click on the download icon to download several wikis, for example, commons.wikipedia (using the Import list function), it says that all dump servers are offline. Is that because they're currently "offline"? Or they're just outdated?

It probably means you're offline. Try the following:

https://dumps.wikimedia.org/
http://dumps.wikimedia.your.org/
http://wikipedia.c3sl.ufpr.br/
http://ftp.fi.muni.cz/pub/wikimedia/

I suspect one of these is different. If they're both the same, then we'll have to troubleshoot from there
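The same check can be scripted outside of XOWA. The Python sketch below is an assumption about how one might probe the servers listed above, not how XOWA itself decides that "all dump servers are offline". The names `is_reachable` and `check_servers` are made up for this example.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if the server answers at all, even with an HTTP error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, so it is online
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout, and so on


def check_servers(urls, probe=is_reachable):
    """Map each URL to True or False depending on whether the probe succeeded."""
    return {url: probe(url) for url in urls}
```

If every URL maps to `False`, the machine running the script is most likely offline or blocked by a firewall, which matches the Web-Access explanation given later in this thread.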

Edit 2: Does it matter whether you downloaded the wikis using an additional program? I mean I did not use XOWA at all, I used XOWA to get the links, then downloaded them using a download manager, IDM for example.

Nope. You can use any download manager you want. As long as you're downloading the correct urls, you're fine. Just make sure you're using the urls from the "info" button in Download Central. For example, for 2016-08 English Wikipedia, they are here: home/wiki/Special:XowaDownloadCentralInfo?task_id=96

Sorry for all the issues. Thanks a lot for reporting them!

gnosygnu commented 7 years ago

Instructions for adding "Random Page" to the XOWA Main Menu:

  add_spr;
  add_btn('gnosygnu.random', 'Random Page', 'r', '', 'app.api.nav.goto("Special:Random");');
add_grp_default('xowa.gui.menus.group.file') {
  add_btn_default('xowa.gui.browser.tabs.new_dflt__at_dflt__focus_y');
  add_btn_default('xowa.gui.browser.tabs.close_cur');
  add_spr;
  add_btn_default('xowa.gui.page.view.save_as');
  add_btn_default('xowa.gui.page.view.print');
  add_btn_default('xowa.app.exit');
}
add_grp_default('xowa.gui.menus.group.edit') {
  add_btn_default('xowa.gui.page.selection.select_all');
  add_btn_default('xowa.gui.page.selection.copy');
  add_spr;
  add_btn_default('xowa.gui.browser.find.show');
}
add_grp_default('xowa.gui.menus.group.view') {
  add_btn_default('xowa.gui.font.increase');
  add_btn_default('xowa.gui.font.decrease');
  add_btn_default('xowa.gui.font.reset');
  add_spr;
  add_btn_default('xowa.gui.page.view.mode_read');
  add_btn_default('xowa.gui.page.view.mode_edit');
  add_btn_default('xowa.gui.page.view.mode_html');
  add_spr;
  add_btn_default('xowa.gui.page.view.reload');
  add_spr;
  add_btn_default('xowa.gui.browser.prog_log.show');
}
add_grp_default('xowa.gui.menus.group.history') {
  add_btn_default('xowa.nav.go_bwd');
  add_btn_default('xowa.nav.go_fwd');
  add_spr;
  add_btn_default('xowa.usr.history.show');
}
add_grp_default('xowa.gui.menus.group.bookmarks') {
  add_btn_default('xowa.nav.wiki.main_page');
  add_spr;
  add_btn_default('xowa.usr.bookmarks.add');
  add_btn_default('xowa.usr.bookmarks.show');
}
add_grp_default('xowa.gui.menus.group.tools') {
  add_btn_default('xowa.nav.cfg.main');
  add_spr;
  add_btn_default('xowa.nav.setup.download_central');
  add_spr;
  add_btn_default('xowa.nav.setup.import_from_list');
  add_btn_default('xowa.nav.setup.import_from_script');
  add_spr;
  add_btn_default('xowa.nav.setup.maintenance');
  add_btn_default('xowa.nav.setup.download');
  add_spr;
  add_btn('gnosygnu.random', 'Random Page', 'r', '', 'app.api.nav.goto("Special:Random");');
}
add_grp_default('xowa.gui.menus.group.help') {
  add_btn_default('xowa.nav.help.help');
  add_btn_default('xowa.nav.help.xowa_main');
  add_btn_default('xowa.nav.help.xowa_blog');
  add_btn_default('xowa.nav.help.change_log');
  add_btn_default('xowa.nav.help.diagnostics');
  add_btn_default('xowa.nav.cfg.menus');
  add_spr;
  add_grp_default('xowa.gui.menus.group.system_data') {
    add_btn_default('xowa.nav.system_data.log_session');
    add_btn_default('xowa.nav.system_data.cfg_app');
    add_btn_default('xowa.nav.system_data.cfg_lang');
    add_btn_default('xowa.nav.system_data.cfg_user');
    add_btn_default('xowa.nav.system_data.cfg_custom');
    add_btn_default('xowa.nav.system_data.usr_history');
  }
  add_spr;
  add_btn_default('xowa.nav.help.about');
}

Ope30 commented 7 years ago

Yeah, read-only was just an option to make sure nothing is going to be modified, at least in my opinion.

I didn't enable Web-Access, that was the problem. Thanks!

Regarding IDM, I'm just using it for faster internet connection.

Your code for the Random Page worked. Thanks a lot. Could you also try to get a fix regarding the categories and random page on the 2016-08 wikis, such as the current en-de wiktionary, wikivoyage, wikiquote, wikisource, wikibooks, wikiversity and wikinews that I got using Download Central? (I just noticed that they're missing the random article/category as well.) It is pretty stressful downloading all the wikis using IDM; you have to click on every single link, you know. That's how I did it and I don't want to do it again this month to be honest. I resolved to do it in 3 or 4 months. That would be very nice, I want to make it comfortable until I get there. :)

Edit: Regarding commons.wikipedia.org, could you in the future implement commons.wikipedia.org and wikidata into the Download Central? I'm downloading wikidata without IDM at the moment and it takes ages, lol. That'd be pleasurable for future downloads.

Edit 2: Apparently the download of wikidata just ended, then I opened a wiki (can't remember the name) and XOWA crashed. I opened it again and it came with a message saying that the download was not completed yet, it said I can delete it, or I can continue and XOWA may run into issues. I continued, then I opened an article to check whether it works or not. Wikidata works overall, but clicking on it at the Main Page doesn't do anything. I don't know if that's how it's supposed to be. I opened the task manager, I saw that XOWA was using a lot of RAM, probably because it was extracting it at the moment, but there was no sign of it doing so. I plan to redownload it once you've implemented them in Download Central, so I wanted to ask you again about that. It would make it easier for those who use download managers, IF there are any. Pretty sure I'm not the only one!

gnosygnu commented 7 years ago

Yeah, read-only was just an option to make sure nothing is going to be modified, at least in my opinion.

Yeah, that's an honest expectation. XOWA should have been able to handle read-only files.

I played around with it some more today, and it looks like I can force SQLite to open in read-only mode. (properties.setProperty("open_mode", "1");). I'll add this to the next release. Thanks for reporting!
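The Java property above is specific to the JDBC driver XOWA uses. As a rough analogue for anyone who wants to inspect a database file without risking changes, the Python sketch below opens SQLite explicitly in read-only mode through a `mode=ro` URI. It is an illustration only, and `open_read_only` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite database so that any write attempt fails."""
    # as_uri() yields a file:// URI; mode=ro asks SQLite for a read-only handle.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection opened this way, `SELECT` statements work as usual, while an `INSERT` or `UPDATE` raises `sqlite3.OperationalError` instead of modifying the file.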

I didn't enable Web-Access, that was the problem. Thanks!

Cool. Glad that worked. :)

Regarding IDM, I'm just using it for faster internet connection.

Yup. That's perfectly fine. Any dedicated internet download manager will probably be faster than XOWA. Among other things, one trick is they will open up two or more connections and download in parallel. XOWA only opens up one connection.
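Parallel downloading of this kind typically relies on HTTP `Range` requests, where each connection fetches one slice of the file. The Python sketch below only shows how the slices could be computed. It is an assumption about the general technique, not a description of IDM or XOWA internals, and `split_ranges` and `range_header` are made-up names.

```python
def split_ranges(total_bytes, connections):
    """Split a file of total_bytes into contiguous, inclusive (start, end) byte ranges."""
    if total_bytes <= 0 or connections <= 0:
        raise ValueError("total_bytes and connections must be positive")
    connections = min(connections, total_bytes)
    base, extra = divmod(total_bytes, connections)
    ranges, start = [], 0
    for index in range(connections):
        # The first `extra` ranges take one additional byte each.
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def range_header(start, end):
    """Build the HTTP header that requests bytes start..end (inclusive)."""
    return {"Range": f"bytes={start}-{end}"}
```

Each connection would send its own header and write the response at the matching offset. This only works when the server supports range requests.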

Your code for the Random Page worked. Thanks a lot.

Cool. Thanks for confirming. :)

Could you also try to get a fix regarding the categories and random page on the 2016-08 wikis, such as the current en-de wiktionary, wikivoyage, wikiquote, wikisource, wikibooks, wikiversity and wikinews that I got using Download Central? (I just noticed that they're missing the random article/category as well.) It is pretty stressful downloading all the wikis using IDM; you have to click on every single link, you know. That's how I did it and I don't want to do it again this month to be honest. I resolved to do it in 3 or 4 months. That would be very nice, I want to make it comfortable until I get there. :)

Actually, both of these will be hard, as the easiest way would be to regenerate the HTML dumps.

With that said, I think I can force a "dummy" Random Page in the left-hand nav so that at least something will show. I'll have this in the next XOWA release.

Categories is different. No matter what, it's going to involve downloading more data. The HTML dumps simply don't have that info.

I am planning to make it a separate download. Something like, download these files and put it in the wiki directory and you'll have categories. However, even for English Wikipedia, the Category databases are about 10 GB. Also, you'd still have to download these category files for each wiki. Sorry, but there's really no alternative to downloading more: the category data is just not there. I am going to try to minimize downloading (so you won't have to re-download the full set)

Finally, adding Categories will take time. I'll make it part of a near-future release, but it may be a month.

Edit: Regarding commons.wikipedia.org, could you in the future implement commons.wikipedia.org and wikidata into the Download Central? I'm downloading wikidata without IDM at the moment and it takes ages, lol. That'd be pleasurable for future downloads.

Actually, if you're using Download Central, you no longer need commons or wikidata.

Specifically, the _wikitext_ databases needed commons / wikidata b/c they needed to look up data in those wikis.

However, the _HTML_ databases already pre-compiled this info into the page. You can drop commons / wikidata and still work fine.

That said, if you want to look at a specific page in commons (https://commons.wikimedia.org/wiki/Earth) or wikidata (https://www.wikidata.org/wiki/Q2), then you should probably use the wikitext one. Precompiling "HTML" just doesn't work as well for these wikis. I don't have any plans to generate _HTML_ for them, due to some other complexities. But again, you really don't need them.

Edit 2: Apparently the download of wikidata just ended, then I opened a wiki (can't remember the name) and XOWA crashed.

If you have the log file, you can send it to me, and I'll see why it crashed. Generally it's at C:\xowa\user\anonymous\app\tmp\log

I opened it again and it came with a message saying that the download was not completed yet, it said I can delete it, or I can continue and XOWA may run into issues.

Honestly, you probably should have deleted and re-imported. I left the continue option there to be complete, but really, failed builds should be considered non-resumable.

I continued, then I opened an article to check whether it works or not. Wikidata works overall, but clicking on it at the Main Page doesn't do anything. I don't know if that's how it's supposed to be.

The Main Page should work. Chances are your import is broken.

If you can, do the following:

fil|www.wikidata.org-core.wbase.xowa|4793843712|20160830 060440.475
fil|www.wikidata.org-core.xowa|2087890944|20160830 060544.455
fil|www.wikidata.org-file-core.xowa|57344|20160830 060510.855
fil|www.wikidata.org-file-user.xowa|245760|20160902 201715.023
fil|www.wikidata.org-text-ns.000-db.002.xowa|2087829504|20160830 060604.155
fil|www.wikidata.org-text-ns.000-db.003.xowa|2047987712|20160830 060607.991
fil|www.wikidata.org-text-ns.000-db.004.xowa|2046984192|20160830 060612.315
fil|www.wikidata.org-text-ns.000-db.005.xowa|1995837440|20160830 060618.927
fil|www.wikidata.org-text-ns.000-db.006.xowa|1973600256|20160830 060622.763
fil|www.wikidata.org-text-ns.000-db.007.xowa|1968947200|20160830 060626.447
fil|www.wikidata.org-text-ns.000-db.008.xowa|1950511104|20160830 060630.663
fil|www.wikidata.org-text-ns.000-db.009.xowa|1974226944|20160830 060638.323
fil|www.wikidata.org-text-ns.000-db.010.xowa|1912451072|20160830 060643.055
fil|www.wikidata.org-text-ns.000-db.011.xowa|2015051776|20160830 060647.547
fil|www.wikidata.org-text-ns.000-db.012.xowa|1977757696|20160830 060652.487
fil|www.wikidata.org-text-ns.000-db.013.xowa|1970962432|20160830 060658.119
fil|www.wikidata.org-text-ns.000-db.014.xowa|1949216768|20160830 060705.691
fil|www.wikidata.org-text-ns.000-db.015.xowa|1966096384|20160830 060710.435
fil|www.wikidata.org-text-ns.000-db.016.xowa|1203007488|20160830 060715.263
fil|www.wikidata.org-text-ns.000.xowa|2012631040|20160830 060551.799
fil|www.wikidata.org-text-ns.004.xowa|78180352|20160830 060546.875
fil|www.wikidata.org-text-ns.008.xowa|819200|20160830 060549.079
fil|www.wikidata.org-text-ns.010.xowa|2936832|20160830 060556.311
fil|www.wikidata.org-text-ns.012.xowa|4014080|20160830 060554.099
fil|www.wikidata.org-text-ns.014.xowa|724992|20160830 060558.535
fil|www.wikidata.org-text-ns.1198.xowa|23121920|20160830 060600.899
fil|www.wikidata.org-text-ns.120.xowa|6352896|20160830 060614.895
fil|www.wikidata.org-text-ns.2600.xowa|266240|20160830 060701.307
fil|www.wikidata.org-text-ns.828.xowa|880640|20160830 060633.551
fil|www.wikidata.org-xtn.category.core.xowa|86016|20160830 060533.883
fil|www.wikidata.org-xtn.category.link-db.001.xowa|15618048|20160830 060516.563
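To compare folder sizes, a listing like the one above can be totalled with a few lines of Python. This is a sketch that assumes the `fil|name|size in bytes|timestamp` layout shown in this thread. The names `parse_listing` and `total_gib` are hypothetical, and the regular expression also accepts listings that were pasted as one long line.

```python
import re

# One row looks like: fil|<file name>|<size in bytes>|<yyyymmdd> <hhmmss.fff>
ROW = re.compile(r"fil\|([^|]+)\|(\d+)\|\d{8}\s+\d{6}\.\d{3}")


def parse_listing(text):
    """Return {file name: size in bytes} for every row found in text."""
    return {name: int(size) for name, size in ROW.findall(text)}


def total_gib(sizes):
    """Sum the sizes and convert bytes to GiB (2**30 bytes)."""
    return sum(sizes.values()) / 2**30
```

Running `total_gib(parse_listing(text))` on two listings gives figures that can be compared directly, provided both sides count the same set of files.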

I opened the task manager, I saw that XOWA was using a lot of RAM, probably because it was extracting it at the moment, but there was no sign of it doing so.

Hmm... that's strange. Again, the failed import may have left you in a broken state, but it's hard to say which. Let me know if you still see the high RAM usage

I plan to redownload it once you've implemented them in Download Central, so I wanted to ask you again about that. It would make it easier for those who use download managers, IF there are any. Pretty sure I'm not the only one!

Yeah, let me know if you think you need them. Otherwise, like I said above, I'm planning to not publish them on Download Central since few people will need them (and those that do can still work off the _wikitext_ ones)

gnosygnu commented 7 years ago

Also, I renamed the title just to give more detail. Hope that's fine with you. Let me know if not, and I'll restore the original version (No more random articles? As well as slower loading pictures.)

Ope30 commented 7 years ago

Okay then. I will wait until you added categories. By the way, considering your picture that you posted, it seems you have the Random Page such as Interaction etc, as well as categories maybe? How so? You're literally using the same en-wiki as me, now what is the difference again? Just curious, excuse me if I missed something.

The title is absolutely fine and much more detailed than the previous one.

So, regarding categories, I don't care whether I have to download additional stuff for it to work, as long as it's not about 60GB, you know, it's fine. I really, really just want to get this to work.

All I need are the categories to show and the Navigation/Interaction, you know, and I'm done. Categories and the Random article function are just too important for me. Do whatever you think is the fastest way possible. I deeply want to get into XOWA again!

Edit: Could you tell me the folder size of your wikidata and commons wikipedia please? My commons.wikipedia size is 49,2 GB (52.901.051.751 Bytes), Wikidata is 47,9 GB (51.432.928.381 Bytes).

Edit 2: Opening Wikidata works, but not if you're in any other wikipedia. Let's say you start XOWA and you're being directed to the Main Page of the en-Wikipedia: clicking on Wikidata doesn't let you open it, unless you're going to look for other languages. Let's say you want to see an article in another language: you click on Wikidata at the very bottom, and from there on you're able to go onto the Main Page of Wikidata. commons.wikipedia.org works freely actually, but categories are not being displayed. Here is a picture to be more clear of what is going on: http://prnt.sc/cdoygv

Edit 3: Unfortunately I've found an article with missing images, here's the original link https://de.wikipedia.org/wiki/Herbert_Boeckl and here's a picture http://prnt.sc/cdstsc Please check it out! Also make sure to disable Web-Access, otherwise you receive missing images immediately.

Here is the log regarding the missing images:

[EDIT:gnosygnu: changed to gist; https://gist.github.com/gnosygnu/e27f4748487b5b2d0fffdc4bc3b68fbb]

Here is the log regarding wikidata:

fil|wikidatawiki-latest-pages-articles.xml.bz2|9939022331|20160902 153839.965
fil|www.wikidata.org-core.wbase.xowa|4793843712|20160902 200223.270
fil|www.wikidata.org-core.xowa|2443624448|20160902 200253.970
fil|www.wikidata.org-file-core.xowa|57344|20160902 200222.412
fil|www.wikidata.org-file-user.xowa|57344|20160902 200222.662
fil|www.wikidata.org-text-ns.000-db.002.xowa|2049634304|20160902 200222.563
fil|www.wikidata.org-text-ns.000-db.003.xowa|2000158720|20160902 200222.645
fil|www.wikidata.org-text-ns.000-db.004.xowa|1989349376|20160902 200223.270
fil|www.wikidata.org-text-ns.000-db.005.xowa|1929502720|20160902 200221.829
fil|www.wikidata.org-text-ns.000-db.006.xowa|1912541184|20160902 200222.417
fil|www.wikidata.org-text-ns.000-db.007.xowa|1915269120|20160902 200222.297
fil|www.wikidata.org-text-ns.000-db.008.xowa|1874522112|20160902 200223.206
fil|www.wikidata.org-text-ns.000-db.009.xowa|1870802944|20160902 200222.417
fil|www.wikidata.org-text-ns.000-db.010.xowa|1839366144|20160902 200222.412
fil|www.wikidata.org-text-ns.000-db.011.xowa|1943416832|20160902 200222.417
fil|www.wikidata.org-text-ns.000-db.012.xowa|1889558528|20160902 200222.622
fil|www.wikidata.org-text-ns.000-db.013.xowa|1853968384|20160902 200222.417
fil|www.wikidata.org-text-ns.000-db.014.xowa|1862631424|20160902 200222.661
fil|www.wikidata.org-text-ns.000-db.015.xowa|1894907904|20160902 200222.659
fil|www.wikidata.org-text-ns.000-db.016.xowa|1119154176|20160902 200222.662
fil|www.wikidata.org-text-ns.000.xowa|1988739072|20160902 200222.661
fil|www.wikidata.org-text-ns.004.xowa|76562432|20160902 200222.127
fil|www.wikidata.org-text-ns.008.xowa|626688|20160902 200222.563
fil|www.wikidata.org-text-ns.010.xowa|2551808|20160902 200222.197
fil|www.wikidata.org-text-ns.012.xowa|3936256|20160902 200223.574
fil|www.wikidata.org-text-ns.014.xowa|458752|20160902 200223.269
fil|www.wikidata.org-text-ns.1198.xowa|12034048|20160902 200222.657
fil|www.wikidata.org-text-ns.120.xowa|6234112|20160902 200222.294
fil|www.wikidata.org-text-ns.2600.xowa|135168|20160902 200222.655
fil|www.wikidata.org-text-ns.828.xowa|847872|20160902 200222.412
fil|www.wikidata.org-xtn.category.core.xowa|36864|20160902 200222.645
fil|www.wikidata.org-xtn.category.link-db.001.xowa|651264|20160902 200222.327
fil|www.wikidata.org-xtn.search.core.xowa|2879287296|20160902 191656.077
fil|www.wikidata.org-xtn.search.link-title-ns.000-db.001.xowa|449486848|20160902 200222.297
fil|www.wikidata.org-xtn.search.link-title-ns.999-db.001.xowa|10887168|20160902 200223.449
fil|xowa.wiki.pagelinks.sqlite3|878034944|20160902 191743.200

gnosygnu commented 7 years ago

Okay then. I will wait until you added categories.

Cool. That'll probably be about 2 weeks. Will definitely update in this thread.

By the way, considering your picture that you posted, it seems you have the Random Page such as Interaction etc, as well as categories maybe? How so? You're literally using the same en-wiki as me, now what is the difference again? Just curious, excuse me if I missed something.

It's a bit complicated, but here goes:

For the future, I'm going to include the "MediaWiki:" pages in the HTML dumps. I should have done it in these dumps, but forgot to.

For the next release, I'll put in a band-aid patch to hard-code this MediaWiki page if it's missing from both HTML and Wikitext databases.

The title is absolutely fine and much more detailed than the previous one.

Cool.

So, regarding categories, I don't care whether I have to download additional stuff for it to work, as long as it's not about 60GB, you know, it's fine. I really, really just want to get this to work.

Yeah, it won't be the full 60 GB. However, the categories for English Wikipedia will probably be about 10 GB. Still have to see.

All I need are the categories to show and the Navigation/Interaction, you know, and I'm done. Categories and the Random article function are just too important for me. Do whatever you think is the fastest way possible. I deeply want to get into XOWA again!

Yup. I definitely want to try to get these going too. :)

Edit: Could you tell me the folder size of your wikidata and commons wikipedia please? My commons.wikipedia size is 49,2 GB (52.901.051.751 Bytes), Wikidata is 47,9 GB (51.432.928.381 Bytes).

Sure. Here goes:

A little curious why your files are off. I'll post my files in the next comment. If you can, please post yours as well.

Edit 2: Opening Wikidata works, but not if you're in any other wikipedia. Let's say you start XOWA and you're being directed to the Main Page of the en-Wikipedia: clicking on Wikidata doesn't let you open it, unless you're going to look for other languages. Let's say you want to see an article in another language: you click on Wikidata at the very bottom, and from there on you're able to go onto the Main Page of Wikidata.

Hmm. Tried this now, and it does work. Here's what I did:

I don't know why your Wikidata databases are so large. It might be a broken build. If you can, see the next comment and post your files

commons.wikipedia.org works freely actually, but categories are not being displayed. Here is a picture to be more clear of what is going on: http://prnt.sc/cdoygv

Yeah, it looks like you're missing about 7 GB. Probably category databases. Again, see the next comment, and try to post your files.

Edit 3: Unfortunately I've found an article with missing images, here's the original link https://de.wikipedia.org/wiki/Herbert_Boeckl and here's a picture http://prnt.sc/cdstsc Please check it out! Also make sure to disable Web-Access, otherwise you receive missing images immediately.

Thanks. FYI: For missing images, the url is fine for reporting (XOWA logs and screenshots are not necessary, so don't want to slow you down.)

Short story: This is actually a timing issue based on when I generate the dump and when the article was edited. It affects only a small number of articles (< 1%), and is usually fixed in the next dump

Longer story...

So basically, there will be missing images on any page...

This should be a "small" percentage of pages. Something like 1%. For English Wikipedia, usually 50,000 new images are introduced each month out of a population of 5,000,000 (hence 1%)

Going forward:

gnosygnu commented 7 years ago

Sorry, I just saw now that you posted your wikidata files in your last edit.

These look fine. The main differences are:

I still don't know why your wikidata acts differently. Can you look at my series of steps above, and try to reproduce?

As for commons, here is the link: www.wikidata.org/wiki/Special:XowaDiag?type=fs.check&wiki=commons.wikimedia.org

Note that all I did was change the &wiki=www.wikidata.org to &wiki=commons.wikimedia.org

Here are my files:

fil|commons.wikimedia.org-core.xowa|6251446272|20160807 142254.134
fil|commons.wikimedia.org-file-core.xowa|57344|20160807 131435.077
fil|commons.wikimedia.org-file-user.xowa|57344|20160807 131435.085
fil|commons.wikimedia.org-text-ns.000.xowa|108687360|20160807 152300.134
fil|commons.wikimedia.org-text-ns.002.xowa|24576|20160807 152527.874
fil|commons.wikimedia.org-text-ns.003.xowa|24576|20160807 152511.806
fil|commons.wikimedia.org-text-ns.004.xowa|575901696|20160807 152305.042
fil|commons.wikimedia.org-text-ns.006-db.002.xowa|2200449024|20160807 152354.126
fil|commons.wikimedia.org-text-ns.006-db.003.xowa|2211282944|20160807 152417.466
fil|commons.wikimedia.org-text-ns.006-db.004.xowa|2221879296|20160807 152429.470
fil|commons.wikimedia.org-text-ns.006-db.005.xowa|2208755712|20160807 152444.290
fil|commons.wikimedia.org-text-ns.006-db.006.xowa|2275147776|20160807 152507.002
fil|commons.wikimedia.org-text-ns.006-db.007.xowa|2201477120|20160807 152522.142
fil|commons.wikimedia.org-text-ns.006-db.008.xowa|2286469120|20160807 152536.590
fil|commons.wikimedia.org-text-ns.006-db.009.xowa|2232737792|20160807 152550.710
fil|commons.wikimedia.org-text-ns.006-db.010.xowa|975994880|20160807 152559.558
fil|commons.wikimedia.org-text-ns.006.xowa|2162298880|20160807 152319.630
fil|commons.wikimedia.org-text-ns.008.xowa|6033408|20160807 152339.358
fil|commons.wikimedia.org-text-ns.010.xowa|65015808|20160807 152309.542
fil|commons.wikimedia.org-text-ns.012.xowa|1081344|20160807 152335.218
fil|commons.wikimedia.org-text-ns.014.xowa|1229037568|20160807 152330.486
fil|commons.wikimedia.org-text-ns.100.xowa|12562432|20160807 152343.534
fil|commons.wikimedia.org-text-ns.102.xowa|10330112|20160807 152358.930
fil|commons.wikimedia.org-text-ns.104.xowa|53248|20160807 152403.262
fil|commons.wikimedia.org-text-ns.106.xowa|2162688|20160807 152407.426
fil|commons.wikimedia.org-text-ns.1198.xowa|20799488|20160807 152449.266
fil|commons.wikimedia.org-text-ns.2600.xowa|24576|20160807 152541.310
fil|commons.wikimedia.org-text-ns.460.xowa|192512|20160807 152453.438
fil|commons.wikimedia.org-text-ns.490.xowa|94208|20160807 152457.634
fil|commons.wikimedia.org-text-ns.828.xowa|1257472|20160807 152434.794
fil|commons.wikimedia.org-xtn.category.core.xowa|86044672|20160807 142251.046
fil|commons.wikimedia.org-xtn.category.link-db.001.xowa|1645268992|20160807 135434.714
fil|commons.wikimedia.org-xtn.category.link-db.002.xowa|1680191488|20160807 135612.862
fil|commons.wikimedia.org-xtn.category.link-db.003.xowa|1838493696|20160807 135742.250
fil|commons.wikimedia.org-xtn.category.link-db.004.xowa|1727082496|20160807 135905.838
fil|commons.wikimedia.org-xtn.category.link-db.005.xowa|1554178048|20160807 140037.866
fil|commons.wikimedia.org-xtn.category.link-db.006.xowa|1580834816|20160807 140158.702
fil|commons.wikimedia.org-xtn.category.link-db.007.xowa|1996079104|20160807 140349.682
fil|commons.wikimedia.org-xtn.category.link-db.008.xowa|1567752192|20160807 140514.258
fil|commons.wikimedia.org-xtn.category.link-db.009.xowa|1625567232|20160807 140644.802
fil|commons.wikimedia.org-xtn.category.link-db.010.xowa|1994899456|20160807 140852.898
fil|commons.wikimedia.org-xtn.category.link-db.011.xowa|2379833344|20160807 141052.646
fil|commons.wikimedia.org-xtn.category.link-db.012.xowa|1609637888|20160807 141222.774
fil|commons.wikimedia.org-xtn.category.link-db.013.xowa|1547407360|20160807 141352.602
fil|commons.wikimedia.org-xtn.category.link-db.014.xowa|1563504640|20160807 141513.646
fil|commons.wikimedia.org-xtn.category.link-db.015.xowa|2888564736|20160807 141810.018
fil|commons.wikimedia.org-xtn.category.link-db.016.xowa|1560645632|20160807 141929.146
fil|commons.wikimedia.org-xtn.category.link-db.017.xowa|2027888640|20160807 142142.618
fil|commons.wikimedia.org-xtn.category.link-db.018.xowa|1000984576|20160807 142241.958
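Two such listings can also be compared mechanically to see which files one side lacks. The Python sketch below is illustrative only. It assumes both listings use the `fil|name|size|timestamp` layout shown above, and `missing_files` is a made-up helper name.

```python
import re

# Capture only the file name from each 'fil|name|size|timestamp' row.
NAME = re.compile(r"fil\|([^|]+)\|\d+\|")


def missing_files(reference_listing, candidate_listing):
    """Return names present in the reference listing but absent from the candidate."""
    reference = set(NAME.findall(reference_listing))
    candidate = set(NAME.findall(candidate_listing))
    return sorted(reference - candidate)
```

Passing the listing above as the reference and a user's listing as the candidate returns the databases the user still needs, for example higher-numbered `xtn.category.link-db` files.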

Ope30 commented 7 years ago

Here is the log regarding commons.wikipedia:

fil|commons.wikimedia.org-core.xowa|6332846080|20160903 165145.100 fil|commons.wikimedia.org-file-core.xowa|57344|20160903 165116.537 fil|commons.wikimedia.org-file-user.xowa|290816|20160903 165307.886 fil|commons.wikimedia.org-text-ns.000.xowa|95166464|20160903 165112.989 fil|commons.wikimedia.org-text-ns.002.xowa|16384|20160903 165116.053 fil|commons.wikimedia.org-text-ns.003.xowa|16384|20160903 165116.045 fil|commons.wikimedia.org-text-ns.004.xowa|507658240|20160903 165113.030 fil|commons.wikimedia.org-text-ns.006-db.002.xowa|1729376256|20160903 165113.229 fil|commons.wikimedia.org-text-ns.006-db.003.xowa|1728864256|20160903 165114.313 fil|commons.wikimedia.org-text-ns.006-db.004.xowa|1734057984|20160903 165114.318 fil|commons.wikimedia.org-text-ns.006-db.005.xowa|1749979136|20160903 165114.873 fil|commons.wikimedia.org-text-ns.006-db.006.xowa|1834831872|20160903 165116.036 fil|commons.wikimedia.org-text-ns.006-db.007.xowa|1742389248|20160903 165116.047 fil|commons.wikimedia.org-text-ns.006-db.008.xowa|1992622080|20160903 165116.055 fil|commons.wikimedia.org-text-ns.006-db.009.xowa|1815855104|20160903 165116.057 fil|commons.wikimedia.org-text-ns.006-db.010.xowa|957190144|20160903 165116.065 fil|commons.wikimedia.org-text-ns.006.xowa|1726943232|20160903 165113.033 fil|commons.wikimedia.org-text-ns.008.xowa|4468736|20160903 165113.222 fil|commons.wikimedia.org-text-ns.010.xowa|52510720|20160903 165113.031 fil|commons.wikimedia.org-text-ns.012.xowa|1048576|20160903 165113.136 fil|commons.wikimedia.org-text-ns.014.xowa|800157696|20160903 165113.038 fil|commons.wikimedia.org-text-ns.100.xowa|10981376|20160903 165113.226 fil|commons.wikimedia.org-text-ns.102.xowa|10231808|20160903 165113.230 fil|commons.wikimedia.org-text-ns.104.xowa|45056|20160903 165113.287 fil|commons.wikimedia.org-text-ns.106.xowa|1712128|20160903 165114.308 fil|commons.wikimedia.org-text-ns.1198.xowa|10592256|20160903 165115.405 fil|commons.wikimedia.org-text-ns.2600.xowa|16384|20160903 
165116.055 fil|commons.wikimedia.org-text-ns.460.xowa|167936|20160903 165115.959 fil|commons.wikimedia.org-text-ns.490.xowa|61440|20160903 165116.034 fil|commons.wikimedia.org-text-ns.828.xowa|1228800|20160903 165114.871 fil|commons.wikimedia.org-xtn.category.core.xowa|81481728|20160903 165116.067 fil|commons.wikimedia.org-xtn.category.link-db.001.xowa|1370124288|20160903 165116.247 fil|commons.wikimedia.org-xtn.category.link-db.002.xowa|1375547392|20160903 165116.305 fil|commons.wikimedia.org-xtn.category.link-db.003.xowa|1371914240|20160903 165116.360 fil|commons.wikimedia.org-xtn.category.link-db.004.xowa|1474797568|20160903 165116.421 fil|commons.wikimedia.org-xtn.category.link-db.005.xowa|328085504|20160903 165116.482 fil|commons.wikimedia.org-xtn.search.core.xowa|6760353792|20160903 165226.076 fil|commons.wikimedia.org-xtn.search.link-title-ns.000-db.001.xowa|16384|20160903 165116.522 fil|commons.wikimedia.org-xtn.search.link-title-ns.999-db.001.xowa|16384|20160903 165116.532 fil|commonswiki-latest-pages-articles.xml.bz2|5827259686|20160903 132922.904

And here is a new log regarding wikidata (I deleted the xml file, don't know if that changes anything):

fil|www.wikidata.org-core.wbase.xowa|4793843712|20160902 200223.270 fil|www.wikidata.org-core.xowa|2443624448|20160902 200253.970 fil|www.wikidata.org-file-core.xowa|57344|20160902 200222.412 fil|www.wikidata.org-file-user.xowa|57344|20160902 200222.662 fil|www.wikidata.org-text-ns.000-db.002.xowa|2049634304|20160902 200222.563 fil|www.wikidata.org-text-ns.000-db.003.xowa|2000158720|20160902 200222.645 fil|www.wikidata.org-text-ns.000-db.004.xowa|1989349376|20160902 200223.270 fil|www.wikidata.org-text-ns.000-db.005.xowa|1929502720|20160902 200221.829 fil|www.wikidata.org-text-ns.000-db.006.xowa|1912541184|20160902 200222.417 fil|www.wikidata.org-text-ns.000-db.007.xowa|1915269120|20160902 200222.297 fil|www.wikidata.org-text-ns.000-db.008.xowa|1874522112|20160902 200223.206 fil|www.wikidata.org-text-ns.000-db.009.xowa|1870802944|20160902 200222.417 fil|www.wikidata.org-text-ns.000-db.010.xowa|1839366144|20160902 200222.412 fil|www.wikidata.org-text-ns.000-db.011.xowa|1943416832|20160902 200222.417 fil|www.wikidata.org-text-ns.000-db.012.xowa|1889558528|20160902 200222.622 fil|www.wikidata.org-text-ns.000-db.013.xowa|1853968384|20160902 200222.417 fil|www.wikidata.org-text-ns.000-db.014.xowa|1862631424|20160902 200222.661 fil|www.wikidata.org-text-ns.000-db.015.xowa|1894907904|20160902 200222.659 fil|www.wikidata.org-text-ns.000-db.016.xowa|1119154176|20160902 200222.662 fil|www.wikidata.org-text-ns.000.xowa|1988739072|20160902 200222.661 fil|www.wikidata.org-text-ns.004.xowa|76562432|20160902 200222.127 fil|www.wikidata.org-text-ns.008.xowa|626688|20160902 200222.563 fil|www.wikidata.org-text-ns.010.xowa|2551808|20160902 200222.197 fil|www.wikidata.org-text-ns.012.xowa|3936256|20160902 200223.574 fil|www.wikidata.org-text-ns.014.xowa|458752|20160902 200223.269 fil|www.wikidata.org-text-ns.1198.xowa|12034048|20160902 200222.657 fil|www.wikidata.org-text-ns.120.xowa|6234112|20160902 200222.294 fil|www.wikidata.org-text-ns.2600.xowa|135168|20160902 200222.655 
fil|www.wikidata.org-text-ns.828.xowa|847872|20160902 200222.412 fil|www.wikidata.org-xtn.category.core.xowa|36864|20160902 200222.645 fil|www.wikidata.org-xtn.category.link-db.001.xowa|651264|20160902 200222.327 fil|www.wikidata.org-xtn.search.core.xowa|2879287296|20160902 191656.077 fil|www.wikidata.org-xtn.search.link-title-ns.000-db.001.xowa|449486848|20160902 200222.297 fil|www.wikidata.org-xtn.search.link-title-ns.999-db.001.xowa|10887168|20160902 200223.449 fil|xowa.wiki.pagelinks.sqlite3|878034944|20160902 191743.200

Overall, I've come to a conclusion to redownload EVERYTHING. I will probably download the 2016-09 wikis once you're completely done. Like always, take your time and no rush. I will wait. I'm patient.

Here are some questions/wishes if possible:

Could you tell me how you downloaded your wikis? Especially commons and wikidata. Did you use the Download Central? Is there any other place to get them from? You know, faster internet connection.

Regarding missing images, could you try to get an unaffected dump with no missing images? If there is even a way. That would be a treasure, knowing you have every image possible and you no longer have to enable Web-Access again (if you know what I mean). Sounds weird, I know.

Could you also fix categories, Navigation and Interaction such as for the 2016-09 wikis (wikiquote, books, voyage source) and so on? I know that takes very long time. But like I said, I will wait! I know that it is definitely worth the wait. I'll download/perform everything at once. If I have to download additional stuff, I'll do it if it helps me in any way.

I know these requests are hard to come through, but you know, I really want to get into it again! I just want to make it pleasantly, you know.

gnosygnu commented 7 years ago

Here is the log regarding commons.wikipedia:

Thanks. I checked them, and yours is definitely missing data. There should be 18 category databases, and you have only 5.
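The file listing above uses a `fil|name|size|timestamp` layout, so the number of category link databases can be counted mechanically instead of by eye. Here is a minimal Python sketch of that check. It assumes that every entry starts with `fil|` and that category link databases carry `xtn.category.link-db.` in their name, as they do in the posted log; the helper names `parse_listing` and `count_category_link_dbs` are made up for illustration and are not part of XOWA. Whether the expected count of 18 includes the `xtn.category.core` file is not stated here, so the sketch counts only the `link-db` files.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str       # file name, e.g. commons.wikimedia.org-core.xowa
    size: int       # size in bytes
    timestamp: str  # "yyyymmdd hhmmss.fff" as it appears in the log


# One entry looks like: fil|<name>|<size>|<yyyymmdd hhmmss.fff>
# \s+ between date and time tolerates a line break inside the timestamp.
ENTRY_RE = re.compile(r"fil\|([^|]+)\|(\d+)\|(\d{8}\s+\d{6}\.\d{3})")


def parse_listing(text: str) -> List[Entry]:
    """Extract all file entries from a pasted log_session file listing."""
    return [
        Entry(m.group(1), int(m.group(2)), " ".join(m.group(3).split()))
        for m in ENTRY_RE.finditer(text)
    ]


def count_category_link_dbs(entries: List[Entry]) -> int:
    """Count files that look like category link databases (assumed naming)."""
    return sum(1 for e in entries if "xtn.category.link-db." in e.name)


# Three entries copied from the commons log posted earlier in this thread.
sample = (
    "fil|commons.wikimedia.org-xtn.category.core.xowa|81481728|20160903 165116.067 "
    "fil|commons.wikimedia.org-xtn.category.link-db.001.xowa|1370124288|20160903 165116.247 "
    "fil|commons.wikimedia.org-xtn.category.link-db.002.xowa|1375547392|20160903 165116.305"
)
entries = parse_listing(sample)
print(count_category_link_dbs(entries))
```

Run against the full pasted log, a count below the expected number would point to an incomplete download or an interrupted build.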

Out of curiosity, do you find yourself looking at pages in commons.wikimedia.org? Or do you have them because XOWA used to require commons.wikimedia.org when clicking on images?

And here is a new log regarding wikidata (I deleted the xml file, don't know if that changes anything):

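Nope, XML doesn't matter. Your files should be good. Have you tried the checklist I posted above? To save you the upwards scroll, here goes:

Navigated to en.wikipedia.org/wiki/Main Page
Clicked on www.wikidata.org in the left-hand sidebar -> Went to www.wikidata.org/wiki/Wikidata:Main Page
Went back to en.wikipedia.org/wiki/Main Page
Clicked on the Wikidata link at the bottom of the page (under Sister Projects) -> Went to www.wikidata.org/wiki/Wikidata:Main Page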

Overall, I've come to a conclusion to redownload EVERYTHING. I will probably download the 2016-09 wikis once you're completely done. Like always, take your time and no rush. I will wait. I'm patient.

Cool. Unfortunately, I think redownloading is going to be the best approach. I spent some time today working on the sidebar and the Random link, and there's no real way to fake it with a program update. It's going to require a separate download by wiki.

I really don't think that the current 2016-08 wikis are missing much. Specifically, they don't have the following

I'll try to get these fixed for the 2016-09 build, but let me know if anything is missing.

Here are some questions/wishes if possible: Could you tell me how you downloaded your wikis? Especially commons and wikidata. Did you use the Download Central? Is there any other place to get them from? You know, faster internet connection.

I use a command-line script detailed here: http://xowa.org/home/wiki/Dev/Command-line/Thumbs . However, this is basically the same as using Tools -> Import Online or Tools -> Import Offline. I think you've been cursed by a bad download / interrupted build. Try Import Online again, and if it fails, post the log here: home/wiki/Special:XowaSystemData?type=log_session

I don't upload either commons or wikidata to Download Central because there's less need for them when using the wiki. If you do use them, I can upload them to archive.org, but feel like it won't be a good use of your download bandwidth.

Regarding missing images, could you try to get an unaffected dump with no missing images? If there is even a way. That would be a treasure, knowing you have every image possible and you no longer have to enable Web-Access again (if you know what I mean). Sounds weird, I know.

No problem. Like I said, this has been on my list for a while. I'll add some code to get it done for 2016-09 English Wikipedia.

Could you also fix categories, Navigation and Interaction such as for the 2016-09 wikis (wikiquote, books, voyage source) and so on? I know that takes very long time. But like I said, I will wait! I know that it is definitely worth the wait. I'll download/perform everything at once. If I have to download additional stuff, I'll do it if it helps me in any way.

Navigation and Interaction should definitely be fixed for 2016-09. Categories is harder, but I'll start taking a look at it tomorrow.

I know these requests are hard to come through, but you know, I really want to get into it again! I just want to make it pleasantly, you know.

Sure. Let's see how it goes in a week's time. ;)

Ope30 commented 7 years ago

Thanks. I checked them, and yours is definitely missing data. There should be 18 category databases, and you have only 5.

Yeah, I'm redownloading it right now just for testing and if it doesn't work, I'll just do it your way. Maybe I need just a new dump. I don't know.

Out of curiosity, do you find yourself looking at pages in commons.wikimedia.org? Or do you have them because XOWA used to require commons.wikimedia.org when clicking on images?

Both actually.

Nope, XML doesn't matter. Your files should be good. Have you tried the checklist I posted above? To save you the upwards scroll, here goes:

Navigated to en.wikipedia.org/wiki/Main Page
Clicked on www.wikidata.org in the left-hand sidebar -> Went to www.wikidata.org/wiki/Wikidata:Main Page
Went back to en.wikipedia.org/wiki/Main Page
Clicked on the Wikidata link at the bottom of the page (under Sister Projects) -> Went to www.wikidata.org/wiki/Wikidata:Main Page

Well, that doesn't work for me. When I click on it, it just doesn't do anything, you know. But it's okay, I'll redownload everything anyways!

I use a command-line script detailed here: http://xowa.org/home/wiki/Dev/Command-line/Thumbs . However, this is basically the same as using Tools -> Import Online or Tools -> Import Offline. I think you've been cursed by a bad download / interrupted build. Try Import Online again, and if it fails, post the log here: home/wiki/Special:XowaSystemData?type=log_session

I will probably do this if the problem persists.

I don't upload either commons or wikidata to Download Central because there's less need for them when using the wiki. If you do use them, I can upload them to archive.org, but feel like it won't be a good use of your download bandwidth.

Sure. I will try it out in case something goes wrong.

No problem. Like I said, this has been on my list for a while. I'll add some code to get it done for 2016-09 English Wikipedia.

Could you do this for the German ones as well? Wikivoyage, books etc are having the same problem. That would be courteous!

Navigation and Interaction should definitely be fixed for 2016-09. Categories is harder, but I'll start taking a look at it tomorrow.

I really hope you will fix this. Categories are one of the best things imo!

Edit: Here is the log regarding commons.wikipedia, apparently the download just ended:

[EDIT:gnosygnu: changed to gist: https://gist.github.com/gnosygnu/08c3fa91df267e5b7abc5f0cf7d46545]

Surprisingly, the size is about 70,2 GB (75.422.170.471 Bytes) now, larger than before. Seems too big for me. I closed XOWA now, then I opened commons.wikimedia and it still has no categories, lol.

Here is a new log regarding commons.wikimedia after I closed it:

il|commons.wikimedia.org-core.xowa|6966075392|20160905 153251.198 fil|commons.wikimedia.org-file-core.xowa|57344|20160905 153228.651 fil|commons.wikimedia.org-file-user.xowa|172032|20160905 153322.421 fil|commons.wikimedia.org-text-ns.000.xowa|95166464|20160905 153225.494 fil|commons.wikimedia.org-text-ns.002.xowa|16384|20160905 153227.526 fil|commons.wikimedia.org-text-ns.003.xowa|16384|20160905 153227.526 fil|commons.wikimedia.org-text-ns.004.xowa|507658240|20160905 153225.494 fil|commons.wikimedia.org-text-ns.006-db.002.xowa|1729376256|20160905 153225.651 fil|commons.wikimedia.org-text-ns.006-db.003.xowa|1728864256|20160905 153225.979 fil|commons.wikimedia.org-text-ns.006-db.004.xowa|1734057984|20160905 153225.979 fil|commons.wikimedia.org-text-ns.006-db.005.xowa|1749979136|20160905 153226.182 fil|commons.wikimedia.org-text-ns.006-db.006.xowa|1834831872|20160905 153226.963 fil|commons.wikimedia.org-text-ns.006-db.007.xowa|1742389248|20160905 153227.526 fil|commons.wikimedia.org-text-ns.006-db.008.xowa|1992622080|20160905 153227.541 fil|commons.wikimedia.org-text-ns.006-db.009.xowa|1815855104|20160905 153227.541 fil|commons.wikimedia.org-text-ns.006-db.010.xowa|957190144|20160905 153227.557 fil|commons.wikimedia.org-text-ns.006.xowa|1726943232|20160905 153225.510 fil|commons.wikimedia.org-text-ns.008.xowa|4468736|20160905 153225.651 fil|commons.wikimedia.org-text-ns.010.xowa|52510720|20160905 153225.510 fil|commons.wikimedia.org-text-ns.012.xowa|1048576|20160905 153225.541 fil|commons.wikimedia.org-text-ns.014.xowa|800157696|20160905 153225.510 fil|commons.wikimedia.org-text-ns.100.xowa|10981376|20160905 153225.651 fil|commons.wikimedia.org-text-ns.102.xowa|10231808|20160905 153225.666 fil|commons.wikimedia.org-text-ns.104.xowa|45056|20160905 153225.791 fil|commons.wikimedia.org-text-ns.106.xowa|1712128|20160905 153225.979 fil|commons.wikimedia.org-text-ns.1198.xowa|10592256|20160905 153226.198 fil|commons.wikimedia.org-text-ns.2600.xowa|16384|20160905 153227.541 
fil|commons.wikimedia.org-text-ns.460.xowa|167936|20160905 153226.448 fil|commons.wikimedia.org-text-ns.490.xowa|61440|20160905 153226.963 fil|commons.wikimedia.org-text-ns.828.xowa|1228800|20160905 153226.182 fil|commons.wikimedia.org-xtn.category.core.xowa|81481728|20160905 153227.557 fil|commons.wikimedia.org-xtn.category.link-db.001.xowa|1370124288|20160905 153228.120 fil|commons.wikimedia.org-xtn.category.link-db.002.xowa|1375547392|20160905 153228.135 fil|commons.wikimedia.org-xtn.category.link-db.003.xowa|1371914240|20160905 153228.151 fil|commons.wikimedia.org-xtn.category.link-db.004.xowa|1474797568|20160905 153228.151 fil|commons.wikimedia.org-xtn.category.link-db.005.xowa|328085504|20160905 153228.495 fil|commons.wikimedia.org-xtn.search.core.xowa|22678024192|20160905 152025.246 fil|commons.wikimedia.org-xtn.search.link-title-ns.000-db.001.xowa|8409088|20160905 153228.620 fil|commons.wikimedia.org-xtn.search.link-title-ns.999-db.001.xowa|4483862528|20160905 153228.620 fil|commonswiki-latest-pages-articles.xml.bz2|5827259686|20160905 122403.824 fil|xowa.wiki.pagelinks.sqlite3|1477914624|20160905 152232.278

Update: It just went down from 70 GB to 45,5 GB (48.954.887.402 Bytes) when I created the search index. I seriously cannot wait to redownload everything.

Update 2: It continues to grow. It gets bigger, bigger and bigger.

Okay, I'm going to delete it now. I'll use your method instead to download it. I really need a break right now, I think all I can do now is wait.

By the way, here is a random script error that I've found:

 Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
Script error: @D:/XOWA 08/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:59 vm error: gplx.Err: type mismatch: expdType=boolean actlType=java.lang.String actlObj=y 
gnosygnu commented 7 years ago

Hey, a few things.

First, sorry for missing your updates on commons and wikidata. Github only sends an email for the comment, but doesn't send emails for updates. I didn't know that you added more logs

Second, tonight's release (https://github.com/gnosygnu/xowa/releases/tag/v3.9.2.1) is mainly centered around this ticket. In particular, it includes support for categories in the HTML dump.

Unfortunately, you won't actually be able to use it until I release new dumps. I'm going to release new dumps for English, German, and French this week, so hopefully another week's wait won't be bad.

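Finally, as for the commons and wikidata issues:

I snipped the logs above and created links to gist.github.com. It made scrolling painful. :)
The search database and category for commons is pretty big. It clocks in around 80 GB. Again, this is due to the commons wiki being so large and the image titles being many words.
Since you're using the HTML dumps, I really don't think you need commons or wikidata. You only need them if you're using Wikitext database, and even then, you only need wikidata. You're best off skipping them, unless you want to view specific pages in commons.wikimedia.org or www.wikidata.org.
If you'd like them anyway, I can upload my copy to archive.org so you can just download them. It's probably easier than trying to troubleshoot the issue.
If you want to troubleshoot the issue, we can go ahead. I think part of the problem is that you're on 32-bit Windows. Commons (and wikidata) are both large wikis and it may exceed the 4 GB limit. I have 64-bit Windows and Linux machines, so I don't run into those issues.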

Let me know how you want to proceed. And thanks again for all the help with reporting.

Ope30 commented 7 years ago

First, sorry for missing your updates on commons and wikidata. Github only sends an email for the comment, but doesn't send emails for updates. I didn't know that you added more logs

No worries!

Second, tonight's release (https://github.com/gnosygnu/xowa/releases/tag/v3.9.2.1) is mainly centered around this ticket. In particular, it includes support for categories in the HTML dump.

Unfortunately, you won't actually be able to use it until I release new dumps. I'm going to release new dumps for English, German, and French this week, so hopefully another week's wait won't be bad.

This is very nice to hear! I'm very thankful for you fixing all these issues, it really shows all the effort you put into XOWA.

I'll download the dumps.

I snipped the logs above and created links to gist.github.com. It made scrolling painful. :)

Yeah, you're right. Sorry for all the logs. The scrolling was pretty annoying.

The search database and category for commons is pretty big. It clocks in around 80 GB. Again, this is due to the commons wiki being so large and the image titles being many words.

Might be. I may download it in the future.

Since you're using the HTML dumps, I really don't think you need commons or wikidata. You only need them if you're using Wikitext database, and even then, you only need wikidata. You're best off skipping them, unless you want to view specific pages in commons.wikimedia.org or www.wikidata.org.

I'll only get Wikidata from now on.

If you'd like them anyway, I can upload my copy to archive.org so you can just download them. It's probably easier than trying to troubleshoot the issue.

That would be cool. I'll probably download it in the future! Could you make a screenshot of you being in commons.wikipedia? You know, any random article.

Just out of curiosity, have you found a way in any form yet regarding the missing images?

If you want to troubleshoot the issue, we can go ahead. I think part of the problem is that you're on 32-bit Windows. Commons (and wikidata) are both large wikis and it may exceed the 4 GB limit. I have 64-bit Windows and Linux machines, so I don't run into those issues.

Actually, I'm on a 64-bit system. However, I'll definitely troubleshoot if I run into any issue! Thanks.

gnosygnu commented 7 years ago

Hey. Agree with all your comments above. Some responses below.

Could you make a screenshot of you being in commons.wikipedia? You know, any random article.

Sure. Here goes two for commons.wikimedia.org/wiki/Earth

commons_1 commons_2

Just out of curiosity, have you found a way in any form yet regarding the missing images?

I'm going to work on it this week. But just to explain, it's only really needed when I use a commons dump that doesn't match the wiki dump. Last month, I did 2016-08-02 commons and 2016-08-22 dewiki. This month, they'll both fall on the same date: 2016-09-01. There are only a few mismatches when they are the same

Actually, I'm on a 64-bit system. However, I'll definitely troubleshoot if I run into any issue!

Cool. But just so you know, you're running the 32-bit XOWA (xowa_windows.jar): https://gist.github.com/gnosygnu/08c3fa91df267e5b7abc5f0cf7d46545#file-gistfile1-txt-L7 . For the purposes of the program, XOWA only sees 32-bit Java, not your 64-bit system.

Again, it's just a guess, but if you want to give it a try with xowa_64.exe (instead of xowa.exe), it may work.
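To put the 4 GB guess in concrete terms: 4 GiB is 2^32 = 4,294,967,296 bytes, and several files in the posted listings are larger than that. The Python sketch below only flags files above that size. It does not confirm that file size is what broke the build; whether the limit in question applies to individual files or to the memory available to 32-bit Java is not established in this thread. The function name `files_over_limit` is made up for illustration, and the sizes are copied from the logs posted above.

```python
FOUR_GIB = 2 ** 32  # 4,294,967,296 bytes

# A few name -> size (bytes) pairs copied from the commons and wikidata listings.
SIZES = {
    "commons.wikimedia.org-core.xowa": 6332846080,
    "commons.wikimedia.org-xtn.search.core.xowa": 6760353792,
    "commons.wikimedia.org-text-ns.006-db.008.xowa": 1992622080,
    "www.wikidata.org-core.wbase.xowa": 4793843712,
}


def files_over_limit(sizes, limit=FOUR_GIB):
    """Return the names of files whose size is at or above the limit, sorted."""
    return sorted(name for name, size in sizes.items() if size >= limit)


for name in files_over_limit(SIZES):
    print(f"{name}: {SIZES[name] / 2 ** 30:.2f} GiB")
```

The text databases are split into parts of roughly 2 GB each, so in this sample only the core and search databases cross the threshold.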

Thanks!

Ope30 commented 7 years ago

Sure. Here goes two for commons.wikimedia.org/wiki/Earth

Interesting. I think I changed my mind. I will download it subsequently. Can you upload your copy of commons.wikipedia and wikidata onto archive.org? I just want to be conclusive this time!

I'm going to work on it this week. But just to explain, it's only really needed when I use a commons dump that doesn't match the wiki dump. Last month, I did 2016-08-02 commons and 2016-08-22 dewiki. This month, they'll both fall on the same date: 2016-09-01. There are only a few mismatches when they are the same

Cool! Uhm, to be clear, what exactly do you mean with there are only a few mismatches when they are the same? Do you mean missing images still preexist?

Cool. But just so you know, you're running the 32-bit XOWA (xowa_windows.jar): https://gist.github.com/gnosygnu/08c3fa91df267e5b7abc5f0cf7d46545#file-gistfile1-txt-L7 . For the purposes of the program, XOWA only sees 32-bit Java, not your 64-bit system.

Again, it's just a guess, but if you want to give it a try with xowa_64.exe (instead of xowa.exe), it may work.

Right. I downloaded the 64 bit version of Java. Didn't know that matters at all. I'm using the 64 bit version from now on! Thanks.

gnosygnu commented 7 years ago

Sure. Here goes two for commons.wikimedia.org/wiki/Earth

Interesting. I think I changed my mind. I will download it subsequently. Can you upload your copy of commons.wikipedia and wikidata onto archive.org? I just want to be conclusive this time!

Oops. I forgot to say that the screenshots were with online access: i.e.: the images were downloaded dynamically from the internet. The XOWA commons.wikimedia.org wiki has no images. You'd have to be connected to the internet to download them.

Don't know if you still want them. They really are less useful than they appear....

I'm going to work on it this week. But just to explain, it's only really needed when I use a commons dump that doesn't match the wiki dump. Last month, I did 2016-08-02 commons and 2016-08-22 dewiki. This month, they'll both fall on the same date: 2016-09-01. There are only a few mismatches when they are the same

Cool! Uhm, to be clear, what exactly do you mean with there are only a few mismatches when they are the same? Do you mean missing images still preexist?

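The problem is timing:

I had a 2016-08-01 commons.wikimedia.org wiki that said "these are all the known images as of 2016-08-01"
I then used a 2016-08-22 de.wikipedia.org wiki that said "these are all the pages as of 2016-08-22"
The problem is that the 2016-08-22 dewiki had pages which included new images added to commons.wikimedia.org after 2016-08-01.

In your example above:

The Herbert Boeckl article was modified on 2016-08-20 to show two new images
These two images were just added to commons on 2016-08-15
However, XOWA only knew about images added up until 2016-08-01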

For this month, I'm going to keep the dates the same. Specifically, I'll use a 2016-09-01 commons.wikimedia.org and a 2016-09-01 de.wikipedia.org

It's still possible for 2016-09-01 de.wikipedia.org to have new images that 2016-09-01 commons.wikimedia.org won't have. That's because there's actually a few hours' difference between the two dumps. But there would still be far fewer missing images here than with the 21-day difference between 2016-08-01 and 2016-08-22.

Hope that makes sense.
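The timing issue boils down to one rule: an image can be missing offline when it was added to commons after the commons dump was taken, but early enough for a page in the wiki dump to reference it. A minimal Python sketch of that rule follows, using the dates mentioned in this thread. The function name `is_missing_offline` is made up for illustration, and the sketch works at day granularity, so it ignores the few-hours gap between two dumps taken on the same date.

```python
from datetime import date


def is_missing_offline(image_added: date, commons_dump: date, wiki_dump: date) -> bool:
    """True if a page in the wiki dump can reference an image that the
    commons dump does not know about yet (day granularity only)."""
    return commons_dump < image_added <= wiki_dump


# 2016-08 build: commons dump from 2016-08-01, dewiki dump from 2016-08-22.
# The two Herbert Boeckl images were added to commons on 2016-08-15.
print(is_missing_offline(date(2016, 8, 15), date(2016, 8, 1), date(2016, 8, 22)))

# 2016-09 build: both dumps fall on 2016-09-01, so the same images are known.
print(is_missing_offline(date(2016, 8, 15), date(2016, 9, 1), date(2016, 9, 1)))
```

The first call describes the mismatched 2016-08 build and the second the same-date 2016-09 build, which is why keeping the dump dates in sync shrinks the window to almost nothing.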

Cool. But just so you know, you're running the 32-bit XOWA (xowa_windows.jar): https://gist.github.com/gnosygnu/08c3fa91df267e5b7abc5f0cf7d46545#file-gistfile1-txt-L7 . For the purposes of the program, XOWA only sees 32-bit Java, not your 64-bit system. Again, it's just a guess, but if you want to give it a try with xowa_64.exe (instead of xowa.exe), it may work.

Right. I downloaded the 64 bit version of Java. Didn't know that matters at all. I'm using the 64 bit version from now on! Thanks.

Yeah, I'm reaching, but I can't think of any other reason why the build would have failed.

If you try again and it fails, send me the log again. I'd recommend running it overnight just so that you don't have to watch it.

Thanks.

Ope30 commented 7 years ago

Oops. I forgot to say that the screenshots were with online access: i.e.: the images were downloaded dynamically from the internet. The XOWA commons.wikimedia.org wiki has no images. You'd have to be connected to the internet to download them.

Don't know if you still want them. They really are less useful than they appear....

Damn. I probably won't download.

The problem is timing:

I had a 2016-08-01 commons.wikimedia.org wiki that said "these are all the known images as of 2016-08-01"
I then used a 2016-08-22 de.wikipedia.org wiki that said "these are all the pages as of 2016-08-22"
The problem is that the 2016-08-22 dewiki had pages which included new images added to commons.wikimedia.org after 2016-08-01.

In your example above:

The Herbert Boeckl article was modified on 2016-08-20 to show two new images
These two images were just added to commons on 2016-08-15
However, XOWA only knew about images added up until 2016-08-01

For this month, I'm going to keep the dates the same. Specifically, I'll use a 2016-09-01 commons.wikimedia.org and a 2016-09-01 de.wikipedia.org

It's still possible for 2016-09-01 de.wikipedia.org to have new images that 2016-09-01 commons.wikimedia.org won't have. That's because there's actually a few hours' difference between the two dumps. But there would still be far fewer missing images here than with the 21-day difference between 2016-08-01 and 2016-08-22.

Hope that makes sense.

Yeah. Makes sense. Understood!

If you try again and it fails, send me the log again. I'd recommend running it overnight just so that you don't have to watch it.

I will!

Edit: I downloaded Wikidata, and it seems switching to 64 bit worked for me. But, categories, I can't see them. But, that doesn't matter for me. Fact is, it worked. It only matters for me for the main ones, you know.

gnosygnu commented 7 years ago

Damn. I probably won't download.

Yeah, sorry. For now, commons.wikimedia.org is really only for curator interest. For most users (including myself), it's not that useful.

Yeah. Makes sense. Understood!

Cool.

Edit: I downloaded Wikidata, and it seems switching to 64 bit worked for me.

Great! Thanks for giving it a try.

But, categories, I can't see them. But, that doesn't matter for me. Fact is, it worked. It only matters for me for the main ones, you know.

Yeah, I broke categories for wikidata only in the past release. I'll have that fixed for the next one.

Thanks!

Ope30 commented 7 years ago

Hey there!

This may be asking too much, but would it be possible for you to upload the latest en and de wikis between 20-09 and 27-09? The closest one, for example the 2016-09-20 wikis, would be cool. I don't really see the point of getting the 2016-09-01 wiki instead of the current ones, you know, that's just my view. I love being up-to-date. :P Regarding missing images, I know you would have to pick two equal dates. Whatever it takes, I'll wait. If you want I'll wait another 2 weeks, IF needed.

gnosygnu commented 7 years ago

Hey!

would it be possible for you to upload the latest en and de wikis between 20-09 and 27-09?

Out of curiosity, any reason you want 09-20? I like to pick the wikis from the beginning of the month, so I'll be doing 10-01 in two weeks. I'm guessing because 10-01 would mean 4 weeks for you, and you'd want it sooner?

If I do 09-20, then it would probably push out English Wikipedia at least another week, and maybe two. Not sure if I want to do that, since I should be finished with 09-01 in the next day or two....

Ope30 commented 7 years ago

Out of curiosity, any reason you want 09-20? I like to pick the wikis from the beginning of the month, so I'll be doing 10-01 in two weeks. I'm guessing because 10-01 would mean 4 weeks for you, and you'd want it sooner?

The reason is because they have a few more articles. I've found some interesting articles that were added just yesterday, you know. But I agree with you. You should probably just go with the 2016-09-01 ones. :d

If I do 09-20, then it would probably push out English Wikipedia at least another week, and maybe two. Not sure if I want to do that, since I should be finished with 09-01 in the next day or two....

Agreed. Let us just go with the 09-01 ones.

gnosygnu commented 7 years ago

The reason is because they have a few more articles. I've found some interesting articles that were added just yesterday, you know. But I agree with you. You should probably just go with the 2016-09-01 ones. :d

Cool. It actually turns out that 2016-09-20 is a bit hard to do. I need "commonswiki-20160920-image.sql.gz" and wikimedia doesn't dump this mid-month. So 2016-10-01 is going to be the next one.
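For reference, the file name above follows the `<wiki>-<yyyymmdd>-<table>.sql.gz` pattern, so the name needed for any given dump date can be derived from the date. A tiny Python sketch, where the helper name `image_table_dump_name` is made up for illustration; it only builds the name and says nothing about whether Wikimedia actually published a dump for that date.

```python
from datetime import date


def image_table_dump_name(wiki: str, dump_date: date) -> str:
    """Build the image-table dump file name for a wiki and dump date."""
    return f"{wiki}-{dump_date:%Y%m%d}-image.sql.gz"


print(image_table_dump_name("commonswiki", date(2016, 9, 20)))
# prints: commonswiki-20160920-image.sql.gz
```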

Agreed. Let us just go with the 09-01 ones.

Great. Thanks!

gnosygnu commented 7 years ago

The 2016-09-01 English and German wikis were uploaded tonight. Also tonight's XOWA release has a few more fixes for the Category system as well as some miscellaneous ones for the MediaWiki namespace.

I think all items in this issue have been covered, with the exception of handling mismatched date wikis (2016-08-01 commons vs 2016-08-22 dewiki). I'll add this at a future date, but for now, I'll make sure that I always keep the dates in sync when producing wikis.

Let me know if there's anything else. Otherwise I'll mark this ticket closed in a few days. Thanks!

gnosygnu commented 7 years ago

@Ope30 . Hey, saw there was a deleted comment. Let me know if there are any issues.

FYI: one user reported a bad link on http://xowa.org/home/wiki/Blog.html . I updated the blog now with more detailed download instructions.

Thanks.

Ope30 commented 7 years ago

The 2016-09-01 English and German wikis were uploaded tonight. Also tonight's XOWA release has a few more fixes for the Category system as well as some miscellaneous ones for the MediaWiki namespace.

This is very nice to hear! Thank you so much. I'm downloading it right now. Fortunately, I've found a way to avoid clicking every link one by one: I put all the links into a text file and then imported it using IDM. In case anyone else is downloading the same way I do, this really helps.

I think all items in this issue have been covered, with the exception of handling mismatched date wikis (2016-08-01 commons vs 2016-08-22 dewiki). I'll add this at a future date, but for now, I'll make sure that I always keep the dates in sync when producing wikis.

Yeah. Cool. Are the wikis from the same date (commons.wikipedia)? If that's the case, that would be great. Just to be sure.

Let me know if there's anything else. Otherwise I'll mark this ticket closed in a few days. Thanks!

No doubt, I will!

@Ope30 . Hey, saw there was a deleted comment. Let me know if there are any issues. FYI: one user reported a bad link on http://xowa.org/home/wiki/Blog.html . I updated the blog now with more detailed download instructions.

Oh, I just saw your comment. Regarding the deleted one, I was testing something. Damn, what kind of bad link do you mean?

Ope30 commented 7 years ago

First of all, you did great work. Read-only works, categories averagely work, the Navigation bar works. More to come! I haven't gotten much time yet.

I've gone through some stuff. Unfortunately, I've found several issues.

Categories do work, but most articles are missing them. Leonardo DiCaprio has categories, but I don't see any in XOWA. Literally 90% of all articles are missing categories. :(

Categories that consist of more than 1000 pages (or less, not sure), were bugged for me. I can't really remember which ones, but I remember I couldn't switch from page to page (next (200) I think). It is pretty hard finding articles that have categories. I always have to go back to the original wikis.

Is there a way to hide "Hidden Categories"?

Those two issues were the only ones I found so far. If I find more I'll inform you as soon as I can. Nevertheless, I hope I don't have to redownload in order to get the issues I mentioned above fixed.

Do you plan on adding the Navigation bar/fixing categories for both en-de wikiversity, source, books, news, etc in a future release? Feel like they really lack that. :P

gnosygnu commented 7 years ago

First of all, you did great work. Read-only works, categories averagely work, the Navigation bar works. More to come! I haven't gotten much time yet.

Cool. Thanks for testing, as well as all the encouragement!

I've gone through some stuff. Unfortunately, I've found several issues. Categories do work, but most articles are missing them. Leonardo DiCaprio has categories, but I don't see any in XOWA. Literally 90% of all articles are missing categories. :(

Ugh. This was a careless bug on my side. Categories would only show for pages that had one word (Earth). It didn't work for pages with two or more words (Leonardo DiCaprio)

I made a release tonight to fix this. Please give it a try: https://github.com/gnosygnu/xowa/releases/tag/v3.9.4.2

Categories that consist of more than 1000 pages (or less, not sure), were bugged for me. I can't really remember which ones, but I remember I couldn't switch from page to page (next (200) I think). It is pretty hard finding articles that have categories. I always have to go back to the original wikis.

That's strange. I looked at it now, and was able to navigate forward and backward using next 200 / previous 200. For example, I've been using en.wikipedia.org/wiki/Category:2001_albums and related (2002_albums, 2003_albums)

I did find that the performance was very slow on an IDE drive (I'm using an SSD drive). I'll have that fixed in the next dump (2016-10). I did make a temporary fix for the 2016-09 dump, but you'll have to download 2 more files. 1 is 180 MB and the other is 45 MB.

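If you want to try the fix, please do the following:

Download the English Wikipedia Category db: https://drive.google.com/open?id=0B9cb52zjL2rIQUxtU25sZW0ySFE
Unzip it and replace C:\xowa\wiki\en.wikipedia.org\en.wikipedia.org-xtn.category.core.xowa
Run XOWA and look up Category:2001_albums (or any other large category page)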

I also created a patched version for German Wikipedia: https://drive.google.com/open?id=0B9cb52zjL2rIOGc1S0dhcFZpelk
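For anyone who prefers to script the file swap, here is a minimal sketch in Python. It is not part of XOWA; the helper name `replace_with_backup` and the `.bak` suffix are assumptions made for illustration. It copies the patched db over the existing one and keeps the old file so the change can be undone.

```python
import shutil
from pathlib import Path

def replace_with_backup(patched, target):
    """Copy `patched` over `target`, keeping the old file as `<target>.bak`.

    Hypothetical helper: XOWA itself does not ship this function.
    """
    patched, target = Path(patched), Path(target)
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        # Keep the original so the patch can be rolled back.
        shutil.copy2(target, backup)
    shutil.copy2(patched, target)
    return backup
```

The `target` argument would be the en.wikipedia.org-xtn.category.core.xowa path mentioned in this thread, and `patched` the unzipped download. Running it while XOWA is closed is probably the safer choice, since the db file may otherwise be open.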

Is there a way to hide "Hidden Categories"?

Ugh, this was another bug. There was an option to toggle it, but it wasn't working.

I fixed this in the release now. Try the following:

Those two issues were the only ones I found so far. If I find more I'll inform you as soon as I can. Nevertheless, I hope I don't have to redownload to get the issues I mentioned above fixed.

Well hopefully the above addressed it.

Do you plan to add the Navigation bar/fix categories for both en-de wikiversity, source, books, news, etc in a future release? Feel like they really lack that. :P

I'm planning to do a full German, English and French refresh in October. I'm going to try to make sure the minor wikis (Wiktionary, etc) get quarterly releases but a lot depends on my schedule.

Thanks again for the testing! I really appreciate everything you found!

Ope30 commented 7 years ago

I can confirm that categories do work now, as well as hidden categories. Thanks so much!

That's strange. I looked at it now, and was able to navigate forward and backward using next 200 / previous 200. For example, I've been using en.wikipedia.org/wiki/Category:2001_albums and related (2002_albums, 2003_albums)

I believe I was wrong. It didn't work for me the first time. Now it actually works.

I did find that the performance was very slow on an IDE drive (I'm using an SSD drive). I'll have that fixed in the next dump (2016-10). I did make a temporary fix for the 2016-09 dump, but you'll have to download 2 more files. 1 is 180 MB and the other is 45 MB.

If you want to try the fix, please do the following:

Download the English Wikipedia Category db: https://drive.google.com/open?id=0B9cb52zjL2rIQUxtU25sZW0ySFE
Unzip it and replace C:\xowa\wiki\en.wikipedia.org\en.wikipedia.org-xtn.category.core.xowa
Run XOWA and look up Category:2001_albums (or any other large category page)

I also created a patched version for German Wikipedia: https://drive.google.com/open?id=0B9cb52zjL2rIOGc1S0dhcFZpelk

Thanks for the files. I tested them, and to be honest I didn't see any difference. Categories still take a little while to show up. Even the little ones. By the way, they have the same file size as the ones I replaced. Just wondering. :p

Well hopefully the above addressed it.

It did. :D

I'm planning to do a full German, English and French refresh in October. I'm going to try to make sure the minor wikis (Wiktionary, etc) get quarterly releases but a lot depends on my schedule.

That's cool! Take your time.

Thanks again for the testing! I really appreciate everything you found!

No problem!

Sorry for asking this again, but did you take the dates the same?

If you want, you can close this issue. In case I find anything, I will just make a new issue if needed!

gnosygnu commented 7 years ago

I can confirm that categories do work now, as well as hidden categories. Thanks so much!

Cool. Good to hear.

I believe I was wrong. It didn't work for me the first time. Now it actually works.

Also good to hear. ;)

Thanks for the files. I tested them, and to be honest I didn't see any difference. Categories still take a little while to show up. Even the little ones.

That's strange. Do you have an SSD? If not, how long does it take to load en.wikipedia.org/wiki/Category:2001_albums? For me, it was the difference between 25 seconds and 3 seconds.

By the way, they have the same file size as the ones I replaced. Just wondering. :p

Yup. I added an index, but surprisingly SQLite didn't change the file size.
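For readers who want to check whether an added index is actually picked up, here is a minimal sketch using Python's built-in `sqlite3` module. The table `cat_link`, its columns, and the index name are hypothetical stand-ins, not XOWA's actual schema; the sketch only shows creating an index and comparing SQLite's query plan before and after.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    """Return the detail strings of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]

def build_demo_db():
    """Build an in-memory table shaped like a category-to-page link table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cat_link (cat_id INTEGER, page_id INTEGER, sortkey TEXT)")
    conn.executemany(
        "INSERT INTO cat_link VALUES (?, ?, ?)",
        [(i % 100, i, "page_%06d" % i) for i in range(10000)],
    )
    conn.commit()
    return conn

# One "page" of a category listing: 200 members in sortkey order.
QUERY = "SELECT page_id FROM cat_link WHERE cat_id = ? ORDER BY sortkey LIMIT 200"

conn = build_demo_db()
plan_before = plan_for(conn, QUERY, (42,))
conn.execute("CREATE INDEX cat_link__cat_sortkey ON cat_link (cat_id, sortkey)")
plan_after = plan_for(conn, QUERY, (42,))

print(plan_before)  # expected to describe a full table scan
print(plan_after)   # expected to mention the new index
```

If the plan for the real category query does not mention the index, the index probably does not match the query's filter and sort columns, which would be one possible reason for seeing no speed-up.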

I'm planning to do a full German, English and French refresh in October. I'm going to try to make sure the minor wikis (Wiktionary, etc) get quarterly releases but a lot depends on my schedule.

That's cool! Take your time.

Will start on them once the Wikimedia dumps for 2016-10 are available.

Sorry for asking this again, but did you take the dates the same?

Yup. dewiki, enwiki and commonswiki are all 2016-09-01.
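As a rough illustration of the date check discussed in this thread, here is a small Python sketch. The function names and the dictionary layout are assumptions for illustration only, and XOWA does not necessarily store dump dates this way. The dates are the ones mentioned above.

```python
def dumps_in_sync(dump_dates):
    """Return True when every wiki in the mapping has the same dump date."""
    return len(set(dump_dates.values())) <= 1

def mismatched(dump_dates, reference):
    """List the wikis whose dump date differs from the reference wiki's date."""
    ref_date = dump_dates[reference]
    return sorted(w for w, d in dump_dates.items() if d != ref_date)

# Dates taken from this thread.
synced = {"dewiki": "2016-09-01", "enwiki": "2016-09-01", "commonswiki": "2016-09-01"}
mixed = {"dewiki": "2016-08-22", "commonswiki": "2016-08-01"}

print(dumps_in_sync(synced))
print(dumps_in_sync(mixed))
print(mismatched(mixed, "commonswiki"))
```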

If you want, you can close this issue. In case I find anything, I will just make a new issue if needed!

Will close in the next few days. Thanks!

Ope30 commented 7 years ago

That's strange. Do you have an SSD? If not, how long does it take to load en.wikipedia.org/wiki/Category:2001_albums? For me, it was the difference between 25 seconds and 3 seconds.

I agree. It took me about 30 seconds. I have an SSD, but it's reserved for Windows, you know. My files, also the wikis, are on read-only, but I doubt that still changes anything. As I remember, I disabled read-only to test whether it is still slow, and it was still slow. I couldn't load a category, because that consisted of more than 500000 pages, I think. I could've loaded it if I waited a little longer maybe. If you find a fix for that, that would be great. It's actually quite annoying. I'm mostly into categories. d: Other than that, there's no issue I've found so far. Went through some articles and none was missing an image. Taking the dates the same was probably the best choice. ^^

Can you help me with something? I'm using a 4k monitor, therefore I have a small interface http://prnt.sc/cnd85h and I don't know how to fix this. Other programs are affected by this as well. Everything is so small, you know.

gnosygnu commented 7 years ago

I agree. It took me about 30 seconds.

Yeah, you're right. I tried it again now, and it takes that long. I don't know why it worked before. Back to the drawing board for me.

I couldn't load a category, because that consisted of more than 500000 pages, I think.

Agreed here. You'll need an SSD. Even then it was about 5 or seconds for me.

If you find a fix for that, that would be great.

Yeah, I'm working on a fix for this for 2016-10. Will keep you posted.

Went through some articles and none was missing an image. Taking the dates the same was probably the best choice. ^^

Cool. Good to hear.

I'm using a 4k monitor, therefore I have a small interface http://prnt.sc/cnd85h and I don't know how to fix this.

Hmm... Don't have a 4k monitor, so don't have any experience. I don't know what OS you're on, but Windows 8 seems to be the key. I did a quick search, and found these links:

Hope this helps

Ope30 commented 7 years ago

I'll try to get this fixed. Thanks anyways. ^^

You can close the thread.

gnosygnu commented 7 years ago

Yup. Will close it after the next release. Thanks!

gnosygnu commented 7 years ago

As per above, I'm closing the issue. To summarize: