gnosygnu / xowa

xowa offline wiki application

bad argument #1 to 'gmatch' #450

Closed desb42 closed 5 years ago

desb42 commented 5 years ago

With the recent changes to regex, I thought I would take a look at a Portal page. I built the latest version and ran xowa-gui. The page (2019-03-01) gives the error:

invoke failed: Portal:Arts {{#invoke:Selected recent additions|main}}
 [err 0] <gplx> @libraryUtil.lua:13 bad argument #1 to 'gmatch' (string expected, got nil)
    stack traceback:
    Module:Selected recent additions:60: in function <Module:Selected recent additions:59>
    Module:Selected recent additions:93: in function '__index'
    Module:Selected recent additions:130: in function <Module:Selected recent additions:129>
    mw.lua:531: in function <mw.lua:530>
    [Java]: in function '__index'
     in function <g:/xowa_dev/bin/any/xowa/xtns/Scribunto/engines/Luaj/MWServer.lua:74>
    [Java]: in ?: ttl=Arts excerpt={{#invoke:Selected recent additions|main}}
      gplx.xowa.xtns.scribunto.Scrib_core.Handle_error(Unknown Source)
      gplx.xowa.xtns.scribunto.engines.luaj.Luaj_engine.Dispatch_as_kv_ary(Unknown Source)
      gplx.xowa.xtns.scribunto.engines.luaj.Luaj_engine.CallFunction(Unknown Source)
      gplx.xowa.xtns.scribunto.Scrib_core.Invoke(Unknown Source)
      gplx.xowa.xtns.scribunto.Scrib_invoke_func.Func_evaluate(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_invk_tkn_.Eval_func(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_invk_tkn.Tmpl_evaluate(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_defn_tmpl.Tmpl_evaluate(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_invk_tkn.Tmpl_evaluate(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_defn_tmpl_.Make_itm(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_defn_tmpl_.CopyNew(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_invk_tkn.Tmpl_evaluate(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_tmpl_wtr.Write_tkn(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_tmpl_wtr.Write_tkn(Unknown Source)
      gplx.xowa.parsers.tmpls.Xot_tmpl_wtr.Write_all(Unknown Source)
      gplx.xowa.parsers.Xop_parser.Expand_tmpl(Unknown Source)
      gplx.xowa.parsers.Xop_parser.Expand_tmpl(Unknown Source)
      gplx.xowa.parsers.Xop_parser.Parse_text_to_wdom(Unknown Source)
      gplx.xowa.parsers.Xow_parser_mgr.Parse(Unknown Source)
      gplx.xowa.wikis.pages.Xowe_page_mgr.Load_page(Unknown Source)
      gplx.xowa.guis.views.Load_page_wkr.Thread__exec(Unknown Source)
      gplx.core.threads.Gfo_thread_pool.Run_wkr(Unknown Source)
      gplx.core.threads.Gfo_thread_pool.Invk(Unknown Source)
      gplx.Gfo_invk_.Invk_by_msg(Unknown Source)

I also note that MediaWiki has changed the portal pages on enwiki. They do not seem to take anywhere near as much CPU time as when I last looked (#424). My own target is to go for a new download for the 2019-06-01 dump (when we get there).

gnosygnu commented 5 years ago

Hey, sorry for the delay on my side. Getting close to a launch at work, and been working late

Unfortunately, I wasn't able to reproduce this. See my server log below.

My only guess is that you might have an older version of luaj_xowa.jar somehow? When you get a chance, try the following:

20190506_033419.120 page.async:
20190506_033419.314 download pass: src='|url&redirects&titles=File:Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg' trg='mem/download.tmp'
20190506_033419.330 file.get: file=Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg width=120 page=Portal:Arts
20190506_033419.425 download pass: src='' trg='C:\xowa_dev\file\\thumb\b\d\c\c\Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg\120px.jpg'
20190506_033419.555 download pass: src='|url&redirects&titles=File:Las_Meninas%2C_by_Diego_Vel%C3%A1zquez%2C_from_Prado_in_Google_Earth.jpg' trg='mem/download.tmp'
20190506_033419.569 file.get: file=Las_Meninas,_by_Diego_Velázquez,_from_Prado_in_Google_Earth.jpg width=120 page=Portal:Arts
20190506_033419.594 download pass: src='' trg='C:\xowa_dev\file\\thumb\3\1\6\2\Las_Meninas,_by_Diego_Velázquez,_from_Prado_in_Google_Earth.jpg\120px.jpg'
20190506_033419.720 download pass: src='|url&redirects&titles=File:Louis-Marie_Autissier%2C_Self-portrait_edit.jpg' trg='mem/download.tmp'
20190506_033419.734 file.get: file=Louis-Marie_Autissier,_Self-portrait_edit.jpg width=200 page=Portal:Arts
20190506_033419.760 download pass: src='' trg='C:\xowa_dev\file\\thumb\7\a\6\0\Louis-Marie_Autissier,_Self-portrait_edit.jpg\200px.jpg'
20190506_033419.901 download pass: src='|url&redirects&titles=File:Slonimski_Chaim_Zelig.jpg' trg='mem/download.tmp'
20190506_033419.916 file.get: file=Slonimski_Chaim_Zelig.jpg width=100 page=Portal:Arts
20190506_033420.010 download pass: src='' trg='C:\xowa_dev\file\\thumb\8\d\c\b\Slonimski_Chaim_Zelig.jpg\80px.jpg'
20190506_033420.169 download pass: src='|url&redirects&titles=File:Kane_Selfportrait.jpg' trg='mem/download.tmp'
20190506_033420.182 file.get: file=Kane_Selfportrait.jpg width=120 page=Portal:Arts
20190506_033420.205 download pass: src='' trg='C:\xowa_dev\file\\thumb\6\7\3\2\Kane_Selfportrait.jpg\120px.jpg'
20190506_033420.261 file.get: file=U.S._Army_Band_-_A_la_Nanita_Nana_edit.ogg width=220 page=Portal:Arts
20190506_033420.340 download pass: src='|url&redirects&titles=File:Sergei_Prokofiev_circa_1918_over_Chair_Bain.jpg' trg='mem/download.tmp'
20190506_033420.355 file.get: file=Sergei_Prokofiev_circa_1918_over_Chair_Bain.jpg width=120 page=Portal:Arts
20190506_033420.378 download pass: src='' trg='C:\xowa_dev\file\\thumb\0\3\3\3\Sergei_Prokofiev_circa_1918_over_Chair_Bain.jpg\120px.jpg'
20190506_033420.438 redlink.redlink_bgn: page=Portal:Arts total_links=200
20190506_033420.485 redlink.redlink_end: redlinks_run=0
20190506_033423.438 page.load:
20190506_033423.438 page_load: loaded wikitext; page=Special:XowaDefaultTab wikitext_len=0
20190506_033423.458 page.async:
20190506_033423.458 redlink.redlink_bgn: page=Special:XowaDefaultTab total_links=0
20190506_033423.458 redlink.redlink_end: redlinks_run=0

desb42 commented 5 years ago

I have just run another xowa_get_and_make. I note that the timestamp on the bin directory is 28/04/2019 23:36 (dd/mm/yyyy). The luaj_xowa.jar file does contain Match_state.class (I copied the jar file, renamed it as a .zip, and used File Explorer to look inside the zip for the class). I once again get the error (screenshot: arts1). The session log is

Note that I get a lot of File:Blank.png entries (I did not see this in your log)

gnosygnu commented 5 years ago

Thanks for the screenshot. I see my mistake. I actually updated my Windows version to be 2019-05. I think my Linux / build version is 2019-03. Let me pull them over tomorrow and see what the problem is.

I also note that mediawiki have changed the portal pages on enwiki. They do not seem to take anywhere near as much CPU time as when last I looked (#424).

Yeah, these are much quicker in 2019-05 (as I inadvertently discovered above)

gnosygnu commented 5 years ago

It turns out the problem is caused by a missing article from the dump. This is similar to #367.

Specifically, the following wikitext was causing the error:

{{Transclude selected recent additions | %sactor%s | %sart%s | %sarts%s | %scomic | %smuseum%s | %spainting%s | %ssculpture%s | months=12 | header={{Box-header colour|Did you know...  }}|max=12}}

This was caused by the following lines in Module:Selected recent additions:

    local title = mw.title.new('Wikipedia:Recent additions' .. subpage)
    local raw = title:getContent()
    local itemPattern = '%*%s?%.%.%.[%S ]*'
    local items = {}
    dbg(subpage, raw, itemPattern) -- debug output
    for item in mw.ustring.gmatch(raw, itemPattern) do -- fails here when raw is nil

The actual subpage was /2019/April, which produced the title Wikipedia:Recent additions/2019/April. That page didn't exist in the 2019-03 dump, so title:getContent() returned nil and gmatch received nil instead of a string.
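For anyone hitting the same error, here is a rough sketch of how a nil check would avoid it. This is not the code in Module:Selected recent additions and not a change made in xowa; `getItems`, `stubNewTitle`, and the `pages` table are hypothetical names made up for illustration. The title constructor and gmatch function are passed in as arguments so the sketch also runs in plain Lua; inside Scribunto they would be `mw.title.new` and `mw.ustring.gmatch`.

```lua
-- Hypothetical helper: collect the items on a recent-additions subpage,
-- returning an empty list when the page is missing from the dump.
local function getItems(subpage, newTitle, gmatch)
    local title = newTitle('Wikipedia:Recent additions' .. subpage)
    -- getContent() returns nil when the page does not exist
    local raw = title and title:getContent()
    local items = {}
    if raw == nil then
        return items -- nothing to scan; skip the gmatch call entirely
    end
    for item in gmatch(raw, '%*%s?%.%.%.[%S ]*') do
        items[#items + 1] = item
    end
    return items
end

-- Stand-in for mw.title.new, used only so the sketch runs outside Scribunto.
-- It knows one page; any other title behaves like a page missing from the dump.
local pages = {
    ['Wikipedia:Recent additions/2019/March'] = '* ... that foo?\n* ... that bar?',
}
local function stubNewTitle(name)
    return { getContent = function() return pages[name] end }
end

local missing = getItems('/2019/April', stubNewTitle, string.gmatch)
local present = getItems('/2019/March', stubNewTitle, string.gmatch)
print(#missing, #present)
```

With a guard like this, a subpage that is newer than the dump would simply contribute no items instead of aborting the whole #invoke.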

Anyway, thanks for the write-up and sticking through with it above. Will mark closed in a few days unless there are other questions

desb42 commented 5 years ago

It sure looks like another bot

Do you know of any way to find out what bots are used (regularly) on Wikipedia?

gnosygnu commented 5 years ago

Do you know of any way to find out what bots are used (regularly) on wikipedia?

Not really. I've never looked into it before. I did a quick search now, and found these pages:

As for the page in question, I'm not sure if it's bot-created. They look like they've been redirected by the same user at different times of the day

gnosygnu commented 5 years ago

Marking this as closed, as the error comes from a Portal page that references a page which didn't exist at the time of the dump. Relevant excerpt below.

The actual subpage was /2019/April which generated a page of Wikipedia:Recent additions/2019/April which didn't exist in the 2019-03 dump