gettimothy closed this issue 5 years ago.
Hey, thanks for all the kind words.
You can try running the wiki.categorylinks step by itself. First, delete (or move aside) these files:
commons.wikimedia.org-xtn.category.core.xowa
commons.wikimedia.org-xtn.category.link-db.001.xowa
...through...
commons.wikimedia.org-xtn.category.link-db.035.xowa
Then run a script containing only this command:
add ('commons.wikimedia.org' , 'wiki.categorylinks');
This should generate a fresh set of xtn.category.core / xtn.category.link databases.
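The thread does not show how the files were removed. Below is a minimal Python sketch of the "move aside" step. It assumes the file names follow the pattern listed above and that `wiki_dir` is the wiki's folder; the function name and the backup folder are hypothetical. Moving files this way does not touch the xowa_db bookkeeping table, which is what tripped up the rerun later in this thread.

```python
import shutil
from pathlib import Path


def move_category_dbs(wiki_dir, backup_dir, domain="commons.wikimedia.org"):
    """Move the xtn.category core/link databases out of wiki_dir into backup_dir.

    ASSUMPTION: file names look like <domain>-xtn.category.core.xowa and
    <domain>-xtn.category.link-db.NNN.xowa, as listed in the reply above.
    Returns the names of the moved files.
    """
    wiki = Path(wiki_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() gives a deterministic order: core first, then link-db.001, .002, ...
    for f in sorted(wiki.glob(f"{domain}-xtn.category.*.xowa")):
        shutil.move(str(f), str(backup / f.name))
        moved.append(f.name)
    return moved
```

Moving rather than deleting keeps the old databases around in case the rerun fails and they are needed again.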
If it still fails, send me the output from /xowa/user/anonymous/app/tmp/session as well as the console output.
Finally, you can always check the databases for corruption with PRAGMA integrity_check: https://www.sqlite.org/pragma.html#pragma_integrity_check
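Here is a minimal sketch for running that pragma over every .xowa file in a wiki folder, using Python's standard sqlite3 module. The folder layout and the `*.xowa` pattern are assumptions based on the file names above, and the function names are hypothetical. SQLite reports a single row `ok` for a healthy file.

```python
import sqlite3
from pathlib import Path


def check_integrity(db_path):
    """Run PRAGMA integrity_check on one SQLite file and return its messages."""
    # mode=ro opens the file read-only and refuses to create a missing file
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in con.execute("PRAGMA integrity_check")]
    finally:
        con.close()


def check_dir(wiki_dir, pattern="*.xowa"):
    """Check every matching file in wiki_dir; return {file name: messages} for files that are not 'ok'."""
    problems = {}
    for f in sorted(Path(wiki_dir).glob(pattern)):
        try:
            msgs = check_integrity(f)
        except sqlite3.DatabaseError as e:  # e.g. "file is not a database"
            msgs = [str(e)]
        if msgs != ["ok"]:
            problems[f.name] = msgs
    return problems
```

On multi-gigabyte files a full integrity check can take a long time; SQLite also offers the faster but less thorough `PRAGMA quick_check`.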
Hope this helps, and best of luck.
thx, will do.
OK... that error went away, but I got a new one.
I got a bunch of "missing db" messages, as seen in the first line of the output below. Then it spent hours indexing well over 200,000,000 elements (I think it was closer to 300 million), and then it threw the error below:
wiki.db:missing db; tid=xtn.category.link url=/mnt/tmp/xowa_maven/wiki/commons.wikimedia
...
...many of these when I kicked off the script after removing the files per your
...
(the categorylink indexing was occurring here for several hours, and then:)
error while executing script: err=[err 0] <gplx> error while generating catlink dbs: err=[err 0] <java.lang.ClassCastException> class gplx.dbs.engines.noops.Noop_conn_info cannot be cast to class gplx.dbs.engines.sqlite.Sqlite_conn_info (gplx.dbs.engines.noops.Noop_conn_info and gplx.dbs.engines.sqlite.Sqlite_conn_info are in unnamed module of loader 'app')
[trace]:
gplx.dbs.engines.sqlite.Sqlite_conn_info.To_url(Sqlite_conn_info.java:67)
gplx.dbs.Db_attach_itm.<init>(Db_attach_itm.java:28)
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_wkr.Make_catlink_dbs(Xob_catlink_wkr.java:31)
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_mgr.On_cmd_end(Xob_catlink_mgr.java:102)
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_cmd.Cmd_end(Xob_catlink_cmd.java:59)
gplx.xowa.bldrs.Xob_bldr.Run(Xob_bldr.java:192)
gplx.xowa.bldrs.Xob_bldr.Invk(Xob_bldr.java:237)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:31)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:64)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:64)
gplx.langs.gfs.GfsCore.ExecOne_to(GfsCore.java:82)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_str_for(Xoa_gfs_mgr.java:86)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_str_for(Xoa_gfs_mgr.java:77)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_url_for(Xoa_gfs_mgr.java:69)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_url(Xoa_gfs_mgr.java:61)
gplx.xowa.apps.boots.Xoa_boot_mgr.Run_app(Xoa_boot_mgr.java:133)
gplx.xowa.apps.boots.Xoa_boot_mgr.Run(Xoa_boot_mgr.java:38)
gplx.xowa.Xoa_app_.Run(Xoa_app_.java:28)
gplx.xowa.Xowa_main.main(Xowa_main.java:22)
[err 1] <bldr> unknown error: key=wiki.categorylinks
[err 2] <bldr> unknown error
[trace]:
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_mgr.On_cmd_end(Xob_catlink_mgr.java:106)
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_cmd.Cmd_end(Xob_catlink_cmd.java:59)
gplx.xowa.bldrs.Xob_bldr.Run(Xob_bldr.java:192)
gplx.xowa.bldrs.Xob_bldr.Invk(Xob_bldr.java:237)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:31)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:64)
gplx.langs.gfs.GfsCore_.Exec(GfsCore_.java:64)
gplx.langs.gfs.GfsCore.ExecOne_to(GfsCore.java:82)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_str_for(Xoa_gfs_mgr.java:86)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_str_for(Xoa_gfs_mgr.java:77)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_url_for(Xoa_gfs_mgr.java:69)
gplx.xowa.apps.gfs.Xoa_gfs_mgr.Run_url(Xoa_gfs_mgr.java:61)
gplx.xowa.apps.boots.Xoa_boot_mgr.Run_app(Xoa_boot_mgr.java:133)
gplx.xowa.apps.boots.Xoa_boot_mgr.Run(Xoa_boot_mgr.java:38)
gplx.xowa.Xoa_app_.Run(Xoa_app_.java:28)
gplx.xowa.Xowa_main.main(Xowa_main.java:22)
This is just an FYI, as I think I can just hack wikidatawiki-latest-categorylinks.sql directly into Postgres.
The /xowa/user/anonymous/app/tmp/session is an empty directory. I poked around in app/tmp/log and xolog, but nothing seemed pertinent.
Thank you for your time.
Oops. I see the problem.
The files you moved / deleted are still listed in the xowa_db table in commons.wikimedia.org-file-core.xowa.
This causes the following error, which is harmless:
wiki.db:missing db; tid=xtn.category.link url=/mnt/tmp/xowa_maven/wiki/commons.wikimedia
...
...many of these when I kicked off the script after removing the files per your
...
But it also causes this error, which fails the operation:
error while executing script: err=[err 0] <gplx> error while generating catlink dbs: err=[err 0] <java.lang.ClassCastException> class gplx.dbs.engines.noops.Noop_conn_info cannot be cast to class gplx.dbs.engines.sqlite.Sqlite_conn_info (gplx.dbs.engines.noops.Noop_conn_info and gplx.dbs.engines.sqlite.Sqlite_conn_info are in unnamed module of loader 'app')
[trace]:
gplx.dbs.engines.sqlite.Sqlite_conn_info.To_url(Sqlite_conn_info.java:67)
gplx.dbs.Db_attach_itm.<init>(Db_attach_itm.java:28)
gplx.xowa.addons.wikis.ctgs.bldrs.Xob_catlink_wkr.Make_catlink_dbs(Xob_catlink_wkr.java:31)
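To see which registry rows point at files that are no longer on disk, something like the following Python sketch could be run against the database holding xowa_db. It assumes xowa_db has `db_type` and `db_url` columns and that `db_url` is a file name relative to the wiki folder. Only `db_type` is confirmed by this thread, so check the real schema first (for example with `.schema xowa_db` in the sqlite3 shell). The function name is hypothetical.

```python
import sqlite3
from pathlib import Path


def find_stale_entries(core_db, wiki_dir):
    """Return (db_type, db_url) rows in xowa_db whose file is missing from wiki_dir.

    ASSUMPTION: xowa_db has db_type and db_url columns, and db_url is a file
    name relative to wiki_dir. Verify against the actual schema before use.
    """
    con = sqlite3.connect(core_db)
    try:
        rows = con.execute(
            "SELECT db_type, db_url FROM xowa_db ORDER BY db_type, db_url").fetchall()
    finally:
        con.close()
    wiki = Path(wiki_dir)
    # Keep only rows whose referenced file does not exist on disk
    return [(t, url) for t, url in rows if not (wiki / url).exists()]
```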
Actually, you don't need to run add ('commons.wikimedia.org' , 'wiki.categorylinks');
My script omits it: http://xowa.org/home/wiki/Dev/Command-line/Dumps#Script:_gnosygnu.27s_actual_English_Wikipedia_script_.28dirty.3B_provided_for_reference_only.29
If you do want to run it, then you can try removing the first batch of files from the xowa_db table with DELETE FROM xowa_db WHERE db_type IN (6, 7).
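The same DELETE can be run from Python's sqlite3 module with a backup copy taken first, which makes it easy to undo. This is a minimal sketch: `core_db` is whichever file holds the xowa_db table (the reply above points to commons.wikimedia.org-file-core.xowa), the function name is hypothetical, and the meaning of the codes 6 and 7 is not explained in the thread.

```python
import shutil
import sqlite3


def delete_category_entries(core_db, db_types=(6, 7), backup=True):
    """Delete xowa_db rows whose db_type is in db_types; return the number of rows removed.

    The default (6, 7) comes from the thread; a copy of the file is written to
    <core_db>.bak first so the change can be reverted.
    """
    if backup:
        shutil.copy2(core_db, str(core_db) + ".bak")
    placeholders = ",".join("?" for _ in db_types)
    con = sqlite3.connect(core_db)
    try:
        with con:  # commits on success, rolls back if the DELETE raises
            cur = con.execute(
                f"DELETE FROM xowa_db WHERE db_type IN ({placeholders})",
                tuple(db_types))
            return cur.rowcount
    finally:
        con.close()
```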
If that fails, you can also restart the whole import from scratch. But IMHO, categories are probably not worth it.
Hope this helps. Thanks!
Thank you for the reply.
I was able to import the categories directly into Postgres by hacking the MySQL dump file. (I did this on Wikidata; I will probably do it on Commons too.)
To summarize, the second run failed because of a "bookkeeping" error in the databases you populate to keep track of which database has what. Resetting the "bookkeeping" would have let it finish.
That is good to know going forward.
thx for your time.
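The thread doesn't say how the MySQL dump was adapted for Postgres. One possible approach, sketched in Python, is to parse each tuple and write CSV that Postgres can load with COPY. It assumes the dump uses mysqldump's extended `INSERT INTO ... VALUES (...),(...);` lines with backslash escapes inside single-quoted strings. The function names are hypothetical, and binary columns such as sort keys may need extra handling for non-UTF-8 bytes.

```python
import csv
import io

# Backslash escapes used inside mysqldump string literals (others map to themselves)
_ESCAPES = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}


def parse_values(line):
    """Return one list of fields per (...) tuple in an extended INSERT line.

    Quoted strings are unescaped, an unquoted NULL becomes None, and other
    unquoted fields (numbers) are kept as text.
    """
    i = line.index(" VALUES ") + len(" VALUES ")
    rows, row, field = [], [], []
    in_tuple = in_str = quoted = False
    while i < len(line):
        c = line[i]
        if in_str:
            if c == "\\":
                i += 1  # take the next character as an escape code
                field.append(_ESCAPES.get(line[i], line[i]))
            elif c == "'":
                in_str = False
            else:
                field.append(c)
        elif not in_tuple:
            if c == "(":  # commas and ';' between tuples are skipped
                in_tuple, row, field, quoted = True, [], [], False
        elif c == "'":
            in_str = quoted = True
        elif c in ",)":
            text = "".join(field)
            row.append(None if (not quoted and text == "NULL") else text)
            field, quoted = [], False
            if c == ")":
                rows.append(row)
                in_tuple = False
        else:
            field.append(c)
        i += 1
    return rows


def dump_to_csv(lines, out):
    """Write the rows of every INSERT line to out as CSV, with NULL written as \\N."""
    writer = csv.writer(out)
    for line in lines:
        if line.startswith("INSERT INTO "):
            for row in parse_values(line):
                writer.writerow(["\\N" if v is None else v for v in row])
```

The resulting file could then be loaded in psql with something like `\copy categorylinks FROM 'categorylinks.csv' WITH (FORMAT csv, NULL '\N')`, where the target table (name assumed here) must already exist with matching columns.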
First, thanks for your software.
I have taken your "dirty" gfs script from http://xowa.org/home/wiki/Dev/Command-line/Dumps and broken it into sections so I can get a sense of things and isolate problems as they occur. I have successfully imported Wikidata (!) and am now importing Commons.
The section of gfs script I am using is:
Download, unzip and database creation/extraction have run and a truncated list of files on my system is:
The error is:
This has me stumped. I believe the failing task is
add ('commons.wikimedia.org' , 'wiki.categorylinks');
because when I woke up to the error, the output preceding it looked like something I would expect that command to produce.
I poked around a bit, and it looks like link_db.cat_link may be an alias for xowa.temp.category.sqlite3, but I'm not sure. I have successfully opened xowa.temp.category.sqlite3 with sqlitebrowser, and it contains many millions of records. If you could point me to where to look, I will be happy to debug this for you.
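In SQLite, a prefix such as `link_db.` in `link_db.cat_link` is the schema name given to a database opened with ATTACH DATABASE. That would fit the `Db_attach_itm` frame in the trace, but the thread does not confirm which file XOWA attaches under that name. The Python sketch below attaches a file under such an alias and lists its tables with row counts; the function name is hypothetical. It refuses missing files because ATTACH on a nonexistent path creates a new, empty database instead of failing.

```python
import sqlite3
from pathlib import Path


def describe_attached(temp_db, alias="link_db"):
    """Attach temp_db under alias and return {table name: row count}.

    alias is interpolated into SQL, so it must be a plain identifier.
    """
    path = Path(temp_db)
    if not path.is_file():
        # ATTACH would silently create an empty database here
        raise FileNotFoundError(path)
    con = sqlite3.connect(":memory:")
    try:
        con.execute("ATTACH DATABASE ? AS " + alias, (str(path),))
        names = [r[0] for r in con.execute(
            f"SELECT name FROM {alias}.sqlite_master WHERE type = 'table' ORDER BY name")]
        # COUNT(*) scans each table, so this can be slow on tables with millions of rows
        return {n: con.execute(f'SELECT COUNT(*) FROM {alias}."{n}"').fetchone()[0]
                for n in names}
    finally:
        con.close()
```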
A possible wildcard here is that I bought an 8 TB external disk for this work, and it seems slower than an internal disk. Perhaps there is a latency issue.
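One rough way to test the latency theory is to time random block-sized reads from a large file on each disk and compare the results. This is a minimal Python sketch, not a proper benchmark; the function name and parameters are hypothetical. The numbers only mean something if the file is much larger than RAM or the OS page cache has been dropped, since cached reads never touch the disk.

```python
import os
import random
import statistics
import time


def random_read_latency(path, reads=200, block=4096, seed=0):
    """Return the median seconds per random read of `block` bytes from an existing file."""
    size = os.path.getsize(path)
    if size < block:
        raise ValueError("file smaller than one block")
    rng = random.Random(seed)  # fixed seed so runs on different disks read the same offsets
    times = []
    with open(path, "rb", buffering=0) as f:  # unbuffered: each read goes to the OS
        for _ in range(reads):
            f.seek(rng.randrange(0, size - block + 1))
            t0 = time.perf_counter()
            f.read(block)
            times.append(time.perf_counter() - t0)
    return statistics.median(times)
```

Running it on a multi-gigabyte .xowa file on the external disk and on a similar file on an internal disk would give a rough comparison.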
Thank you for your time.
t