gnosygnu / xowa

xowa offline wiki application
Other
376 stars 40 forks source link

Subcategories in Category pages #270

Open desb42 opened 6 years ago

desb42 commented 6 years ago

en.wikipedia displays subcategories as: subcats_wiki As a partial step (in concept) we could subcats_concept

The counts of subcategories,pages and files is held in the cat_core table The concept I was working on was: In Xoctg_catlink_loader.java to add an extra join to the Bld_sql_by_db function when the grp_tid was set to subcategories (0). Along the lines of

SELECT  cl_to_id 
,       cl_from 
,       cl_type_id 
,       cl_sortkey AS cl_sortkey 
,       cl_sortkey_prefix AS cl_sortkey_prefix 
,       cca.cat_pages 
,       cca.cat_subcats 
,       cca.cat_files 
FROM    cat_link cl 
JOIN cat_core cca ON cl.cl_from = cca.cat_id 
WHERE   cl_to_id = 780754 
AND     cl_type_id = 0 
AND     cl_sortkey >= '' 
UNION 
SELECT  cl_to_id 
,       cl_from 
,       cl_type_id 
,       cl_sortkey AS cl_sortkey 
,       cl_sortkey_prefix AS cl_sortkey_prefix 
,       cca.cat_pages 
,       cca.cat_subcats 
,       cca.cat_files 
FROM    link2.cat_link cl 
JOIN cat_core cca ON cl.cl_from = cca.cat_id 
WHERE   cl_to_id = 780754 
AND     cl_type_id = 0 
AND     cl_sortkey >= ''

This then would give (with other changes to allow for the fact that there are three extra columns) the values necessary to produce the info above However because of #268 I cant do it

Another possible way is to do the query as it is already and then with the results do a separate sql query to obtain the numbers (only for subcategories)

What do you think

gnosygnu commented 6 years ago

I think the best approach is a separate SQL query to obtain the numbers. Otherwise, the pages and files section would be penalized with an extra JOIN when it doesn't need the numbers. Also, SQLite works best with single-table reads, and I get a bit nervous when doing JOINs across so many records (for example, de.w:Category:Mann has 620,000 pages)

That said, you can try your route and see how the SQL performs.

Regarding #268: I'd try to rebuild en.wikipedia.org with the increased cat_link_db max. If you want, you can wait for a rebuild from me, but that probably won't be available for another 2 or 3 weeks.

Alternatively, you can manually combine the cat_link databases so that instead of 11 databases, you only have 9 or so. You would need a bunch of "INSERT INTO" statements, as well as some deletes in xowa_db table in the core db. If you're interested in this route, I can post the SQL tomorrow

Hope this helps.

desb42 commented 6 years ago

I think the separate sql query is probably the better alternative. (over to you for that one!) As for reducing the number of database files. The simple approach would be just to delete one entry from the xowa_db table.

desb42 commented 6 years ago

Just pushed a version to my repo cat_core that demonstrates what I am trying to do For the purpose of experimentation I applied the following sql statement to make things work for me update xowa_db set db_type=999 where db_id=39 to bring the number of attached tables down to the max of 10

gnosygnu commented 5 years ago

Cool. Thanks for the commit.

I think the back-end / Java part looks fine. The HTML side should be straightforward

On a larger scale, I had hoped to implement the extension for https://www.mediawiki.org/wiki/Extension:CategoryTree . This would handle some of the AJAX behavior such as opening the twisty, and drilling-down the Categories. For now, extra info on subcategory membership is pretty neat, and comes easily with a small change. I'll take a look at adding this, as soon as I add the rest of your changes. Thanks!

desb42 commented 5 years ago

One other thing to note is that Mediawiki now seems to use 2 columns not 3

It might be that its two or three depending on criteria I dont know about or just another Mediawiki enharncement