desb42 / myxowa

my version of xowa

Hello to desb42 #7

Open catmaps opened 1 year ago

catmaps commented 1 year ago

Hello desb42. I want to reach out and say "Hi." A while ago you helped me out at https://github.com/gnosygnu/xowa/issues/829

I gave Gnosygnu feedback from 2016 to 2020. But he is now very quiet--I don't know if he still plans to resume someday.

But today I noticed that you are continuing to improve your fork of Xowa. 👍

I would be glad to hear how you wish to improve MyXowa.

If you are interested, I could offer feedback, testing, and suggestions.

desb42 commented 1 year ago

hello @catmaps

yes, I am continuing to 'improve' my fork of xowa

This fork has diverged somewhat from xowa, to the point where the datafiles uploaded by @gnosygnu would no longer be usable.

My own internet connection is not fast (16 Mbps down, 2 Mbps up).

As an example, en.wikipedia.org is now roughly 206 files and 440 GB in size (including 327 GB of images), whereas species.wikimedia.org is roughly 33 files and 5.8 GB (including 3.35 GB of images).
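To put those sizes next to the connection speed, here is a rough back-of-the-envelope sketch. It assumes the sizes are decimal gigabytes and that the link sustains its full rated speed with no protocol overhead, so real transfers would take longer. The function name `transfer_hours` is made up for this illustration.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a rate_mbps megabit/s link."""
    bits = size_gb * 8 * 1000**3            # decimal gigabytes to bits (assumption)
    seconds = bits / (rate_mbps * 1000**2)  # megabits per second to bits per second
    return seconds / 3600


# Sizes and speeds are the figures quoted in this comment.
for name, size_gb in [("en.wikipedia.org", 440), ("species.wikimedia.org", 5.8)]:
    up = transfer_hours(size_gb, 2)      # 2 Mbps upload
    down = transfer_hours(size_gb, 16)   # 16 Mbps download
    print(f"{name}: upload ~{up:.1f} h, download ~{down:.1f} h")
```

Under these assumptions, uploading the full en.wikipedia.org set would take about 489 hours (roughly 20 days), while species.wikimedia.org would take about 6.4 hours, which is why the smaller wikis are the realistic candidates for uploading.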

Most of my work has gone into making the pages for a wiki identical to the original (although edits made to a page after the dump need to be taken into consideration), using the HTTP interface rather than the GUI side.

One of the most recent changes is the introduction of a new skin (Vector-2022), which is now live on enwiki but not (yet) on specieswiki.

Another area is the images: the data files only grow, with no deletion of old content and no replacement of images by newer versions.

One way to go ahead is to find a mutually agreeable site to upload one (or more) of the smaller wikis to.

Another (if you are a programmer) is to build from scratch

What are your thoughts?

catmaps commented 1 year ago

Thank you for your reply - Very interesting! I have some thoughts, but I want to think this over during the weekend first and then reply.

catmaps commented 1 year ago

I am back. I have several different thoughts about image hosting, the ever-changing Wikimedia templates, etc. But first, to better understand where we are at, I have a few questions:

  1. For the Wikipedia images, you use the Sept 2020 images but then periodically build newer files to go with the original 83 main image files?
  2. For the text side, you also make html files?
  3. The new skin is also very interesting. I use the Xowa GUI and noticed more issues after trying the Nov 2022 data dump. I try to compensate as best I can with settings changes etc., but it is becoming problematic.
  4. In addition to the skin, template, and image problems, are there other problems we should be thinking about? I use Zulu Java, with success so far. But I have noticed that since the summer of 2020 Xowa no longer follows the correct link when I click an image in a wiki and then click its Full Resolution link. See the red circles in the pictures (go to the Snowflake article and click on that image). The first picture adds en.wikipedia.org/wiki/ to the actual image path. The second picture uses Wikidata's main page. It is the same copy of Xowa on a different computer with a few Options set differently. Interestingly, in both cases I can right-click, select Copy, and get the correct image path. That makes me think it is not a skin problem but some other underlying issue on the browser side. It has me stumped. If you have ideas about what is going wrong, or know of any files to adjust deep inside Xowa, please pass them on to me. [I especially use the full-resolution feature for small wikis such as Scribus: in the wiki subfolder I add the entire folder tree of full-resolution images, and Xowa then opens them or passes them to an external viewer application (for PDF files).]

[Images: Snowflake, Snowflake2]

desb42 commented 1 year ago

hello @catmaps

To answer your points:

1) yes - in the sense that new images are added when identified (by the html build of the wiki); however, this is an area I am trying to renovate. At the moment there is no attempt to update any images, and no attempt to remove unnecessary/unused images. I am experimenting with adding columns to allow timestamping the images.

I can see the benefit of only adding images (from an up/download perspective), but there also comes a time when the whole lot needs to be refreshed (and reduced).
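As a minimal sketch of the timestamping idea, the snippet below adds two timestamp columns to an image table and uses them to list images that no recent build has referenced. The table `image_file`, the columns `added_at` and `last_used_at`, and the helpers `touch` and `stale` are all hypothetical names chosen for illustration; the actual Xowa image tables are laid out differently.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical image table, standing in for the real (different) schema.
conn.execute("CREATE TABLE image_file (name TEXT PRIMARY KEY, data BLOB)")
conn.executemany(
    "INSERT INTO image_file VALUES (?, ?)",
    [("Snowflake.jpg", b"\x00"), ("Old.png", b"\x01")],
)

# Add the timestamp columns; rows that already exist get NULL in both.
conn.execute("ALTER TABLE image_file ADD COLUMN added_at INTEGER")
conn.execute("ALTER TABLE image_file ADD COLUMN last_used_at INTEGER")


def touch(conn, name, now=None):
    """Record that a build referenced this image (Unix seconds)."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE image_file "
        "SET added_at = COALESCE(added_at, ?), last_used_at = ? "
        "WHERE name = ?",
        (now, now, name),
    )


def stale(conn, cutoff):
    """Names of images never referenced, or last referenced before cutoff."""
    rows = conn.execute(
        "SELECT name FROM image_file "
        "WHERE last_used_at IS NULL OR last_used_at < ? ORDER BY name",
        (cutoff,),
    )
    return [row[0] for row in rows]


touch(conn, "Snowflake.jpg", now=2000)
print(stale(conn, cutoff=1000))  # prints ['Old.png']
```

The html build would call something like `touch` for every image it identifies, and a periodic cleanup could then review whatever `stale` returns as candidates for removal or refresh.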

2) yes - as mentioned above, to identify the images, I do a batch build of html (mainly the Main namespace, but some of the other namespaces are built as well) - I have also changed how the html is stored, to save a small amount of space (and I believe time)

3) I have not been following the xowa dumps recently - I did not know that a Nov 2022 dump exists

4) I am sure there are other things, database rearrangement, additional tables and columns to name some

I am now using Java 17.2 - not for any of the specific new features, just to be reasonably current.

When it comes to looking at the full image: Xowa is intended essentially as an offline copy of a wiki including images (where these images are thumbnails intended for use in the html pages). I believe that a link like en.wikipedia.org/wiki/File:Snowflake_macro_photography_1.jpg should go directly to the real website.

For the image discussed, Snowflake_macro_photography_1.jpg, although the link uses en.wikipedia.org, the page itself (on the real site) contains the line 'This is a file from the Wikimedia Commons. Information from its description page there is shown below.'

I see from the snapshot that commons.wikimedia.org is not in the list of wikis. I wonder if downloading that wiki would make a difference?

catmaps commented 1 year ago

Just a brief reply until I can look into it deeper: Your reply to numbers 1 and 2 is helpful. . . . I am still trying to test some of my ideas for having current images. For number 3, I was referring to the regular wiki dumps at https://dumps.wikimedia.org/enwiki/ . Thanks for the idea on 4; I will do some testing, including a Commons wiki, next week.

catmaps commented 1 year ago

I am back. I am still building/testing the Commons wiki addition. But here are a few general ideas/considerations on Xowa.

  1. I use Xowa primarily to import xml dumps (as wikitext) in a customized form. Years ago Gnosygnu kindly added the dansguardian import feature. So I import custom educational wikis for the benefit of hundreds of people annually. Since these go primarily on regular computers (no raspberries or android devices), html format has not been my priority.
  2. I suspect that people who simply use Wikipedia offline will opt for the ready-made Kiwix. But from observing the Github issues threads, I suspect many Xowa users appreciate Xowa for special wiki analysis and custom wiki building.
  3. It seems that the two enormous tasks for Gnosygnu were (a) updating Xowa with ever-changing mediawiki templates/formats/skins and (b) building ready-made wikis for Xowa. Both are very hard to keep up with. (I wish I had given him more PayPal donations than I did.)
  4. So I suggest: Is it possible to only focus on keeping Xowa up-to-date but then add more building capabilities so that Xowa users can more easily build their own wikis? If that is possible, then the hosting of Xowa databases would seem unnecessary.
  5. I suspect the pictures would be the most difficult aspect. Therefore, I suggest, if feasible, taking ready-made zims with pictures from the Kiwix library. For testing, see the small zim at https://library.kiwix.org/viewer#wikipedia_en_simple_all_maxi_2023-06 and run it through a zim extractor, and then see how to import the images folder into the Xowa image tables.
  6. I have experimented with extracting a few small StackExchange zims using the tool at https://github.com/dignifiedquire/zim . Another tool is zimdump; see https://www.openzim.org/wiki/Zimdump
  7. I was impressed how well the Kiwix folks cleanly and consistently organized the data.
  8. For inspiration, consider how Bartvelp made a converter to convert zim's to his sqlite tables. https://github.com/Bartvelp/zim-converter
  9. My wish: Can we add versatile zim import tools to Xowa? For instance, one setting would allow an Xowa user to import the pictures directly into Xowa's tables. Another import setting would allow the Xowa user to dump all pictures into an images folder. (This would allow adapting a photo set for a custom wiki.) A third setting would allow importing a folder tree of images into Xowa's sqlite tables.
  10. Another set of import settings would allow importing html text from a zim with subsettings for first into folder and then into tables. It seems these kinds of options with versatility would be appreciated by other users.
  11. The above plan would depend on Kiwix to do the hard work of image updates and image hosting. Xowa users would also need less image download time, because zim images are compressed more.
  12. I look forward to hearing your thoughts on the above. (I might not have time to respond until after July 6. But I anticipate more time for Xowa during the month of July.)
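As a rough illustration of the third setting in item 9 (importing a folder tree of images into sqlite tables), here is a minimal sketch. It assumes the images have already been unpacked into a folder, for example by one of the zim extractors mentioned above. The table `image_blob` and the function `import_image_tree` are hypothetical names for this example and do not reflect Xowa's real image schema.

```python
import sqlite3
from pathlib import Path

# File extensions treated as images; an assumption, adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"}


def import_image_tree(root: Path, conn: sqlite3.Connection) -> int:
    """Store every image under root in a table keyed by its relative path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_blob ("
        "rel_path TEXT PRIMARY KEY, size INTEGER NOT NULL, data BLOB NOT NULL)"
    )
    count = 0
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            data = path.read_bytes()
            # Forward-slash relative path, so keys match on every platform.
            rel = path.relative_to(root).as_posix()
            conn.execute(
                "INSERT OR REPLACE INTO image_blob VALUES (?, ?, ?)",
                (rel, len(data), data),
            )
            count += 1
    conn.commit()
    return count
```

Calling `import_image_tree(Path("extracted_zim/images"), sqlite3.connect("images.sqlite3"))` (both paths are placeholders) would return the number of images stored. Using `INSERT OR REPLACE` means re-running the import after a newer extraction overwrites changed images instead of failing on duplicates.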