gnosygnu / xowa

xowa offline wiki application

(v4.0.0.1701) Random Page not fully working, commons.wikipedia categories are not showing (not listed), missing images (more to come maybe) #121

Closed Ope30 closed 7 years ago

Ope30 commented 7 years ago

Hey! I just found out that you released a new build and it's actually quite good! The first thing I tried out was the shortcuts, and I can confirm that they work now. Hollywood Walk of Fame no longer has that weird error. You also changed the Options system and it looks pretty good. ^^

Now, regarding the bugs, I found two things: one counts as a bug and one counts as a missing image. Let me describe them as well as I can:

I will 100% redownload EN and DE; I have to anyway because of the missing images, you know. If I find anything else, I will let you know right here. But I don't think there's anything left.

gnosygnu commented 7 years ago

Hi! Happy new year, and thanks for the great descriptions!

Regarding the random page, this is related to the new Options system. For now, you can fix this on your side by doing the following:

The issue is that the new Options system mistakenly changed the default from "checked" to "unchecked". I'll put in a fix later this week to change the default back to "checked". I'll also put in some more code to make sure that Random still works even if this option is changed back to "unchecked".

Regarding the missing images, this is unfortunately a wrong dump issue. Specifically:

This was a mistake in my script, and was not something I double-checked (I double-checked Commons, but forgot that files can be uploaded only to English Wikipedia). This will be fixed in the 2017-01 build. Of course, I'll add this to my double-check process for future builds as well.

Hope this helps. Let me know if there's anything else.

Thanks!

Ope30 commented 7 years ago

Hi! Happy new year, and thanks for the great descriptions!

You too mate!

Regarding the random page, this is related to the new Options system. For now, you can fix this on your side

Yeah, that did it for me. Thanks for your advice!

I'll also put in some more code to make sure that Random still works even if this option is changed back to "unchecked".

I agree. I'm not 100% sure, but I feel like loading articles/images takes more time when you prefer HTML databases.

Regarding the missing images, this is unfortunately a wrong dump issue.

Ah. Well, I'll just redownload. No problem for me.

This was a mistake in my script, and was not something I double-checked (I double-checked Commons, but forgot that files can be uploaded only to English Wikipedia). This will be fixed in the 2017-01 build. Of course, I'll add this to my double-check process for future builds as well.

Alright! That sounds good.

Thanks for your answer. It definitely answered some of my questions. I will just download 2017-01!

gnosygnu commented 7 years ago

Yeah, that did it for me. Thanks for your advice!

Cool. Thanks for the confirmation!

I agree. I'm not 100% sure, but I feel like loading articles/images takes more time when you prefer HTML databases.

That's odd. Images live in a different section of the code and shouldn't take longer. Articles do live in the same section of code, but they really should not take any longer either. I looked at it briefly just now and don't see any reason for a slowdown. I'll check again when I put in the code for the other fix.

Ah. Well, I'll just redownload. No problem for me.

Cool. Thanks for understanding.

Alright! That sounds good. Thanks for your answer. It definitely answered some of my questions. I will just download 2017-01!

Yup. Will comment in the other thread when they go live. Thanks!

Ope30 commented 7 years ago

That's odd. Images live in a different section of the code and shouldn't take longer. Articles do live in the same section of code, but they really should not take any longer either. I looked at it briefly just now and don't see any reason for a slowdown. I'll check again when I put in the code for the other fix.

I tested it again and I believe it is only slower when you use the Random Page function. When you jump from article to article, even with HTML databases preferred, everything is alright, but if you go to a random article it takes a little bit more time. I could be wrong.

For your information, here's a missing image I found: https://de.wikipedia.org/wiki/Georg_Wilhelm_Friedrich_Hegel, "Georg Wilhelm Friedrich Hegel".

gnosygnu commented 7 years ago

I tested it again and I believe it is only slower when you use the Random Page function. When you jump from article to article, even with HTML databases preferred, everything is alright, but if you go to a random article it takes a little bit more time. I could be wrong.

I looked at the code today, and the option shouldn't make a difference, especially for random page (as opposed to normal page lookup).

For your information, here's a missing image I found: https://de.wikipedia.org/wiki/Georg_Wilhelm_Friedrich_Hegel, "Georg Wilhelm Friedrich Hegel".

Yeah, that image was uploaded on 2016-11-02: https://de.wikipedia.org/wiki/Datei:Georg_Wilhelm_Friedrich_Hegel_by_Julius_Ludwig_Sebbers.jpg . Keep in mind that the German Wikipedia was generated on 2016-11-01, so it was one of those images that missed the cut (because Commons is dumped a little later than German Wikipedia).

Hope this helps. Let me know if there's anything else. Thanks!

Ope30 commented 7 years ago

I looked at the code today, and the option shouldn't make a difference, especially for random page (as opposed to normal page lookup).

Thanks for looking at the code. I think I'm going to agree with you. ^^

Yeah, that image was uploaded on 2016-11-02: https://de.wikipedia.org/wiki/Datei:Georg_Wilhelm_Friedrich_Hegel_by_Julius_Ludwig_Sebbers.jpg . Keep in mind that the German Wikipedia was generated on 2016-11-01, so it was one of those images that missed the cut (because Commons is dumped a little later than German Wikipedia).

Ah. I hate that lol. I think I asked you this already, but is it going to be the same date order for the next German wikis or any others? I mean, if so, wouldn't that mean that there will always be missing images? :(

gnosygnu commented 7 years ago

Ah. I hate that lol. I think I asked you this already, but is it going to be the same date order for the next German wikis or any others? I mean, if so, wouldn't that mean that there will always be missing images? :(

Yeah, there will always be missing images. I detail the two major use cases below.

But there really shouldn't be many. I'm talking something like less than .1%. Since there are a few million images, I know that still comes out to several thousand, but you should see a missing image once out of every few hundred articles or so.

If you're coming across them more often, let me know. I know I promised to put in more code for missing images, but I set that aside. I can look at it again next week.

Anyway, here are the major cases:

Ope30 commented 7 years ago

Yeah, there will always be missing images. I detail the two major use cases below.

But there really shouldn't be many. I'm talking something like less than .1%. Since there are a few million images, I know that still comes out to several thousand, but you should see a missing image once out of every few hundred articles or so.

Dump date differs: A wiki and Commons don't dump at the same time. So it can happen that a wiki dump has an image, but that image hasn't been added to Commons yet. I can add code to try to get the metadata for these images separately, but it's more code, and it always seemed like a low-return fix (see ".1%" above).

Deleted / renamed images: A wiki can include an image, but (a) it gets deleted / renamed on Commons a few days later and (b) the article is changed to point to a different image. If this happens, there's really nothing I can do about it (I'd have to get a new copy of the article, which has its own complications).

Damnit. Thanks for making that clear. I know that not many are missing, but it bugs me. I wish you'd find some code or something for all of this so that I (or anyone else) could get around it, though. In case you find any solutions regarding missing images, please let me know. I would like to stay up to date, if that is okay with you.

gnosygnu commented 7 years ago

I wish you'd find some code or something for all of this so that I (or anyone else) could get around it, though.

Yeah, I'll work on a solution for the 1st use case (dump date differs). The 2nd one is harder, and I don't think there's any way around it without having to redownload the entire article (which is not as easy as it sounds).

Will let you know when I get something done for the 1st one though. Thanks!

gnosygnu commented 7 years ago

I'm going to close this issue out. As mentioned above, there are two outstanding issues and neither can be easily changed:

I'll revisit this issue if there turn out to be substantially more missing images that can be resolved by either approach.

Thanks.