dheera / android-attopedia

A smartwatch-friendly interface for Wikipedia
http://dheera.net/projects/android-attopedia

Implemented support for more Wikipedia Languages. #1

Open jhbruhn opened 10 years ago

jhbruhn commented 10 years ago

It’s using Locale.getDefault().getLanguage() + ".wikipedia.org" to determine the Wikipedia URL. I only tested it with de.wikipedia.org, but it worked perfectly fine.
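The host lookup described above can be sketched in Java as follows. One caveat is worth checking. On older Java and Android versions, Locale.getLanguage() returns the withdrawn ISO 639 codes "iw", "in" and "ji" for Hebrew, Indonesian and Yiddish, while Wikipedia's subdomains are he, id and yi, so the sketch maps them. The class and method names are hypothetical and not taken from the patch. The English fallback for a locale with no language code is also an assumption.

```java
import java.util.Locale;

// Hypothetical helper (not from the patch): turns a language code into a Wikipedia host name.
final class WikipediaHost {
    private WikipediaHost() {}

    // Older Java/Android releases return the withdrawn ISO 639 codes "iw", "in"
    // and "ji" from Locale.getLanguage(); Wikipedia's subdomains use he, id and yi.
    static String normalizeLanguage(String code) {
        switch (code) {
            case "iw": return "he"; // Hebrew
            case "in": return "id"; // Indonesian
            case "ji": return "yi"; // Yiddish
            default:   return code;
        }
    }

    static String forLanguage(String languageCode) {
        String code = normalizeLanguage(languageCode.toLowerCase(Locale.ROOT));
        // Assumption: a locale without a language code falls back to English.
        if (code.isEmpty()) {
            code = "en";
        }
        return code + ".wikipedia.org";
    }

    // Same idea as Locale.getDefault().getLanguage() + ".wikipedia.org",
    // plus the normalization and fallback above.
    static String forDefaultLocale() {
        return forLanguage(Locale.getDefault().getLanguage());
    }
}
```

For example, forLanguage("de") gives de.wikipedia.org, and a device reporting "iw" ends up on he.wikipedia.org instead of a non-existent iw subdomain.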

jhbruhn commented 10 years ago

Looks like I accidentally hit Android Studio's Format Code button; I'm sorry about that.

dheera commented 10 years ago

Hi Jan, thanks!!

There are a few potential issues we may need to work out:

  1. Some locales may not have a Wikipedia at all, in which case no results would be returned. Here we'd probably want to fall back to English.
  2. Some locales' Wikipedias are very incomplete: they have few articles, and/or their articles are thin compared to their English counterparts, so people in those places often prefer to read the English (or another language's) Wikipedia. This is probably not an issue for German, but it is for many less common languages.
  3. We need to implement and test the logic that filters out disambiguation pages, templates, user pages, etc. in other languages. (Disambiguation pages are useless at present since I don't have link support yet. They aren't really needed anyway: Google search results tend to surface the most relevant article for what the user spoke, and the user can even voice-search the full text of an article.) For example, in the latest commit I have -"disambiguation page" in the Google search string, but in other languages this string may itself need to be translated, and we'd have to test that.
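Point 3 amounts to keeping one exclusion term per language. A minimal sketch, assuming the query is assembled as a plain string: only the English term "disambiguation page" comes from the current code, and languages without a verified term simply get no exclusion. The site: prefix, the class name and the method name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical query builder with a per-language disambiguation exclusion term.
final class SearchQueryBuilder {
    private static final Map<String, String> DISAMBIGUATION_TERMS = new HashMap<>();
    static {
        // Only the English term is from the current code; each other language
        // would need its own translated term, looked up and tested by hand.
        DISAMBIGUATION_TERMS.put("en", "disambiguation page");
    }

    static String build(String spokenText, String language) {
        StringBuilder q = new StringBuilder();
        // Assumption: results are restricted to the chosen Wikipedia via site:.
        q.append("site:").append(language).append(".wikipedia.org ").append(spokenText);
        String term = DISAMBIGUATION_TERMS.get(language);
        if (term != null) {
            // Appends e.g. -"disambiguation page" to exclude those results.
            q.append(" -\"").append(term).append('"');
        }
        return q.toString();
    }
}
```

Falling back to "no exclusion" for unknown languages keeps the search working while the translations are still missing, at the cost of occasionally returning a disambiguation page.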

Any thoughts? Thanks!!


Dheera Venkatraman http://dheera.net/

(Sent by email in reply to Jan-Henrik's comment of 2014-08-04 21:49 GMT+00:00.)

dheera commented 10 years ago

One idea for point 2 is to support only the most complete languages for now (perhaps English, German, Spanish, Chinese, etc.) and fall back to English if the user's language isn't one of those.
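A whitelist with an English fallback could look like the sketch below. The four languages are the examples from the comment; which Wikipedias count as "complete enough" is a judgment call, so the set is illustrative only, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical whitelist of "complete enough" Wikipedias with an English fallback.
final class LanguageWhitelist {
    // Illustrative list based on the examples above, not a vetted selection.
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("en", "de", "es", "zh")));
    static final String FALLBACK = "en";

    // Returns the device language if whitelisted, otherwise English.
    static String choose(String deviceLanguage) {
        return SUPPORTED.contains(deviceLanguage) ? deviceLanguage : FALLBACK;
    }
}
```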


(Sent by email as a follow-up to Dheera's own message of 2014-08-05 4:28 GMT+00:00.)

jhbruhn commented 10 years ago

For 1: We could search for results, and if none are returned, search again in the fallback language (see point 2).
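The two-step fallback can be sketched with a small search abstraction. WikiSearch and FallbackSearch are hypothetical names; the real app would plug its Google query into search(). Note that the second round trip only happens when the preferred language returns nothing.

```java
import java.util.List;

// Hypothetical search abstraction: returns result titles for a query in one language.
interface WikiSearch {
    List<String> search(String query, String language);
}

final class FallbackSearch {
    static List<String> searchWithFallback(WikiSearch backend, String query,
                                           String preferred, String fallback) {
        List<String> results = backend.search(query, preferred);
        // Only query the fallback Wikipedia if the preferred one found nothing.
        if (results.isEmpty() && !preferred.equals(fallback)) {
            results = backend.search(query, fallback);
        }
        return results;
    }
}
```

The cost of this approach is the extra request in the empty case, which is the latency concern raised in the reply below.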

About your idea for 2: We could provide a set of working languages, but also give the user the option to force another language (via the preferences in the mobile app).
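Combining the whitelist with a user override gives a simple resolution order: forced language, then whitelisted device language, then the fallback. In the app, the forced value might come from SharedPreferences.getString(key, null) with some preference key; the key and the class name here are hypothetical, and the sketch takes the value as a plain string so it runs without Android.

```java
import java.util.Set;

// Hypothetical resolution order: user override > whitelisted device language > fallback.
final class LanguageResolver {
    static String resolve(String forcedLanguage, String deviceLanguage,
                          Set<String> supported, String fallback) {
        if (forcedLanguage != null && !forcedLanguage.isEmpty()) {
            // An explicit user choice wins, even for a language outside the whitelist.
            return forcedLanguage;
        }
        return supported.contains(deviceLanguage) ? deviceLanguage : fallback;
    }
}
```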

3: This would mean translating the names of all those page types into each language (or does Wikipedia have an API that does this for us?).
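On the API question: Wikimedia wikis run the Disambiguator extension, which marks disambiguation pages with a "disambiguation" page property that can be queried through the MediaWiki API independently of the wiki's language. A sketch of building such a request URL follows; the class name is hypothetical, and the exact response shape (a disambiguation entry under pageprops) is worth verifying against the live API before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: asks the MediaWiki API whether a title is a disambiguation page.
final class DisambiguationCheck {
    static String pagePropsUrl(String language, String title) {
        try {
            // prop=pageprops with ppprop=disambiguation returns the property only
            // for pages the Disambiguator extension has marked as disambiguations.
            return "https://" + language + ".wikipedia.org/w/api.php"
                    + "?action=query&format=json&prop=pageprops&ppprop=disambiguation"
                    + "&titles=" + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

If this works as expected, it would avoid maintaining translated exclusion strings for every language, at the cost of one extra (small) request per candidate article.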

dheera commented 10 years ago

Hmm, OK, let me play with it a bit in the next couple of days. :)

We might want to avoid making two Google queries, since that would make the interface slower to respond, but I think there may be a way to accomplish the fallback in a single query with Google's search syntax (in fact, I think Google reorders results based on your language preference; I'm pretty sure it doesn't give everyone the same results).
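One way to try the single-query fallback is to OR two site: restrictions together and then prefer a result from the preferred Wikipedia on the client. This is a sketch under the assumption that Google honors the OR between the two site: terms as intended, which would need testing; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical single-query fallback: search both Wikipedias, prefer one on the client.
final class CombinedQuery {
    static String build(String spokenText, String preferred, String fallback) {
        if (preferred.equals(fallback)) {
            return "site:" + preferred + ".wikipedia.org " + spokenText;
        }
        // Assumption: Google treats this as "either site", not both at once.
        return "site:" + preferred + ".wikipedia.org OR site:" + fallback
                + ".wikipedia.org " + spokenText;
    }

    // Returns the first result from the preferred Wikipedia, else the first result
    // overall, or null when there are no results.
    static String pick(List<String> resultUrls, String preferred) {
        String marker = "//" + preferred + ".wikipedia.org/";
        for (String url : resultUrls) {
            if (url.contains(marker)) {
                return url;
            }
        }
        return resultUrls.isEmpty() ? null : resultUrls.get(0);
    }
}
```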

By the way, I'm also thinking about creating a proxy server that does all the HTML-to-JSON parsing on the server side. This is by far the biggest bottleneck in the current implementation: many articles are several hundred KB of HTML, which takes a while to download, and parsing it on the phone takes several seconds in some cases. Moving this to the server side would make it extremely efficient. I need to look into the costs, though, and whether any entity would be willing to sponsor it.


(Sent by email in reply to Jan-Henrik's comment of 2014-08-05 8:24 GMT+00:00.)

michyprima commented 10 years ago

Why not just put the supported locales in an array of strings? If the current locale isn't in it, fall back to English.