anki-geo / ultimate-geography

Geography flashcard deck for Anki
https://ankiweb.net/shared/info/2109889812

Capital inconsistencies between languages #416

Closed by axelboc 3 years ago

axelboc commented 3 years ago

Looking at #409, I noticed that the proclaimed capital of Palestine is actually inconsistent on Wikipedia across languages:

It's an even split! 😂 I decided not to fix it right away, as I thought it required further discussion... so here I am.

It actually reminded me of other capital inconsistencies we've discussed in the past, like Kiribati #166... So for fun, I decided to do a quick review of all the capital inconsistencies currently in the deck:


Looking at all of this, there are a number of incorrect or outdated capitals that clearly need to be fixed (Palestine, Sri Lanka, Tuvalu, Kazakhstan and Kiribati).

Assuming we do (I'll open a PR), we'll be left with the following inconsistencies:

  1. one where half of the languages have a different capital than the others: Palestine;
  2. one where the capital is completely different in one language only - i.e. "Melekeok" for Palau in FR instead of Ngerulmud;
  3. three where multiple capitals are listed in one language only: "Sri Jayewardanapura Kotte, Colombo" for Sri Lanka in NB, "Brussel, Strasbourg, Luxembourg" for European Union in NB, "Podgorica, Cetiña" for Montenegro in ES;
  4. one where the capital is written in a specific way in one language only - i.e. "Tarawa (South Tarawa)" for Kiribati in CS instead of South Tarawa.

So my question is: could we amend our guidelines to remove the 3rd and 4th kinds of inconsistencies?

For 3., the other capital candidates are mentioned in the Capital info field anyway, and for 4., South Tarawa is just the more "precise" capital. Any thoughts?
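The third kind of inconsistency (several capitals listed in one language only) could in principle be spotted mechanically. Here is a minimal sketch, assuming a hypothetical mapping from language code to the value of the Capital field. The function names and the sample values for EN, DE and FR are illustrative assumptions and are not part of the deck's tooling or data:

```python
from __future__ import annotations

from collections import Counter


def count_entities(capital_field: str) -> int:
    """Count the comma-separated capital names in one Capital field value."""
    return len([part for part in capital_field.split(",") if part.strip()])


def find_count_outliers(capitals_by_lang: dict[str, str]) -> dict[str, str]:
    """Return the languages whose Capital field lists a different number
    of capitals than the most common count across all languages."""
    counts = {lang: count_entities(value) for lang, value in capitals_by_lang.items()}
    most_common_count, _ = Counter(counts.values()).most_common(1)[0]
    return {
        lang: capitals_by_lang[lang]
        for lang, n in counts.items()
        if n != most_common_count
    }


# Hypothetical sample data modelled on the Montenegro case above.
montenegro = {
    "en": "Podgorica",
    "de": "Podgorica",
    "fr": "Podgorica",
    "es": "Podgorica, Cetiña",
}
outliers = find_count_outliers(montenegro)
print(outliers)
```

A check like this only flags a difference in the number of listed capitals. It cannot tell which language is right, and it would not catch the second kind of inconsistency, where a single but different capital is listed.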


EDIT I've opened two PRs to update some of the capitals as per Wikipedia: #417 and #418, and one issue to deal with the case of Palestine, since it's more complex: #419

EDIT Closed #418 as Wikipedia can't make up its mind about the capital of Kiribati.

axelboc commented 3 years ago

I completely forgot that the Norwegian deck doesn't use Wikipedia as its first source, so its capitals are correct as per the current guidelines.

As discussed in #417, we should consider updating the Translation sources for Norwegian so that the site of the Ministry of Foreign Affairs of Norway, which was last updated in 2013, is no longer the main source. This will help remove some of the inconsistencies.

~I'm no longer convinced that the guidelines need to be amended any further.~

However, we should make sure that we follow the guidelines correctly by removing alternative names/spellings from the Capital field and moving them to the Capital info field:

axelboc commented 3 years ago

> I'm no longer convinced that the guidelines need to be amended any further.

On second thought (sorry 😅)... Looking back at the cases of Palestine and Kiribati, especially, it's clear that our guidelines still have some limitations.

Perhaps we should reconsider our policy of following each localised Wikipedia (or translation sources) for capitals, and instead take the capitals from English Wikipedia and just translate them... like we do for countries and flags basically 😄

I can't find the original discussion that led us (me?) to choose this policy (#210 and #255 are as far as I could get), but now that we have so many translations, keeping each deck in sync with its corresponding localised Wikipedia is clearly impractical ... and it's only going to get worse the more languages we get!

If we decide to use English Wikipedia as the source of truth, and a capital keeps changing back-and-forth between two names there, then we only have to resolve this volatility in one place, by finding better sources and discussing the matter on the country's Talk page.

Of course, things such as spelling, alternative names, etc. would remain sourced from each localised Wikipedia (or translation sources) independently.
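To make the split concrete, the proposal can be pictured as two separate lookups: one that decides which city is the capital (maintained once), and one that decides how that city is written in each language (maintained per language). This is only a sketch under assumed names; `CAPITAL_OF`, `LOCALISED_NAMES` and `capital_for` are hypothetical, and the Belgium entry is illustrative data rather than something taken from the deck:

```python
from __future__ import annotations

# Hypothetical data, for illustration only.
# 1. Which city is the capital: decided once, for all languages.
CAPITAL_OF = {
    "Montenegro": "Podgorica",
    "Belgium": "Brussels",
}

# 2. How each city is written in each language: sourced per language.
LOCALISED_NAMES = {
    "Brussels": {"fr": "Bruxelles", "nb": "Brussel"},
}


def capital_for(country: str, lang: str) -> str:
    """Resolve the single agreed capital, then localise its name.

    Falls back to the canonical (English) name when no localised
    spelling is recorded for the requested language.
    """
    city = CAPITAL_OF[country]
    return LOCALISED_NAMES.get(city, {}).get(lang, city)


print(capital_for("Belgium", "fr"))      # Bruxelles
print(capital_for("Montenegro", "es"))   # Podgorica
```

With this shape, a dispute about which city is the capital touches only the first table, while spelling and alternative names stay a per-language concern.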

Making this change to the guidelines would resolve all capital inconsistencies across languages, period:

aplaice commented 3 years ago

From the other issue:

> and I can't actually find any case where it led to a factually worthwhile inconsistency...

In principle, conflicts about the actual capital could have been interesting, say due to differences in the definition of a capital in different languages, but I think you're completely right that in practice there haven't been any such cases!

Aside on Montenegro's capitals

For instance, it's vaguely interesting that Spanish Wikipedia had decided that both Cetinje and Podgorica were capitals of Montenegro (rather than one being an honorary capital and the other an actual one), but I don't think it's indicative of any deeper differences in how the Spanish language or Spanish people think of capitals or cities; it was more likely just an accident of editing. (The fact that the [Serbo-Croatian Wiki](https://sh.wikipedia.org/wiki/Crna_Gora) describes Cetinje as the "throne capital" _might_ be, but even there I'm not sure, e.g. [Serbian Wikipedia](https://sr.wikipedia.org/wiki/%D0%A6%D1%80%D0%BD%D0%B0_%D0%93%D0%BE%D1%80%D0%B0) just lists Podgorica as the capital. It might _perhaps_ be worthwhile to leave a loophole for the capitals of countries, in that country's native language(s), in case people "on the ground" have very strong opinions about precise details in such a situation. I don't think we have any such situations atm though, and we can worry about them when/if we do.)

I fully agree with your points about the different possible scenarios — the "real" capital (or the best guess of what the "real" capital is) should be the same irrespective of language.


> Of course, things such as spelling, alternative names, etc. would remain sourced from each localised Wikipedia (or translation sources) independently.

Yeah, definitely! There are many cases where another language has multiple names for an entity, but English only has one and vice-versa. (Hence, some care will still be needed with the Country/Capital infos and with choosing the correct name/spelling as the "main" version, but at least we won't have situations where different languages list different "entities" (or even different numbers of "entities") in the capital field...)


> I can't find the original discussion that led us (me?) to choose this policy (#210 and #255 are as far as I could get),

I'm not sure either. It seemed like a good idea at the time (I was totally in favour!), allowing us to be "language-neutral", but with hindsight it was a large amount of effort for effectively no gain, and with so many languages it's untenable, as you wrote.