Open AleGoiko opened 6 years ago
do you mean more countries that change or format wise? if its the latter the format looked the same to me, if its the former, not that I saw but I may be missing something.
I was thinking if moving from use adm0_a3 to iso_a3 as the principal data for countries changes anything more apart from South Sudan (more discrepancies between iso and the american format)
other than running test.sh is there any other way to test this?
I'd compare the two tables to be completely sure
so i tweaked build_synonym_table.sql
a little bit more and the result is as follows
(being admin0_synonyms_orig
the original admin0_synonyms
, and admin0_synonyms
the new table and both tables containing only the inserts from ne_admin0_v3
):
dataservices_db=> SELECT adm0_a3, name, rank FROM admin0_synonyms_orig EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms;
adm0_a3 | name | rank
---------+----------------------------------+------
SDS | SDS | 3
CCK | CCK | 3
BVT | BVT | 3
BES | BES | 3
PSX | West Bank and Gaza | 5
ALD | ALD | 3
SDS | Republic of South Sudan | 5
SAH | Western Sahara | 9
REU | REU | 3
SAH | W. Sah. | 8
SDS | S. Sud. | 8
GUF | GUF | 3
PSX | PSX | 3
GLP | GLP | 3
MTQ | MTQ | 3
FRH | FRH | 3
TKL | TKL | 3
SAH | ESH | 0
SDS | SSD | 0
SAH | SAH | 3
SJM | SJM | 3
PSX | PSE | 0
MYT | MYT | 3
ALD | Åland Islands | 5
CXR | CXR | 3
ALD | ALA | 0
SAH | Sahrawi Arab Democratic Republic | 5
SDS | South Sudan | 9
the entries that are missing seem outdated or redundant anyway.
I was having a look at other files and I am not sure if we need to change them since the old codes are still around, only that with a lower ranking
Ok, looks good. Check why the tests are failing and we can move on
restest this please
@ethervoid looks like it was a glitch :sweat_smile:
Great that fixes the underlying problem, we need to create a patch for the current data, as you can see here in order to avoid recreate all the adm0 data. What do you think?
ok so, here's where I stand right now with this:
I created the following tables:
admin0_synonyms_iso
: using iso3 codes (admin0_synonyms
but with iso codes)admin0_synonyms_local
: (made using the regular build_synonym_table.sql
)admin0_synonyms
: loaded from a regular db_dumpthe difference between admin0_synonyms
and admin0_synonyms_iso
is as follows(remove the .zip
extension):
SELECT adm0_a3, name, rank FROM admin0_synonyms EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms_iso;
It is mostly outdated or repetitive data except for lines such as:
ESP,Sp.,8
which seem to be missing from the new admin0_synonyms_iso
table.
However, upon closer inspection of the build_synonym_table.sql
this explains why it is missing:
INSERT INTO admin0_synonyms (name, rank, adm0_a3)
SELECT
abbrev, 8, adm0_a3
FROM
ne_admin0_v3
WHERE
char_length(regexp_replace(abbrev, '[^a-zA-Z\u00C0-\u00ff]+', '', 'g')) > 3;
for some reason the admin0_synonyms
in the db_dump files does not seem to incluide this query, however the adm0_a3_local
does include it as it follows build_synonym_table.sql
. In this case the difference (between the iso table and the regular one created locally) the diff is as follows:
SELECT adm0_a3, name, rank FROM admin0_synonyms_local EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms_iso;
adm0_a3 | name | rank
---------+-----------------------------------+------
CCK | CCK | 3
CCK | Cocos (Keeling) Islands | 0
BVT | BVT | 3
REU | Reunion | 0
MTQ | Martinique | 0
SAH | Western Sahara | 9
SDS | S. Sudan | 0
SDS | S. Sud. | 8
GUF | GUF | 3
SJM | Svalbard and Jan Mayen | 0
MTQ | MTQ | 3
FRH | FRH | 3
ALD | Aland | 0
SAH | ESH | 0
FRH | Corse | 0
CXR | CXR | 3
SAH | Sahrawi Arab Democratic Republic | 5
GLP | Guadeloupe | 0
SDS | South Sudan | 9
SDS | SDS | 3
BES | BES | 3
PSX | West Bank and Gaza | 5
ALD | ALD | 3
SDS | Republic of South Sudan | 5
REU | REU | 3
SAH | W. Sah. | 8
BES | Bonaire (Sint Eustatius and Saba) | 0
TKL | Tokelau | 0
PSX | PSX | 3
GLP | GLP | 3
GUF | French Guiane | 0
MYT | Mayotte | 0
CXR | Christmas Island | 0
BVT | Bouvet Island | 0
TKL | TKL | 3
SDS | SSD | 0
SAH | SAH | 3
SJM | SJM | 3
PSX | PSE | 0
MYT | MYT | 3
ALD | Åland Islands | 5
ALD | ALA | 0
PSX | Palestine | 0
SAH | W. Sahara | 0
and it makes much more sense.
I am very lost at this point, not sure if the sql generated in admin0_synonyms_iso
is a suitable candidate to replace the current admin0_synonyms
in the dumps :(
What does Rank mean? what is the main use of the rank column?
tries to fix https://github.com/CartoDB/data-services/issues/233
Any feedback is welcome, as I am not sure this is the best approach.