datasets / un-locode

United Nations Codes for Trade and Transport Locations (UN/LOCODE) and Country Codes
https://datahub.io/core/un-locode

Some data are switched in code list csv #37

Open dwaam opened 1 week ago

dwaam commented 1 week ago

Hi,

I wrote an issue previously: https://github.com/datasets/un-locode/issues/34

In fact, the duplicated headers are gone, some unlocodes are fixed, but there are still issues:

Examples:

// Line 6909 - 6911
,GR,JSY,Syra (Syros),Syra (Syros),,AI,1-------,9601,,,
,GR,SYO,Syra Island,Syra Island,,RL,--3-----,0201,,3726N 02455E,
,GR,JSY,Syros (Syra),Syros (Syra),,AI,1-------,9601,,,
// Line 122939 - 122941
,GR,JSY,Syra (Syros),Syra (Syros),,AI,1-------,9601,,,
,GR,SYO,Syra Island,Syra Island,,RL,--3-----,0201,,3726N 02455E,
,GR,JSY,Syros (Syra),Syros (Syra),,AI,1-------,9601,,,

// Here the status is empty in both entries, but the status and function columns are switched between the two entries
// Line 107845
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,-----6--,,2101,,4816N 00042E,
// Line 223874 
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,,-----6--,2101,,4816N 00042E,

// Same here, with a non-empty status
// Line 37130
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,RQ,--3----B,0901,,1425N 08926W,
// Line 153159
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,--3----B,RQ,0901,,1425N 08926W,

Is it possible to fix the data? Thanks!

sabas commented 1 week ago

@gradedSystem

gradedSystem commented 6 days ago

@sabas @dwaam looking through it right now

gradedSystem commented 6 days ago

@sabas @dwaam I fixed the dataset by removing duplicate rows and improving the swap handling between the Function and Status columns. If any other issue comes up, just tag me and I will fix it.
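
The code in the PR is not shown in this thread. As a rough illustration of this kind of cleanup, here is a minimal Python sketch. It assumes the Status column sits at index 6 and the Function column at index 7 (inferred from the GR rows quoted above), that a function value is 8 characters made of digits 0-7, `B`, and `-`, and that a status is two uppercase letters. The names `normalize_row` and `clean` are hypothetical, not taken from the repository.

```python
import csv
import io
import re

# Assumed shapes: Function is 8 chars of digits 0-7, "B" or "-"; Status is 2 letters.
FUNCTION_RE = re.compile(r"[-0-7B]{8}")
STATUS_RE = re.compile(r"[A-Z]{2}")


def normalize_row(row, status_idx=6, function_idx=7):
    """Return the row with Status/Function swapped back when they look switched."""
    status, function = row[status_idx], row[function_idx]
    looks_swapped = FUNCTION_RE.fullmatch(status) and (
        function == "" or STATUS_RE.fullmatch(function)
    )
    if looks_swapped:
        row = list(row)
        row[status_idx], row[function_idx] = function, status
    return row


def clean(rows):
    """Normalize every row, then drop exact duplicates while keeping order."""
    seen = set()
    out = []
    for row in rows:
        fixed = tuple(normalize_row(row))
        if fixed not in seen:
            seen.add(fixed)
            out.append(list(fixed))
    return out


# The four rows quoted earlier in this issue (no header line).
SAMPLE = """\
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,-----6--,,2101,,4816N 00042E,
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,,-----6--,2101,,4816N 00042E,
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,RQ,--3----B,0901,,1425N 08926W,
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,--3----B,RQ,0901,,1425N 08926W,
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
cleaned = clean(rows)
for row in cleaned:
    print(",".join(row))
```

Under these assumptions the four sample rows collapse to two. If the real column order is the other way round, only the two index arguments need to change.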

dwaam commented 5 days ago

Thanks! It looks like the data on the UN side is itself not correct: https://service.unece.org/trade/locode/gr.htm

The same UNLOCODE has two entries, and both are validated.

Do you think the data is correct? If not, do you know if there is a way to ask them to fix it?

gradedSystem commented 5 days ago

  1. Good observations, @dwaam. I am not sure the data is correct: in the issues I have seen, people were complaining about the source. This older issue seems to address what you are pointing out: https://github.com/datasets/un-locode/issues/17
  2. I am not sure. I emailed the un.org website and pointed them to your issue. I will wait for their answer and reply here as soon as I hear back.

dwaam commented 5 days ago

Cool thanks @gradedSystem ;)

sabas commented 5 days ago

@dwaam @gradedSystem that's normal. Note that the two entries are in the form X (Y) and Y (X), the same as in #17. The "fix" would probably be to make one of them an alias entry (see CHGVA for an example), but who takes responsibility for choosing the primary entry? Can you generate a list of duplicates? I can ask at the next meeting, but I think it's a "feature" more than a bug.

dwaam commented 5 days ago

Not really, because the UNLOCODE will be our natural ID and key. Of all the UNLOCODEs, it is the only one I have seen like that, so it seems to be a mistake on the UN side. If the two entries at least had different statuses we could choose the right one, but they do not.

dwaam commented 5 days ago

Hi, the correction looks good and the data in the pull request seems OK, thanks ;)

┌───────┬─────────┬──────────┐
│ count │ Country │ Location │
│ int64 │ varchar │ varchar  │
├───────┼─────────┼──────────┤
│     4 │ US      │ TRI      │
│     3 │ US      │ BGM      │
│     3 │ US      │ LEB      │
│     3 │ US      │ GGG      │
│     3 │ US      │ MBS      │
│     3 │ US      │ PHF      │
│     2 │ HR      │ GRA      │
│     2 │ HU      │ FEL      │
│     2 │ HU      │ SZN      │
│     2 │ KH      │ PPT      │
│     2 │ PL      │ KET      │
│     2 │ SK      │ VRA      │
│     2 │ TR      │ MGL      │
│     2 │ BE      │ SPI      │
│     2 │ CZ      │ PKR      │
│     2 │ CZ      │ 9YI      │
│     2 │ FI      │ EJO      │
│     2 │ FI      │ HMN      │
│     2 │ FI      │ KIM      │
│     2 │ FI      │ MAX      │
│     2 │ FI      │ MIK      │
│     2 │ FI      │ RAU      │
│     2 │ FI      │ TER      │
│     2 │ GR      │ LEV      │
│     2 │ HU      │ UJR      │
│     2 │ SK      │ PDT      │
│     2 │ SK      │ ZKL      │
│     2 │ US      │ CVO      │
│     2 │ US      │ GON      │
│     2 │ BE      │ ODE      │
│     2 │ FI      │ DLS      │
│     2 │ FI      │ HKO      │
│     2 │ FI      │ KAA      │
│     2 │ FI      │ PAR      │
│     2 │ HR      │ MET      │
│     2 │ HU      │ TEY      │
│     2 │ MG      │ IVA      │
│     2 │ MT      │ SJN      │
│     2 │ SK      │ BNI      │
│     2 │ SK      │ MES      │
│     2 │ SK      │ R4B      │
│     2 │ SK      │ VEP      │
│     2 │ SO      │ DOW      │
│     2 │ TR      │ OPR      │
│     2 │ US      │ HIB      │
│     2 │ US      │ GSP      │
│     2 │ US      │ POY      │
│     2 │ BE      │ LNY      │
│     2 │ BE      │ SJN      │
│     2 │ BE      │ SLW      │
│     2 │ BE      │ SPO      │
│     2 │ FI      │ PRV      │
│     2 │ FI      │ KOK      │
│     2 │ FI      │ LAP      │
│     2 │ FI      │ UKI      │
│     2 │ FI      │ RYM      │
│     2 │ HU      │ BLA      │
│     2 │ HU      │ VES      │
│     2 │ HU      │ HAF      │
│     2 │ IN      │ MRM      │
│     2 │ IT      │ PFX      │
│     2 │ LV      │ SKR      │
│     2 │ US      │ MDJ      │
│     2 │ VN      │ VAG      │
│     2 │ AX      │ MHQ      │
│     2 │ BE      │ ESE      │
│     2 │ CZ      │ MUV      │
│     2 │ CZ      │ SIB      │
│     2 │ CZ      │ SVR      │
│     2 │ CZ      │ TEC      │
│     2 │ CZ      │ TZK      │
│     2 │ FI      │ HOU      │
│     2 │ FI      │ KAJ      │
│     2 │ FI      │ NLI      │
│     2 │ FI      │ TOR      │
│     2 │ HR      │ CAK      │
│     2 │ HU      │ BOY      │
│     2 │ PL      │ MRC      │
│     2 │ TR      │ MKP      │
│     2 │ US      │ RDM      │
│     2 │ BE      │ WBV      │
│     2 │ CZ      │ BOO      │
│     2 │ CZ      │ BVZ      │
│     2 │ FI      │ SKV      │
│     2 │ FI      │ KOR      │
│     2 │ FI      │ MHQ      │
│     2 │ FI      │ NRP      │
│     2 │ FI      │ SIP      │
│     2 │ GR      │ VTH      │
│     2 │ HU      │ VCS      │
│     2 │ HU      │ ZZB      │
│     2 │ HU      │ OTN      │
│     2 │ IN      │ NSA      │
│     2 │ MT      │ SGW      │
│     2 │ PL      │ GWM      │
│     2 │ US      │ LEW      │
│     2 │ US      │ BWI      │
│     2 │ US      │ PTN      │
│     2 │ CZ      │ PRY      │
│     2 │ FI      │ ESP      │
│     2 │ FI      │ KJA      │
│     2 │ FI      │ LPP      │
│     2 │ GR      │ LAV      │
│     2 │ HR      │ SDA      │
│     2 │ IT      │ FCO      │
│     2 │ PL      │ SLA      │
│     2 │ PL      │ STJ      │
│     2 │ RU      │ YEK      │
│     2 │ SK      │ HOK      │
│     2 │ SK      │ VUE      │
│     2 │ SN      │ TOU      │
│     2 │ TR      │ IZM      │
│     2 │ BE      │ VOS      │
│     2 │ BE      │ MOS      │
│     2 │ BE      │ SGI      │
│     2 │ CZ      │ JNR      │
│     2 │ FI      │ TKU      │
│     2 │ FI      │ KRK      │
│     2 │ FI      │ PRS      │
│     2 │ FI      │ NAU      │
│     2 │ LT      │ DID      │
│     2 │ SK      │ PEL      │
│     2 │ US      │ MRY      │
│     2 │ US      │ OXR      │
│     2 │ CZ      │ KTA      │
│     2 │ CZ      │ MAE      │
│     2 │ CZ      │ OST      │
│     2 │ FI      │ KIN      │
│     2 │ FI      │ INK      │
│     2 │ FI      │ LHI      │
│     2 │ FI      │ SVL      │
│     2 │ FI      │ POH      │
│     2 │ FI      │ TMP      │
│     2 │ LT      │ KEL      │
│     2 │ LU      │ SKK      │
│     2 │ BE      │ BRU      │
│     2 │ FI      │ KAL      │
│     2 │ FI      │ KVH      │
│     2 │ FI      │ MLX      │
│     2 │ FI      │ UKP      │
│     2 │ FI      │ TOK      │
│     2 │ GR      │ HER      │
│     2 │ HR      │ GSP      │
│     2 │ HR      │ OTO      │
│     2 │ HU      │ LOV      │
│     2 │ LV      │ BRC      │
│     2 │ PL      │ BED      │
│     2 │ PL      │ BEL      │
│     2 │ SK      │ VLN      │
│     2 │ US      │ HTS      │
│     2 │ DE      │ LAA      │
│     2 │ ES      │ LDT      │
│     2 │ FI      │ IIS      │
│     2 │ FI      │ VAT      │
│     2 │ GR      │ SYS      │
│     2 │ GR      │ JSY      │
│     2 │ LT      │ MOS      │
│     2 │ PL      │ MIK      │
│     2 │ BE      │ BTS      │
│     2 │ BE      │ SBK      │
│     2 │ BE      │ UKE      │
│     2 │ FI      │ RAA      │
│     2 │ FI      │ KAS      │
│     2 │ FI      │ PER      │
│     2 │ FI      │ SBG      │
│     2 │ SK      │ CEL      │
│     2 │ US      │ GSO      │
│     2 │ US      │ SUN      │
│     2 │ BE      │ KAN      │
│     2 │ CZ      │ CYD      │
│     2 │ FI      │ PIR      │
│     2 │ FI      │ TAI      │
│     2 │ FI      │ KUS      │
│     2 │ HR      │ KAS      │
│     2 │ LT      │ AGM      │
│     2 │ RO      │ RGU      │
│     2 │ SK      │ TOR      │
│     2 │ SN      │ DUR      │
│     2 │ US      │ SRQ      │
│     2 │ US      │ BYI      │
│     2 │ US      │ RDU      │
│     2 │ US      │ MSL      │
│     2 │ BE      │ OST      │
│     2 │ CZ      │ BAT      │
│     2 │ CZ      │ UCN      │
│     2 │ FI      │ ENF      │
│     2 │ FI      │ HEL      │
│     2 │ FI      │ KEM      │
│     2 │ FI      │ VAA      │
│     2 │ GR      │ KIM      │
│     2 │ HU      │ FZS      │
│     2 │ HU      │ MZK      │
│     2 │ LT      │ DEW      │
│     2 │ PL      │ KMS      │
│     2 │ BE      │ ITR      │
│     2 │ CZ      │ CVA      │
│     2 │ CZ      │ PEV      │
│     2 │ FI      │ HYV      │
│     2 │ FI      │ JPA      │
│     2 │ FI      │ LHJ      │
│     2 │ FI      │ LOV      │
│     2 │ FI      │ MER      │
│     2 │ HU      │ HAS      │
│     2 │ HU      │ KOZ      │
│     2 │ PL      │ DOL      │
│     2 │ PL      │ WLR      │
│     2 │ SK      │ VOC      │
│     2 │ TR      │ SRS      │
│     2 │ US      │ EWB      │
│     2 │ BE      │ MSJ      │
│     2 │ CZ      │ KAD      │
│     2 │ FI      │ JVP      │
│     2 │ FI      │ RUO      │
│     2 │ FI      │ VKO      │
│     2 │ GR      │ HYD      │
│     2 │ JP      │ AGC      │
│     2 │ LT      │ EMK      │
│     2 │ LV      │ MPS      │
│     2 │ MD      │ VUL      │
│     2 │ RO      │ DIM      │
│     2 │ SK      │ VNV      │
│     2 │ US      │ MPV      │
│     2 │ US      │ PSB      │
│     2 │ US      │ FHU      │
│     2 │ US      │ MFE      │
│     2 │ BE      │ TRN      │
│     2 │ BE      │ ZUN      │
│     2 │ FI      │ POR      │
│     2 │ FI      │ HMY      │
│     2 │ FI      │ KER      │
│     2 │ FI      │ KRS      │
│     2 │ FI      │ OUL      │
│     2 │ FI      │ TVS      │

Those are the remaining duplicates, but it seems to be an issue directly in the UN data.
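
For anyone who wants to regenerate such a list, a minimal Python sketch follows. The query actually used for the table above is not shown in this thread, so this is only one possible approach. It assumes the CSV has `Country` and `Location` header columns (as in the table); `duplicate_codes` and the three-row `SAMPLE` are hypothetical stand-ins.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for the code list; the real file has more columns and rows.
SAMPLE = """\
Country,Location,Name
GR,JSY,Syra (Syros)
GR,SYO,Syra Island
GR,JSY,Syros (Syra)
"""


def duplicate_codes(rows):
    """Return (count, country, location) for every code that appears more than once."""
    counts = Counter((row["Country"], row["Location"]) for row in rows)
    dupes = [(n, country, loc) for (country, loc), n in counts.items() if n > 1]
    # Highest count first, then alphabetical by country and location.
    return sorted(dupes, key=lambda t: (-t[0], t[1], t[2]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for count, country, location in duplicate_codes(rows):
    print(count, country, location)
```

To run it on the real file, replace `io.StringIO(SAMPLE)` with an opened file handle for the code list CSV.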

sabas commented 5 days ago

Yeah, I suggest treating those as aliases...
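
One way a consumer could treat the duplicates as aliases is sketched below in Python. This is an illustration only: it assumes `Country`, `Location` and `Name` fields, and it picks the first row seen as the primary entry, which is an arbitrary choice and not something agreed on in this thread. The function name `group_aliases` is hypothetical.

```python
def group_aliases(rows):
    """Collapse rows sharing a UNLOCODE into one entry with a list of alias names."""
    grouped = {}
    for row in rows:
        code = row["Country"] + row["Location"]
        # First row seen for a code becomes the primary entry (arbitrary choice).
        entry = grouped.setdefault(code, {"primary": row["Name"], "aliases": []})
        name = row["Name"]
        if name != entry["primary"] and name not in entry["aliases"]:
            entry["aliases"].append(name)
    return grouped


# Rows taken from the GR example at the top of this issue.
rows = [
    {"Country": "GR", "Location": "JSY", "Name": "Syra (Syros)"},
    {"Country": "GR", "Location": "SYO", "Name": "Syra Island"},
    {"Country": "GR", "Location": "JSY", "Name": "Syros (Syra)"},
]

grouped = group_aliases(rows)
for code, entry in grouped.items():
    print(code, entry["primary"], entry["aliases"])
```

The result has exactly one key per UNLOCODE, so the code can still serve as a natural ID while the alternative names are kept.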

gradedSystem commented 1 day ago

Hi @sabas @dwaam, sorry for the late reply, I was ill. I am ready to work on this again now. As far as I understand, we should keep the aliases, right?

dwaam commented 1 day ago

Hi, I handled the duplicates directly on my side, so I no longer need a fix ;) It is still weird that the UN allows duplicated entries.

gradedSystem commented 22 hours ago

@dwaam so the PR is correct, right?

dwaam commented 21 hours ago

Yep, for the duplication it's good; I took care of it on my side. I'll keep watching your repo in case you change the behavior. If in the future there is only one line per UNLOCODE in the CSV, that would be even better for me :)