dwaam opened this issue 1 week ago (status: Open)
@gradedSystem
@sabas @dwaam looking through it right now
@sabas @dwaam
I cleaned up the dataset by removing duplicate rows and improving the swap function between the Function and Status columns. If any other issue comes up, just tag me and I will fix it.
Thanks! The data on the UN side does not seem correct: https://service.unece.org/trade/locode/gr.htm
The same UNLOCODE has two entries, and both are validated.
Do you think the data is correct? If not, do you know if there is a way to ask them to fix it?
Cool thanks @gradedSystem ;)
@dwaam @gradedSystem that's normal. Note that the two entries are in the form X (Y) and Y (X), the same as #17. The "fix" would probably be to make one of them an alias entry (see CHGVA for an example), but who takes the responsibility for choosing which is the primary entry? Can you generate a list of duplicates? I can ask at the next meeting, but I think it's a "feature" more than a bug.
Not really, because the unlocode will be our natural id and the key. Of all the unlocodes, it is the only one I see like that, so it seems to be a mistake on the UN side. If the two entries at least had different statuses we could choose the right one, but they do not.
Hi, the correction looks good and the data in the pull request seems OK, thanks ;)
┌───────┬─────────┬──────────┐
│ count │ Country │ Location │
│ int64 │ varchar │ varchar  │
├───────┼─────────┼──────────┤
│     4 │ US      │ TRI      │
│     3 │ US      │ BGM      │
│     3 │ US      │ LEB      │
│     3 │ US      │ GGG      │
│     3 │ US      │ MBS      │
│     3 │ US      │ PHF      │
│     2 │ HR      │ GRA      │
│     2 │ HU      │ FEL      │
│     2 │ HU      │ SZN      │
│     2 │ KH      │ PPT      │
│     2 │ PL      │ KET      │
│     2 │ SK      │ VRA      │
│     2 │ TR      │ MGL      │
│     2 │ BE      │ SPI      │
│     2 │ CZ      │ PKR      │
│     2 │ CZ      │ 9YI      │
│     2 │ FI      │ EJO      │
│     2 │ FI      │ HMN      │
│     2 │ FI      │ KIM      │
│     2 │ FI      │ MAX      │
│     2 │ FI      │ MIK      │
│     2 │ FI      │ RAU      │
│     2 │ FI      │ TER      │
│     2 │ GR      │ LEV      │
│     2 │ HU      │ UJR      │
│     2 │ SK      │ PDT      │
│     2 │ SK      │ ZKL      │
│     2 │ US      │ CVO      │
│     2 │ US      │ GON      │
│     2 │ BE      │ ODE      │
│     2 │ FI      │ DLS      │
│     2 │ FI      │ HKO      │
│     2 │ FI      │ KAA      │
│     2 │ FI      │ PAR      │
│     2 │ HR      │ MET      │
│     2 │ HU      │ TEY      │
│     2 │ MG      │ IVA      │
│     2 │ MT      │ SJN      │
│     2 │ SK      │ BNI      │
│     2 │ SK      │ MES      │
│     2 │ SK      │ R4B      │
│     2 │ SK      │ VEP      │
│     2 │ SO      │ DOW      │
│     2 │ TR      │ OPR      │
│     2 │ US      │ HIB      │
│     2 │ US      │ GSP      │
│     2 │ US      │ POY      │
│     2 │ BE      │ LNY      │
│     2 │ BE      │ SJN      │
│     2 │ BE      │ SLW      │
│     2 │ BE      │ SPO      │
│     2 │ FI      │ PRV      │
│     2 │ FI      │ KOK      │
│     2 │ FI      │ LAP      │
│     2 │ FI      │ UKI      │
│     2 │ FI      │ RYM      │
│     2 │ HU      │ BLA      │
│     2 │ HU      │ VES      │
│     2 │ HU      │ HAF      │
│     2 │ IN      │ MRM      │
│     2 │ IT      │ PFX      │
│     2 │ LV      │ SKR      │
│     2 │ US      │ MDJ      │
│     2 │ VN      │ VAG      │
│     2 │ AX      │ MHQ      │
│     2 │ BE      │ ESE      │
│     2 │ CZ      │ MUV      │
│     2 │ CZ      │ SIB      │
│     2 │ CZ      │ SVR      │
│     2 │ CZ      │ TEC      │
│     2 │ CZ      │ TZK      │
│     2 │ FI      │ HOU      │
│     2 │ FI      │ KAJ      │
│     2 │ FI      │ NLI      │
│     2 │ FI      │ TOR      │
│     2 │ HR      │ CAK      │
│     2 │ HU      │ BOY      │
│     2 │ PL      │ MRC      │
│     2 │ TR      │ MKP      │
│     2 │ US      │ RDM      │
│     2 │ BE      │ WBV      │
│     2 │ CZ      │ BOO      │
│     2 │ CZ      │ BVZ      │
│     2 │ FI      │ SKV      │
│     2 │ FI      │ KOR      │
│     2 │ FI      │ MHQ      │
│     2 │ FI      │ NRP      │
│     2 │ FI      │ SIP      │
│     2 │ GR      │ VTH      │
│     2 │ HU      │ VCS      │
│     2 │ HU      │ ZZB      │
│     2 │ HU      │ OTN      │
│     2 │ IN      │ NSA      │
│     2 │ MT      │ SGW      │
│     2 │ PL      │ GWM      │
│     2 │ US      │ LEW      │
│     2 │ US      │ BWI      │
│     2 │ US      │ PTN      │
│     2 │ CZ      │ PRY      │
│     2 │ FI      │ ESP      │
│     2 │ FI      │ KJA      │
│     2 │ FI      │ LPP      │
│     2 │ GR      │ LAV      │
│     2 │ HR      │ SDA      │
│     2 │ IT      │ FCO      │
│     2 │ PL      │ SLA      │
│     2 │ PL      │ STJ      │
│     2 │ RU      │ YEK      │
│     2 │ SK      │ HOK      │
│     2 │ SK      │ VUE      │
│     2 │ SN      │ TOU      │
│     2 │ TR      │ IZM      │
│     2 │ BE      │ VOS      │
│     2 │ BE      │ MOS      │
│     2 │ BE      │ SGI      │
│     2 │ CZ      │ JNR      │
│     2 │ FI      │ TKU      │
│     2 │ FI      │ KRK      │
│     2 │ FI      │ PRS      │
│     2 │ FI      │ NAU      │
│     2 │ LT      │ DID      │
│     2 │ SK      │ PEL      │
│     2 │ US      │ MRY      │
│     2 │ US      │ OXR      │
│     2 │ CZ      │ KTA      │
│     2 │ CZ      │ MAE      │
│     2 │ CZ      │ OST      │
│     2 │ FI      │ KIN      │
│     2 │ FI      │ INK      │
│     2 │ FI      │ LHI      │
│     2 │ FI      │ SVL      │
│     2 │ FI      │ POH      │
│     2 │ FI      │ TMP      │
│     2 │ LT      │ KEL      │
│     2 │ LU      │ SKK      │
│     2 │ BE      │ BRU      │
│     2 │ FI      │ KAL      │
│     2 │ FI      │ KVH      │
│     2 │ FI      │ MLX      │
│     2 │ FI      │ UKP      │
│     2 │ FI      │ TOK      │
│     2 │ GR      │ HER      │
│     2 │ HR      │ GSP      │
│     2 │ HR      │ OTO      │
│     2 │ HU      │ LOV      │
│     2 │ LV      │ BRC      │
│     2 │ PL      │ BED      │
│     2 │ PL      │ BEL      │
│     2 │ SK      │ VLN      │
│     2 │ US      │ HTS      │
│     2 │ DE      │ LAA      │
│     2 │ ES      │ LDT      │
│     2 │ FI      │ IIS      │
│     2 │ FI      │ VAT      │
│     2 │ GR      │ SYS      │
│     2 │ GR      │ JSY      │
│     2 │ LT      │ MOS      │
│     2 │ PL      │ MIK      │
│     2 │ BE      │ BTS      │
│     2 │ BE      │ SBK      │
│     2 │ BE      │ UKE      │
│     2 │ FI      │ RAA      │
│     2 │ FI      │ KAS      │
│     2 │ FI      │ PER      │
│     2 │ FI      │ SBG      │
│     2 │ SK      │ CEL      │
│     2 │ US      │ GSO      │
│     2 │ US      │ SUN      │
│     2 │ BE      │ KAN      │
│     2 │ CZ      │ CYD      │
│     2 │ FI      │ PIR      │
│     2 │ FI      │ TAI      │
│     2 │ FI      │ KUS      │
│     2 │ HR      │ KAS      │
│     2 │ LT      │ AGM      │
│     2 │ RO      │ RGU      │
│     2 │ SK      │ TOR      │
│     2 │ SN      │ DUR      │
│     2 │ US      │ SRQ      │
│     2 │ US      │ BYI      │
│     2 │ US      │ RDU      │
│     2 │ US      │ MSL      │
│     2 │ BE      │ OST      │
│     2 │ CZ      │ BAT      │
│     2 │ CZ      │ UCN      │
│     2 │ FI      │ ENF      │
│     2 │ FI      │ HEL      │
│     2 │ FI      │ KEM      │
│     2 │ FI      │ VAA      │
│     2 │ GR      │ KIM      │
│     2 │ HU      │ FZS      │
│     2 │ HU      │ MZK      │
│     2 │ LT      │ DEW      │
│     2 │ PL      │ KMS      │
│     2 │ BE      │ ITR      │
│     2 │ CZ      │ CVA      │
│     2 │ CZ      │ PEV      │
│     2 │ FI      │ HYV      │
│     2 │ FI      │ JPA      │
│     2 │ FI      │ LHJ      │
│     2 │ FI      │ LOV      │
│     2 │ FI      │ MER      │
│     2 │ HU      │ HAS      │
│     2 │ HU      │ KOZ      │
│     2 │ PL      │ DOL      │
│     2 │ PL      │ WLR      │
│     2 │ SK      │ VOC      │
│     2 │ TR      │ SRS      │
│     2 │ US      │ EWB      │
│     2 │ BE      │ MSJ      │
│     2 │ CZ      │ KAD      │
│     2 │ FI      │ JVP      │
│     2 │ FI      │ RUO      │
│     2 │ FI      │ VKO      │
│     2 │ GR      │ HYD      │
│     2 │ JP      │ AGC      │
│     2 │ LT      │ EMK      │
│     2 │ LV      │ MPS      │
│     2 │ MD      │ VUL      │
│     2 │ RO      │ DIM      │
│     2 │ SK      │ VNV      │
│     2 │ US      │ MPV      │
│     2 │ US      │ PSB      │
│     2 │ US      │ FHU      │
│     2 │ US      │ MFE      │
│     2 │ BE      │ TRN      │
│     2 │ BE      │ ZUN      │
│     2 │ FI      │ POR      │
│     2 │ FI      │ HMY      │
│     2 │ FI      │ KER      │
│     2 │ FI      │ KRS      │
│     2 │ FI      │ OUL      │
│     2 │ FI      │ TVS      │
Those are the remaining duplicates, but it seems to be an issue directly in the UN data.
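For anyone who wants to regenerate a duplicate list like the one in the table, here is a minimal sketch in Python. It assumes the CSV has `Country` and `Location` columns (the names shown in the table header). The `Name` column, the sample rows, and the function name `count_duplicate_locodes` are made up for illustration and are not taken from the real dataset.

```python
import csv
import io
from collections import Counter


def count_duplicate_locodes(rows):
    """Return (count, country, location) tuples for keys that occur more than once."""
    counts = Counter((row["Country"], row["Location"]) for row in rows)
    duplicates = [
        (n, country, location)
        for (country, location), n in counts.items()
        if n > 1
    ]
    # Highest counts first, then alphabetical so the order is stable.
    duplicates.sort(key=lambda item: (-item[0], item[1], item[2]))
    return duplicates


# Hypothetical sample standing in for the real CSV file.
SAMPLE = """\
Country,Location,Name
XX,AAA,X (Y)
XX,AAA,Y (X)
XX,BBB,Only entry
YY,CCC,First
YY,CCC,Second
YY,CCC,Third
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = count_duplicate_locodes(rows)
for n, country, location in duplicates:
    print(n, country, location)
```

To run it against a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at your local copy of the CSV.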
Yeah, I suggest treating those as aliases...
Hi @sabas @dwaam, sorry for the long delay in replying, I was ill. I am ready to work on this again now. As far as I understand, we should keep the aliases, right?
Hi, I already handled the duplicates directly on my side, so I don't need anything more for now ;), but it's weird that the UN authorises duplicated entries.
@dwaam so the PR is correct, right?
Yep, the duplication is fine, I took care of it on my side. I'll keep watching your repo in case you change the behavior. If in the future the CSV has only one line per unlocode, that would be even better for me :)
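A rough sketch of what treating duplicates as aliases could look like on the consumer side, so that the unlocode stays usable as a unique key. Everything here is an assumption for illustration: the `Name` column, the sample rows, the function name `collapse_to_aliases`, and above all the rule that the first row seen becomes the primary entry. The thread leaves open who should decide which entry is primary.

```python
def collapse_to_aliases(rows):
    """Group rows by unlocode; keep the first row as primary and later names as aliases."""
    grouped = {}
    for row in rows:
        # The unlocode is the country code followed by the location code.
        key = row["Country"] + row["Location"]
        if key not in grouped:
            # Assumption: the first occurrence is treated as the primary entry.
            grouped[key] = {"primary": row, "aliases": []}
        else:
            grouped[key]["aliases"].append(row["Name"])
    return grouped


# Hypothetical rows mirroring the "X (Y)" / "Y (X)" pattern described in the thread.
rows = [
    {"Country": "XX", "Location": "AAA", "Name": "X (Y)"},
    {"Country": "XX", "Location": "AAA", "Name": "Y (X)"},
    {"Country": "XX", "Location": "BBB", "Name": "Only entry"},
]

entries = collapse_to_aliases(rows)
for code, entry in entries.items():
    print(code, entry["primary"]["Name"], entry["aliases"])
```

The result has one record per unlocode, and the second spelling of a name is kept as an alias rather than dropped.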
Hi,
I wrote an issue previously: https://github.com/datasets/un-locode/issues/34
The duplicated headers are gone and some unlocodes are fixed, but there are still issues:
examples:
Is it possible to fix the data? Thanks!