datopian / datahub-qa

:package: Bugs, issues and suggestions for datahub.io
https://datahub.io/

Encoding issues with `airport-codes_csv.csv` #294

Closed yelizariev closed 7 months ago

yelizariev commented 8 months ago

Describe the issue

How to reproduce

  1. Download the airport codes file from the page https://datahub.io/core/airport-codes

wget https://datahub.io/core/airport-codes/r/airport-codes.csv

  2. Check EPAR airport

grep EPAR, airport-codes_csv.csv
EPAR,small_airport,ArÅamów Airport,1455,EU,PL,PL-PK,Bircza,EPAR,,,"22.514298, 49.657501"

Expected behavior

The name must be Arłamów Airfield.

I also tried to convert from different encodings, but without success

grep EPAR, airport-codes_csv.csv  | iconv -f Windows-1252 -t UTF-8
EPAR,small_airport,Arłamów Airport,1455,EU,PL,PL-PK,Bircza,EPAR,,,"22.514298, 49.657501"
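
For anyone hitting the same problem: `iconv -f Windows-1252 -t UTF-8` reads the input as Windows-1252 and re-encodes it as UTF-8, which is the direction that creates this kind of garbling rather than undoing it. Below is a minimal Python sketch of the reverse operation. It rests on an assumption that is not confirmed by the output above: that the file stores the two UTF-8 bytes of `ł` (0xC5 0x82) as the two characters `Å` plus an invisible U+0082 control character, while `ó` is stored correctly. Because only some characters are affected, a whole-line round trip would fail, so the sketch repairs matching pairs only. The function name `repair_latin1_mojibake` is hypothetical.

```python
import re

# A character in the UTF-8 two-byte lead range (U+00C2..U+00DF) followed by one
# in the continuation range (U+0080..U+00BF) looks like UTF-8 read as Latin-1.
_MOJIBAKE_PAIR = re.compile("[\u00c2-\u00df][\u0080-\u00bf]")


def repair_latin1_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 garbling for two-byte sequences only."""

    def fix(match: re.Match) -> str:
        # Map the two characters back to their byte values, then decode as UTF-8.
        return match.group(0).encode("latin-1").decode("utf-8")

    return _MOJIBAKE_PAIR.sub(fix, text)


if __name__ == "__main__":
    # ASSUMPTION: the garbled name contains U+00C5 followed by U+0082.
    garbled = "Ar\u00c5\u0082am\u00f3w Airport"
    print(repair_latin1_mojibake(garbled))
```

This is a heuristic, not a general fix: it can alter legitimate text that happens to contain such a pair, and it does not cover Windows-1252 specific characters or three- and four-byte sequences. The proper fix remains correcting the source data.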

rufuspollock commented 7 months ago

@yelizariev thank you for reporting 🙏

This should be fixed upstream in the source dataset at https://github.com/datasets/airport-codes. Would you like to report it there and/or submit a fix? PRs are welcome. 🙂

rufuspollock commented 7 months ago

INVALID / DUPLICATE. I think this is a duplicate of https://github.com/datasets/airport-codes/issues/37

Also, this should be reported at https://github.com/datasets/airport-codes