Closed boffi closed 4 years ago
I have problems parsing wta_players.csv with Python 3's csv.reader, which tries to decode the binary data as UTF-8.
The offending records, found by searching for non-ASCII characters in an editor, are the following:
212305,Joselyn Margarita,Treyes Albarrac纃N,,19970629,ECU
215238,Selin G羮Lseren,Simsek,U,19990509,TUR
221676,Ludmila Magal罸,Alvez,R,20011129,ARG
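For anyone who would rather not hunt for these records by eye, here is a minimal sketch of the same search in Python. It reads the file in binary mode, so no decoding is attempted, and reports every line that contains a byte outside the ASCII range. The function name `find_non_ascii_lines` is made up for this illustration and is not part of the repository.

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_bytes) for each line holding a non-ASCII byte.

    Hypothetical helper, not part of the repository.
    """
    hits = []
    # Binary mode: bytes are inspected as-is, so nothing can fail to decode.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            if not raw.isascii():  # bytes.isascii() needs Python 3.7+
                hits.append((lineno, raw.rstrip(b"\r\n")))
    return hits
```

Calling `find_non_ascii_lines("wta_players.csv")` should list the affected records with their line numbers, whatever encoding the stray bytes are in.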
It seems to me that they should be Albarracín and Magalí, and possibly Gülseren, although the WTA site lists only Selin Simsek, without a middle name:
212305,Joselyn Margarita,Treyes Albarracín,,19970629,ECU
215238,Selin Gülseren,Simsek,U,19990509,TUR
221676,Ludmila Magalí,Alvez,R,20011129,ARG
(When I correct the file as above, I can parse the data with Python 3's csv.reader.)
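As a workaround until the file itself is fixed, a reader can try UTF-8 first and fall back to another encoding. The sketch below assumes Latin-1 as the fallback. That is only a guess, since the issue does not say what encoding the stray bytes are in. Latin-1 never raises a decode error, so the rows it yields may still hold wrong characters. `read_rows` is a hypothetical helper name.

```python
import csv

def read_rows(path, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn; return (encoding_used, rows).

    Hypothetical helper. The fallback encoding is an assumption,
    not something confirmed for wta_players.csv.
    """
    for enc in encodings:
        try:
            # newline="" is what the csv module documentation recommends.
            with open(path, newline="", encoding=enc) as fh:
                return enc, list(csv.reader(fh))
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file; try the next
    raise ValueError(f"none of {encodings} could decode {path}")
```

Returning the encoding that was actually used makes it easy to spot when the fallback was taken and the names deserve a second look.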
On the other hand, it looks like the rest of the data is strictly ASCII, so maybe it should be:
212305,Joselyn Margarita,Treyes Albarracin,,19970629,ECU
215238,Selin Gulseren,Simsek,U,19990509,TUR
221676,Ludmila Magali,Alvez,R,20011129,ARG
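If the ASCII-only convention is the one to keep, the accented names can be folded to ASCII mechanically rather than by hand. This is a rough sketch using Unicode decomposition. `to_ascii` is a hypothetical helper name, and the approach only handles letters that decompose into a base letter plus combining marks. Characters such as "ø" or "ß" pass through unchanged and would need a separate mapping.

```python
import unicodedata

def to_ascii(text):
    """Strip combining accents: 'Albarracín' becomes 'Albarracin'.

    Hypothetical helper. Letters without a Unicode decomposition
    (for example 'ø') are returned unchanged.
    """
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for name in ("Albarracín", "Gülseren", "Magalí"):
    print(name, "->", to_ascii(name))
```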
Regards ፨ gb
Thanks, now fixed. At some point in the past I must have replaced the non-ASCII characters in the names mentioned in this issue, but apparently with the wrong ASCII characters. Better now.