Some characters in wta_players.csv are not valid UTF-8. For example:
222342,Manuela,Zegarra Ball�N,U,,PER
The real player seems to be https://www.wtatennis.com/players/329659/manuela-zegarra-ball-n => Manuela Zegarra-Ballón
The invalid UTF-8 sequence is here: e3 93 4e
But I think the invalid part is really just e3 93, because 4e is the correct "N" at the end of the name (I don't know why it's uppercase).
The name contains an accented o, which matches c3 93 (the UTF-8 encoding of uppercase Ó).
I suppose the problem comes from several encoding operations/conversions, with the wrong encoding chosen at some step. Maybe e3 comes from c3 after a lowercase transform applied as if the data were Latin-1: instead of lower(c3 93) => ó (c3 b3), the original lower('Ó') was computed byte by byte as lower(c3) lower(93) => e3 93.
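To check that this hypothesis is at least plausible, here is a minimal Python sketch. It assumes the original value was the uppercase surname "BALLÓN" in UTF-8, and that some step decoded it as Latin-1 before changing the case; both are guesses, not something confirmed by the file's history. A title-casing step (instead of a plain lowercase) is included as a further assumption, because it would also explain the uppercase "N".

```python
# Assumed original: the uppercase surname, correctly encoded as UTF-8.
# "\u00d3" is 'Ó', written as an escape to keep this source ASCII-only.
original = "BALL\u00d3N".encode("utf-8")

# Hypothetical faulty step: the bytes are mistaken for Latin-1,
# case-converted as text, then written back as Latin-1.
as_latin1 = original.decode("latin-1")
lowered = as_latin1.lower().encode("latin-1")
titled = as_latin1.title().encode("latin-1")

print(original.hex(" "))  # 42 41 4c 4c c3 93 4e
print(lowered.hex(" "))   # 62 61 6c 6c e3 93 6e
print(titled.hex(" "))    # 42 61 6c 6c e3 93 4e

# The result is no longer valid UTF-8.
try:
    titled.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print(still_valid)        # False
```

In both variants c3 becomes e3 while 93 is left alone, because in Latin-1 c3 is "Ã" (lowercase "ã" = e3) and 93 is a control character with no case. With `title()`, that control character also acts as a word break, so the following "N" stays uppercase, which gives exactly the bytes of "Ball" followed by e3 93 4e.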
There are many other invalid UTF-8 sequences in this file.
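A small sketch of how the other affected rows could be listed. The helper name `invalid_utf8_lines` is made up for this example, and it only reports the first bad sequence on each line; the sample data is the row quoted above plus one valid row.

```python
def invalid_utf8_lines(lines):
    """Yield (line_number, bad_bytes, raw_line) for every line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in binary mode.
    Only the first undecodable sequence of each line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start and exc.end delimit the bytes the decoder rejected.
            yield lineno, raw[exc.start:exc.end], raw


# In-memory sample: the broken row from this report and a valid UTF-8 row.
sample = [
    b"222342,Manuela,Zegarra Ball\xe3\x93N,U,,PER\n",
    "222342,Manuela,Zegarra Ball\u00f3n,U,,PER\n".encode("utf-8"),
]

found = list(invalid_utf8_lines(sample))
for lineno, bad, raw in found:
    print(lineno, bad.hex(" "), raw)
```

To run it on the real file, pass the file object opened in binary mode, for example `with open("wta_players.csv", "rb") as f: found = list(invalid_utf8_lines(f))`.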