Remove zero-width chars from key, before matches

egregors commented 1 year ago

As a last-resort comparison, strip all zero-width characters from the key before matching.

Fixes #242

pikanezi commented 1 year ago

Isn't it normal that "\u200Bdate" does not match "date"? I'm not sure if it is up to gocsv to normalize your data.

Can't you fix your data before using gocsv?

egregors commented 1 year ago

> Isn't it normal that "\u200Bdate" does not match "date"? I'm not sure if it is up to gocsv to normalize your data.

But for some reason, you're already doing it. I don't see a big difference between the strings.TrimSpace(key) == k check that is already in the matchesKey method and stripping zero-width characters. Why is it normal that "\u200Bdate" does not match "date", but not normal that " date" does not match "date"?
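To make the comparison concrete, here is a minimal sketch of the idea, not gocsv's actual code. The helper removeZeroWidth and this simplified matchesKey are hypothetical names, and the set of code points treated as zero-width (U+200B, U+200C, U+200D, U+FEFF) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// removeZeroWidth is a hypothetical helper that drops a small, assumed set
// of zero-width code points from s.
func removeZeroWidth(s string) string {
	return strings.Map(func(r rune) rune {
		switch r {
		case '\u200B', '\u200C', '\u200D', '\uFEFF':
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

// matchesKey is a simplified stand-in for the matching discussed here:
// exact match first, then a whitespace-trimmed match, and only as a last
// resort a match with zero-width characters removed.
func matchesKey(header, key string) bool {
	if header == key {
		return true
	}
	if strings.TrimSpace(header) == key {
		return true
	}
	return strings.TrimSpace(removeZeroWidth(header)) == key
}

func main() {
	fmt.Println(matchesKey("\u200Bdate", "date")) // true
	fmt.Println(matchesKey(" date", "date"))      // true
	fmt.Println(matchesKey("data", "date"))       // false
}
```

Keeping the zero-width pass as the final fallback means headers that already match exactly or after trimming never pay for the extra scan.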

The user story here is pretty clear. Say I'm a consumer of the library and I'd like to parse some CSV. I open the file and see the header row data;a;b;c;;d, which contains some \u200B characters. Obviously, I can't see any zero-width characters in the headers, so I write the csv tags exactly as they appear in the raw document.

But after calling Unmarshal, I get an invalid result: some columns are not parsed.
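The effect can be reproduced without gocsv. The sketch below uses only the standard encoding/csv package; the helpers readHeaders and columnIndex and the sample header row are made up for illustration. It shows that a header carrying an invisible \u200B is not found by an exact name lookup.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readHeaders parses the first record of a semicolon-separated input.
func readHeaders(input string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = ';'
	return r.Read()
}

// columnIndex returns the position of the header equal to name, or -1.
func columnIndex(headers []string, name string) int {
	for i, h := range headers {
		if h == name {
			return i
		}
	}
	return -1
}

func main() {
	// The first header looks like "date" in an editor but starts with U+200B.
	headers, err := readHeaders("\u200Bdate;a;b\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(columnIndex(headers, "date")) // -1: the hidden character breaks the match
	fmt.Println(columnIndex(headers, "a"))    // 1
}
```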

To me, it looks like the library's responsibility to maintain the expected behavior, especially since you're already doing some normalization in the fieldInfo.matchesKey method.

pikanezi commented 1 year ago

Good point, thanks

gocarina / gocsv

Remove zero-width chars from key, before matches #243