Data-Liberation-Front / csvlint.io

Check that your CSV files are valid
http://csvlint.io
MIT License

Recommendation of the "," as a separator, an issue for some cultures? #171

Open CharlesNepote opened 9 years ago

CharlesNepote commented 9 years ago

First of all, thank you for the great job!

In the "about" page (http://csvlint.io/about) you recommend that "column names and fields are separated by commas". In some European cultures (at least French, Italian and Spanish) the comma separates the integer part of a number from its decimal part. In the UK you write 23.78%, while in France, for example, we generally write 23,78%. On my machine (locale FR-fr), LibreOffice for Linux interprets 0,5 as a number and 0.5 as a text string.

Honestly, I don't know how different spreadsheet software deals with this, but there is a risk that some of it treats "23,78%" as a text string rather than a number. For that reason I wonder whether the semicolon would be a better recommendation.

Floppy commented 9 years ago

In those cases, I should think that software would output the fields in quotes to avoid the problem. Also, in CSVlint, we do try to allow different separators to be specified if we detect that the file isn't comma-separated.
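As a rough illustration of that quoting behaviour (a sketch using Python's standard `csv` module, not CSVlint's own code; the helper name `write_row` is made up for this example), a writer with default settings quotes a field only when it contains the separator:

```python
import csv
import io


def write_row(row, delimiter=","):
    """Serialise one row with the csv module's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(row)
    return buf.getvalue()


# With a comma separator, the field containing a decimal comma gets quoted.
comma_line = write_row(["Paris", "23,78%"])

# With a semicolon separator, the same field needs no quotes.
semicolon_line = write_row(["Paris", "23,78%"], delimiter=";")

print(repr(comma_line))      # 'Paris,"23,78%"\r\n'
print(repr(semicolon_line))  # 'Paris;23,78%\r\n'
```

Other tools may quote more aggressively (for example, quoting every field), so this only shows one common default.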

Perhaps we should update the documentation to say that in the cases you mention above, fields should always be quoted?

CharlesNepote commented 9 years ago

But if the fields are quoted, some software could interpret 23,78 as a text string and not as a number, which is problematic. Of course geeks will easily understand this kind of problem and will be able to convert the text 23,78 into a number. But for newbies, CSV files have to be easy to use. They won't understand why they don't get the expected result when they try to add "0,5" and "0,5". I've just run the test with LibreOffice and the result is 0 (instead of 1).

JeniT commented 9 years ago

Quoting in a CSV file doesn't affect the interpretation of a value as a number or a string. CSV itself doesn't have any concept of numeric or string literals (unlike JSON, where that would be an issue). Quoting in CSV is purely about avoiding commas in values being misinterpreted as field separators. Interpreting values as numbers (or whatever) is an additional layer which is done by applications, possibly informed by metadata (schemas).
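To make the two layers concrete, here is a minimal sketch using Python's standard `csv` module. The `parse_decimal` helper and its `decimal_sep` parameter are hypothetical, standing in for whatever an application or schema would use to interpret values:

```python
import csv
import io

# Layer 1: CSV parsing. Quoted and unquoted fields both come back as
# plain strings; the quotes only protect the embedded comma.
same_value = next(csv.reader(io.StringIO('"0.5",0.5\r\n')))
french_row = next(csv.reader(io.StringIO('"0,5","0,5"\r\n')))


# Layer 2: interpretation, done by the application (hypothetical helper).
def parse_decimal(text, decimal_sep=","):
    """Convert a string such as '0,5' to a float, given its decimal separator."""
    return float(text.replace(decimal_sep, "."))


total = sum(parse_decimal(field) for field in french_row)

print(same_value)  # ['0.5', '0.5']
print(french_row)  # ['0,5', '0,5']
print(total)       # 1.0
```

Whether a spreadsheet gets 1 or 0 when adding "0,5" and "0,5" therefore depends on the second layer (its locale or import settings), not on whether the fields were quoted.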