fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data
Other
815 stars 287 forks source link

JsonProvider.Parse interprets strings comprising only digits as numeric values. #656

Closed MiloDC closed 10 years ago

MiloDC commented 10 years ago

JsonProvider.Parse interprets strings composed exclusively of numerals as numeric types (decimal, in my tests), not string types. What is the reason for this?

I'm using an API whose documentation includes example JSON output. It's been very convenient for me to simply copy these examples and use them as samples for my JSON providers, but many of the string values in the examples comprise numerals only. Since your JsonProvider.Parse method auto-converts numeric strings to decimals, I'm required to place non-numeric characters into the example string value, just to force your Parse method to interpret a value enclosed in quotes as a string.

mrange commented 10 years ago

Hi.

The JsonProvider tries to infer the "real" data type when it encounters string values. There are pros and cons to this (my personal opinion is that no matter how intelligent code it will almost always backfire).

Turning off type inference completely will most likely introduce regressions so that's probably a no-solution.

An alternative idea is to introduce (if possible) a parameter in the JSON Provider that by default keeps the current behavior but allows the developer to turn off JSON provider type inference.

Mårten

mrange commented 10 years ago

Btw here is the code for the inference IIRC: https://github.com/fsharp/FSharp.Data/blob/master/src/Json/JsonInference.fs

MiloDC commented 10 years ago

I'm not sure I understand.

If it's enclosed in quotes, then it's a string. A numeric value would not satisfy this requirement.

That's the JSON standard (http://www.json.org/, "A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.") Shouldn't that standard trump inference?

mrange commented 10 years ago

As the code is the JsonProvider tries to be smart and infer the "real" type from what a string contains.

I am on your side in this but:

  1. Removing inference will probably introduce regressions for code that depends on this.
  2. There are probably lot of devs actually liking the inference. It was added for a reason.

Having an option in the TypeProvider to turn off the inference would be preserve the existing behavior to avoid regressions and still give devs not liking inference a way around it.

In the end I guess it's up to the maintainers of FSharp.Data. I only done some bugfixes and tweaks to the json parser.

Mårten

(It's also questionable IMO to parse JSON numbers as decimals as JSON derives from JavaScript which uses double datatype as numbers)

MiloDC commented 10 years ago

I love it when coders make a convenient feature standard behavior.

Your idea to make the current behavior optional is, of course, a great idea.

I'm not sure that I agree with you about the current handling of numeric values, though. It could be argued that the only reason that JavaScript stores floating point values as doubles is because that's how the language was originally engineered, 64 bits being the highest precision available at the time. I would prefer to see maximum precision supported.

tpetricek commented 10 years ago

I think the problem is that some real-world JSON sources use strings for numerical values - when there is no reason for it (like worldbank) or when they need higher precision numbers in JSON.

There is a good point that "42" is intended to be a string in JSON. I'm happy to go either way. A non-breaking change would be to add a static parameter that lets you turn off this behavior (i.e. StrictInference=true). The only disadvantage of this is that our list of parameters keeps getting bigger and bigger.

mrange commented 10 years ago

IMO NoInference=true would make more sense to me. I also see the problem of the an expanding test matrix but IMHO breaking backwards compability might be worse. Legacy code - the curse of useful libraries:).

PS. I tend to go with strings over numbers in JSON because JSON is underspecified when it comes to the actual representation of numbers. What precision? Binary (ie double) or decimal? Therfore numbers in JSON are unpredictable (when going between JSON implementations). Strings will always "work".

tpetricek commented 10 years ago

@mrange Makes sense!

I think if someone submits a pull request adding a static parameter StrictInference with default value false that will disable inference from strings when set to true, then we'll happily accept it.

There is a slight mismatch here - in JSON, there are actual numbers; but this is not the case in XML or CSV. But perhaps having CSV/XML return everything as a string would be a potentially useful option too...

mrange commented 10 years ago

IMO it makes sense for CSV/XML as well.

mrange commented 10 years ago

I think the code change should be rather simple but then there is the question of test coverage...

ovatsus commented 10 years ago

If you look at a lot of APIs (twitter is one example), they encode numbers as string to bypass the JavaScript number limitation. Remember that these days json is often used in contexts totally unrelated to JavaScript. I think the current behaviour makes sense, as it allows you to consume these strings as numbers in F#. We also interpret values which only have zeros and ones as booleans, as that also happens a lot in practice, even though json has real booleans. If you're manually creating sample json files, why would you use examples for strings with just numbers? In any case, I don't object to having a parameter to disable all type inference, but I'd like to keep the default as is.

mrange commented 10 years ago

I agree that default should be as-is to avoid breaking existing code.

mrange commented 10 years ago

@MiloDC can you try out the PR created as a response to this issue?

MiloDC commented 10 years ago

Pulled https://github.com/fsharp/FSharp.Data/pull/658/, seems to work fine!

My strings of numerals are interpreted as strings by the type provider, sweet.

mrange commented 10 years ago

@MiloDC, the PR is now merged (thanks to @ovatsus) but the NoTypeInference flagged changed name to InferTypesFromValues