Problems with thousands separators when parsing strings to numbers

varocarbas commented 8 years ago

Since some months ago, I have been aware about the problems that the parsing methods (at least, Parse and TryParse) of various numerical types (at least, Decimal and Double) have when dealing with thousands separators (at least in .NET 4.5; in both C# and VB.NET). The following C# sample code illustrates the point well:

string stringVal = "1,1.1";
decimal decVal = 0m;
if (decimal.TryParse(stringVal, NumberStyles.Any, CultureInfo.InvariantCulture, out decVal))
{
    //This part will be reached, because decimal.TryParse thinks that 1,1.1 is a valid number
}

Such misbehaviour can be replicated under many different conditions. The basic idea is that the parsing approaches don't understand what the thousands separators really imply (i.e., groups of 3 numbers in the aforementioned example).

Today, I focused my analysis of the code on the Decimal type and found ParseNumber (mscorlib\src\System\Number.cs). The way in which this method treats the thousands separators (in all their forms) explains perfectly the observed behaviour: it focus the analysis on valid/invalid characters and doesn't bring into account the size of the group at any point.

My proposal consists in forcing the corresponding methods to trigger an error when the thousands separators aren't in the right positions. My intention is firstly focusing on the Decimal type (although the aforementioned method is most likely used by other types too). After having ready a good enough correction for Decimal, I will analyse the remaining types in detail and perform additional changes if required.

The correction will consist in analysing the input string; in case of including thousands separators, their locations would be checked. If this analysis fails, no conversion to number would be performed. In principle, I am planning to take advantage from the current implementation (more specifically, a loop analysing all the characters) and rely on “low level” approaches; although I have still to think carefully about the exact algorithm.

varocarbas commented 8 years ago

I am new here and I am not sure how all this works. So, I will better wait for a reply before starting anything.

mikedn commented 8 years ago

Such misbehaviour can be replicated under many different conditions. The basic idea is that the parsing approaches don't understand what the thousands separators really imply (i.e., groups of 3 numbers in the aforementioned example).

Why exactly is this a bug? There doesn't appear to be anything in the documentation of decimal.TryParse that requires group sizing to be enforced. Also, group sizing doesn't affect the numeric result in any way - 12,34,56.78 is exactly the same value as 123,456.78. Changing this behavior now seems like a gratuitous breaking change.

varocarbas commented 8 years ago

@mikedn When I use Decimal.TryParse (or an equivalent method of other numerical type), I expect to get true when the given string represents a valid number (on account of the given type and culture) and false otherwise. "1,1.2" or "1,,,,,5" (yes, it also accepts this last one) or similar are not valid numbers, but the current implementation might recognise them as such (i.e., bug). What is the difference between these examples and "111#12"? This is not a valid number either, but is it less valid than the aforementioned one? Do you prefer to always extract all the relevant information (i.e., numbers) and only output false when strictly required (no numbers at all)? Then why is “111#12” not accepted too?

The most ironic part is that I usually ignore thousands separators (not too typical in Spain, my country; neither in the kind of fields where I am more experienced). I knew about this "issue" some months ago while answering a SO question of someone being very interested in triggering an error for these cases. After this question (which BTW was solved by relying on manually pre-checking the format), I have seen at least 2 additional ones on the same lines. Thus, I am quite sure that this behaviour is not what some people expect and that they plainly don't trust the in-built parsing methods (i.e., doing a format check before using a parsing method?!).

Regarding what you say of "doesn't appear to be anything in the documentation...", note that the method which deals with these actions (the one I am mentioning in my first post) precisely includes a comment explaining the supported situations, where properly accounting for the locations of thousands separators is not being mentioned (as said, the method doesn't even deal with the required information). In any case, I don't think that this issue is relevant at all; other than for supporting the differentiation between bug in the code (i.e., a mistake in the actual implementation) and bug in the underlying logic (i.e., pursued goal not right enough).

Lastly, I want to highlight that the required correction would have a really low impact. I have already analysed this specific bit of code and can include this functionality by just accounting for a couple of additional variables in a loop. On the other hand, one of the outputs of carefully analysing this method might even be a slight improvement of its performance.

mikedn commented 8 years ago

What is the difference between these examples and "111#12"?

It seems obvious to me, # is invalid character in this context while , is not.

I knew about this "issue" some months ago while answering a SO question of someone being very interested in triggering an error for these cases.

It may be useful to include a link to that question or at least an excerpt from it if you want to convince people that a breaking change is necessary.

precisely includes a comment explaining the supported situations, where properly accounting for the locations of thousands separators is not being mentioned (as said, the method doesn't even deal with the required information).

I don't see any such comment but anyway, what does that have to do with what the documentation says?

Lastly, I want to highlight that the required correction would have a really low impact.

How do you know that? There may be code out there which parses text which contains such numbers and all of the sudden that code will stop working.

varocarbas commented 8 years ago

@mikedn So, you think that a parsing method which accepts "1,,,,,,,,,,,,,,,,," as valid number is fine because "," are the thousands separators in the given culture? Then why not accepting "1..........." too? What makes the thousands separators so special?

Regarding linking to the SO questions, I can easily locate one (because I answered it); but not so sure regarding the others questions. Although if this is a basic requirement to do what I consider required, I might do some effort.

Regarding why I think that it would have no impact? Exactly for the same reason why I consider that I can fix it: because I have quite a long experience on (efficient) algorithm building and know what I am talking about :)

BTW, do you know what are the expected next steps? Should I just wait here until someone in the .NET considers that my suggestion is relevant?

mikedn commented 8 years ago

So, you think that a parsing method which accepts "1,,,,,,,,,,,,,,,,," as valid number is fine because "," are the thousands separators in the given culture?

What I'm trying to say first and foremost is that this is a breaking change and as such it needs to be treated carefully. It's more important how this change affects existing code than the fact that that it can be fixed, or if the fix is efficient or if there's who knows what comment in the source code.

Joe4evr commented 8 years ago

So, you think that a parsing method which accepts "1,,,,,,,,,,,,,,,,," as valid number is fine because "," are the thousands separators in the given culture? Then why not accepting "1..........." too? What makes the thousands separators so special?

Nothing, really. Numeric separators aren't by the thousands everywhere on Earth. And they aren't commas everywhere on Earth, either. Numbers are really confusing once you bring culture into the mix.

varocarbas commented 8 years ago

@Joe4evr It was a rhetorical question. The current implementation of Decimal.Parse (or similar) accepts "1,,,,,,,,,," as valid, but triggers an error with "1............". Does it make sense? No. That's why I am suggesting to change it (what @mikedn thinks that is not a good idea).

PS: if you think that numbers are confusing why are you commenting in a thread aiming to change the source code dealing with these issues?! Note that the culture is being brought into account always when parsing decimal (when using an overload not taking this argument, the code would assume a default one). PS2: don't want to offend anyone, but I was expecting a bit more of "action" (i.e., actually implementing changes or discussing with actually knowledgeable people in the given implementation). All what I see here is general talking.

mikedn commented 8 years ago

but triggers an error with "1............". Does it make sense? No.

The decimal point has a very specific meaning and its presence and position affects the result. IMO the fact that 1..... triggers an error makes sense.

mikedn commented 8 years ago

if you think that numbers are confusing why are you commenting in a thread aiming to change the source code dealing with these issues

The real problem here is that you think that this is about changing a bit of code somewhere. It is not, it's about changing the behavior of a commonly used framework API.

varocarbas commented 8 years ago

@mikedn I am seriously not interested in convincing anyone or tolerating not-completely-straightforward/honest attitudes. I have explained my point clearly, would love performing this modifications (and quite a few other ones) but if I cannot I would accept it (haven't had anything until yesterday).

The situation is very clear: you have "1,,,,,,,,,,,,," being accepted as a valid number. Do you want that? Excellent; but let's better not waste everyon's time by trying to prove that this should actually be the case (you don't have any way to convince me of such a thing).

I have made my proposals and spent a relevant amount of time explaining different issues. If you (the decision-makers) consider that I might go ahead (or even that I should provide further informtion), please let me know.

shahid-pk commented 8 years ago

@varocarbas lets say i am user of this TryParse i am depending on 1,,,,,,,,,,,,,,, to be valid why ? because this behavior is their for almost 10 years , i don't know how many apps i have developed depend on this. If you change this you will break all my apps. Do you have a solution that fix this and does not break all my ten or so app that is using the old behaviour ?

varocarbas commented 8 years ago

@shahid-pk what you are saying doesn't make any (practical) sense. Please, take a look at my following comment.

varocarbas commented 8 years ago

Thanks for your contributions, but seriously this is not what I want.

I have started some threads which I consider completely clear. If I am wrong and this community doesn't think like me, I would not implement anything (and, most likely, wouldn't contribute further).

I like objectivity-driven communities where all their members have similar knowledge and expectations and where subjectivity is not favoured. The open .NET seems a perfect excuse for objective-correctness-focused discussions where everyone would win (Microsoft by getting a beyond-excellent product and the contributors by working on so worthy resources; seriously, after taking a quick look at CoreFX & CoreCLR I am speechless); this situation is certainly very appealing to me. Participating in random chats with random people (no offense) about random issues is not what I want.

Please, if you have solid enough expertise related to what is being discussed here (e.g., efficient algorithm building, deep .NET knowledge mainly regarding this specific implementation, you are an local decision-maker, etc.) and you want to share anything with me (question, suggestion, request, etc.) from a completely technical and objective perspective, please feel free to contact me. Otherwise, I would not answer you. As said, no disrespect intended, just trying to avoid everyone wasting their time.

mikedn commented 8 years ago

I am seriously not interested in convincing anyone or tolerating not-completely-straightforward/honest attitudes.

I'll pretend that I didn't see this baseless accusation.

I have explained my point clearly, would love performing this modifications (and quite a few other ones)

That's all fine and appreciated but again, this isn't about changing the code.

If you (the decision-makers) consider that I might go ahead

Decision makers (Microsoft people) have yet to comment on this matter. Neither me nor anyone else who commented in this thread up to now do not work for Microsoft. But the decision makes have clearly stated (and documented) in the past that breaking changes need to be discussed. That's exactly what we are doing now but it appears you're not willing to have a discussion.

if you have solid enough expertise related to what is being discussed here (e.g., efficient algorithm building, deep .NET knowledge mainly regarding this specific implementation, you are an local decision-maker, etc.)

Again, we're not discussing any implementation details, we're discussing the behavior of the API. I don't know how to make this more clear, you're approaching this from the wrong angle.

varocarbas commented 8 years ago

@mikedn I am answering mostly to highlight that you misunderstood comment ("see this baseless accusation"). My bothering-you comment wasn't exactly addressed to you and was meant in a quite wide sense on the lines of "subjective views when dealing with objective facts". I have been answering comments from quite a few people for a while; what was tiring, because I don't like it (wasn't expecting it here and don't even think that things here should be that way).

I am not trying to criticise anyone's approach, just clearly stating what I am willing to do. I am more than happy to explain what is required from a strictly technical perspective; to convert my "I think that" into verifiable facts. Also in case of being required, I am very happy in following a more or less complex process (even involving some random chatting) but only in case of aiming a clearly-defined goal (not the case so far).

What about if I exclusively participate in conversations dealing with the implementation aspects? You can discuss as much as you wish, come to any conclusion and letting me know only in case of thinking that my approach is worthy. Would this be acceptable?

mikedn commented 8 years ago

I am more than happy to explain what is required from a strictly technical perspective; to convert my "I think that" into verifiable facts

What is required? I'm more interested to know why is this change required, a real world use case where the current behavior poses a problem.

varocarbas commented 8 years ago

http://stackoverflow.com/questions/32676992/validate-currency-textbox/32677430

varocarbas commented 8 years ago

But you can find quite a few more examples after a quick research: http://stackoverflow.com/questions/831727/c-sharp-decimal-parse-issue-with-commas http://stackoverflow.com/questions/2094254/number-parsing-weirdness http://www.vbforums.com/showthread.php?643055-RESOLVED-Double-TryParse-returns-true-for-quot-19-6-quot

mikedn commented 8 years ago

None of those posts seem to answer the "why?". People simply say "I want X" but nobody explains why.

I have 2 online banking accounts and I was curious to see what happens if you enter something like 12,34,3. One simply ignores all the commas and the other doesn't allow you to type any commas (which IMO is the best thing to do). If banks have no problems with digit grouping then why is this a significant problem for a WinForms app?

If you try to copy/paste 12,3,4 in the Windows calculator you get 1234. Excel doesn't treat 12,3,4 as number but then Excel is a rather special case because it tries to distinguish text from numbers.

If this fix/change was for .NET 1.0 Beta then it would have been perfectly fine but it's .NET 4.6.x we're talking about and introducing a breaking change simply because someone wants it but offers no justification won't be easy.

varocarbas commented 8 years ago

@mikedn (This is precisely the kind of conversation which I don't want to have, because of representing a pure waste of time. A more-sensible-than-me person would plainly ignore such a pointless argument, but I am too damn respectful and don't like ignoring people. You clearly come from a completely different background and I don't have any interest in convincing you of anything, just in showing what I consider relevant to people thinking like me. Not sure about the reason why you are so interested in continuing a conversation with a person seeing things from a completely different perspective than you do. This one will be my last reply to comments on these lines)

All what you are saying doesn't make any sense! You are talking about applications. You are not facing the problem as a programmer, but as a user! If you click on a button, you don't know what this program is internally doing! We are precisely discussing about these internals!

You seriously think that if Decimal.Parse fails in certain situation, all the applications relying on it would fail too?! Only the applications built by incompetent programmers would fail. Competent programmers create applications which will always work independently upon virtually anything else (e.g., the errors of the given programming language, how clueless the given user might be, etc.)!

The applications created with .NET 2.0 (and before .NET existed) didn't include lots of fancy features which the last version includes, but basic functionalities like inputting "12,2,3" and not getting any error were still working anyway. With loops, conditions and a couple of things more you can create almost anything; although this is not the goal of the modern programming languages (i.e., basic functionalities), but being as programmer-friendly as possible (e.g., to maximise productivity), to not mention the internal consistency of the given environment (perhaps irrelevant to you, but not to everyone).

Have you read all the questions in my previous comment? What do you think that these people were doing after realising that the decimal (or double or float) parsing method wasn't working as expected?! Letting it like this? Creating a warning saying "Sorry, the functionality which you have requested is still not accounted by .NET"?! If you have to multiply 5 times 5 and the multiplication sign of your calculator doesn't work, you can do 5+5+5+5+5; would that mean that the multiplication sign doesn't need to be fixed? No. You need the multiplication sign to do things easily and quickly. And even worse: if you are commercialising calculators and one of your clients comes to you complaining about the multiplication sign not working, would you say him "use the addition sign"? What would that client think of your professionalism and the reliability of your products?

Please, stop converting what seemed a true dream (being able to perform relevant modifications in CoreCLR & CoreFX) in meaningless conversations which I don't want to have.

mikedn commented 8 years ago

Please, stop converting what seemed a true dream

I'm not converting any dream, I'm just trying to wake you up.

varocarbas commented 8 years ago

@mikedn Wake me up? From what? Do you think that I don’t have experience in dealing with cluelessness? That I haven’t seen lots of problems provoked for no reason? That a win-win situation can easily be converted into a lose-lose one? I am perfectly aware about this sad reality (much more than what the attitude I show might indicate; in fact, I don't trust in people’s proper understanding and practicality since some time ago).

FYI, the dream already came true (I have full access to an impressive code; thanks again Microsoft, you made me really happy :)). It would be quite nice to allow others to also benefit from all what I will be doing with that code; but to be honest, I don't have many hopes on this front (as said, I have long experience dealing with pure nonsense).

aevitas commented 8 years ago

I am new here and I am not sure how all this works.

@varocarbas Fairly certain the way you're approaching this right now won't get you very far. Accusing people at random while praising your own achievements/skills, instead of providing concrete, real-world cases of why a change that would break backward-compatibility on an extremely widely used API that has been in the framework with its current behaviour for over a decade would be required.

There are countless applications running business critical logic that rely on this API for them to work. If that API were to suddenly change when the target platform for those applications changes, without any notice, then that could be an absolute disaster. The fact that you don't seem to grasp that is simply beyond me.

Barging in here shouting that you're not interested in discussion and your proposed change is the be-all-end-all solution to a huge (but non-existent) problem, is not in the spirit of this community. You're not Torvalds, and this isn't the Linux repository. Get a grip.

janvorli commented 8 years ago

@varocarbas When you work on something like .NET that has millions of developers targetting it, you cannot afford to fix somethings in a way that would break substantial amount of existing applications. Even though the fact that they relied on the broken behavior can mean that they were not great programmers. Because if you do that, you hit the users of those applications and they will blame you, not the application developer. From their point of view, they had an app that was working fine, but when they upgraded to the latest .NET framework runtime, the app stopped working. Of course bugs should get fixed in general, but one need to be extremely careful about the way to fix them so that the fix doesn't break others.

shahid-pk commented 8 years ago

@varocarbas no one is questioning your skills or ability here , all that we are questioning is your idea and giving our views on your idea. Even the developers that are professionally developing .net gets questions like these. You have to understand that this is not targeted only to you or only to your idea, when you create an issue/or mail on any open source project which brings a new idea, others developers give their views on that idea , this is how open source works. This is certainly not to discourage you but to encourage you to reason about your change taking into account every concern that is raised. Coming to this issue if you look it from front yes this a behavioral bug which in a perfect world should be fixed but we don't live in a perfect world , we live in a world in which ms has shipped this behavior for ten years or so and people like you and me are working around this bug, so a change in the behavior will break those workarounds , and no one can be certain that this change won't break many many applications. Neither can we say to users of .net that your application was working on .net 4.6.1 but now your application is broken on .net 4.6.2 because you are a bad programmer and you used our buggy TryParse which we just now fixed. Further more we can only talk about the technicalities of the code that you would write if all the stalk holders agree to bring this breaking change. Which is the easiest part because as i said earlier no one is questioning your capability or ability.

varocarbas commented 8 years ago

Please, I don't want to continue with all this. My first comment should be descriptive enough, but here comes a last summary:

Clear bug (e.g., accepting "1,,,,,,,," as a valid number; but triggering an error with "1.......").
I found already the exact place where it is located; and the eventual correction would have a very low impact (i.e., just accounting for a couple of variables more in a loop); I might even analyse the situation further to see if I could get an additional improvement in performance (what after a quick look seems possible -> but have to analyse it more carefully).
It is very unlikely (to not say virtually impossible) that any (serious-enough) code has ever relied on a so unexpected behaviour (tolerating thousands separators in any position, but not anything else?). On the other hand, there are quite a few references where this behaviour is considered quite illogical, as shown in the links above. Note that most likely this problem has always been present (since .NET was created), but people keep complaining (although I have seen some "thousands parsing is very unreliable, better check the format manually" attitudes, which I am not sure that are too positive).

Regarding my personality and expectations:

I am new here, but with no interest in adapting myself to what I don't consider appealing enough. That is: I want to reduce the social/subjective essence to its minimum expression and focus most of my interactions on technically-relevant issues. If this is not possible (a perfectly logical scenario), I would plainly accept that direct contributions like this one are not for me (and might think about alternatives).
I want to focus any conversation on strictly technical aspects, which might be objectively proved (e.g. by testing these cases under those conditions, this version took x secs. and this other version y secs.).
I don't want to talk about abstract issues, expectations, assumptions, etc.
I want to exclusively deal with programmers who are experienced enough in .NET (and, ideally, in CoreCLR) and by always focusing on issues which are directly related to programming.
I understand that this is a big community with lots of different opinions and that other types of discussions will be occurring. I would like to not participate in any of them; just discuss about anything you want and come to your conclusions. I will accept the resulting decisions.

I am not trying to be disrespectful, just minimising useless wastes of time by highlighting some useful facts. Logically, I will accept all the consequences of my attitude (e.g., my two proposals being rejected); and I expect anyone else to do exactly the same thing (i.e., if you want to discuss about issues which I don't consider relevant, I would plainly not answer you).

I understand that all this might sounds a bit weird, but at least it is crystal clear :)

aevitas commented 8 years ago

@varocarbas Your passion for technical details is admirable, but unfortunately, software development - and especially on the scale of .NET - revolves around expectations, promises, and assumptions. Merely by saying "It is very unlikely (to not say virtually impossible) that any (serious-enough) code has ever relied on a so unexpected behaviour", you are already creating assumptions of your own.

Not all people, and indeed not all programmers, are created equal, and although most of us prefer the vacuum of technical implementation details, when it comes to proposing changes to a long-established code base, discussion such as in this issue is a required part of the process. Your proposal may indeed have merit, and there is no one turning it down saying that is doesn't straight off the bat - they are merely proposing opposing views. Whether that is a process you would like to participate in or not, is entirely your choice.

PapaCharlie commented 8 years ago

@varocarbas Unfortunately, this has nothing to do with your culture or locality, but rather with the standards of the language, the library and its culture. You say:

Clear bug (e.g., accepting "1,,,,,,,," as a valid number; but triggering an error with "1.......").

But it is not a clear bug, in fact it's not a bug at all. In the American standard, the thousands operator is indeed optional, and while writing numbers like this: "12,34,56.3" is frowned upon, it is not incorrect. On the other hand, there can be at most one decimal point, otherwise the string of digits cannot be interpreted as a valid number. The reason I am talking about the American standard here is because the underlying logic of this library, and the .NET framework, is built upon this standard. If the .NET framework were forced to also accommodate other standards, then I might as well request that the "for" keyword be changed to "para" for all the same reasons you want to change the behavior of this function. You want this function to behave in a non-standard way, which is by definition a code-breaking change. As a matter of fact, the behavior you are proposing would be considered a bug, not the other way around.

RobThree commented 8 years ago

I would like to quote some comments from Reddit:

For example if you accepted a formatted value 1,234,567 with the user deleting the last two characters it could become an input of 1,234,5 and currently the method would accept this value. If the user deleted everything but the first digit it would be 1,, and still be valid. The number of comma's is entirely arbitrary at this point. - (Losobie)

Even worse: suppose I entered 1,000,000,000 and realise I've entered one too many zeroes I have to now not only delete only the last zero, I also need to re-arrange all the thousands separators and, effectively, re-enter the entire number.

The correct answer to why he's wrong is that thousands separators in numbers are semantically insignificant whitespace and numbers can only have one radix, since the radix signifies the ones place. Thus, a radix is significant, and a thousands separator is not. We could say that the first radix in a string is the canonical one for that string, but that would be a decision of the specification maker. What would make much less sense is treating semantically insignificant whitespace as semantically significant. (Godd2)

Good point IMHO.

Then there's the "detail" that a thousands-separator is not always used to group digits in threes (so I'm not quite sure how 'simple' a 'fix' would be (if you're gonna fix it, do it right, right? :stuck_out_tongue: ) , might be harder than you think):

The Indian numbering system is somewhat more complex: it groups the rightmost three digits in a similar manner to European languages but then groups every two digits thereafter: 1.5 million would accordingly be written 15,00,000 and […]. (Wikipedia)

The International Bureau of Weights and Measures states that "when there are only four digits before or after the decimal marker, it is customary not to use a space to isolate a single digit" (Wikipedia)

The above quotes are, to me, reason enough to treat thousands separators as 'whitespace' or 'noise' if you will; they're only for (human) readability and change nothing to the meaning of the number. So why bother with them at all when parsing a number?

Yes, 1,,,,,, doesn't make sense and yes maybe a, for example, ParseStrict() method should make it's way into .Net but changing behavior on existing methods people (knowingly or _un_knowingly) rely on is a dangerous thing and should not be taken lightly.

And I'd like to leave with a quote from myself:

Such changes require […] shims, maybe even a (new overload with a) extra argument to the […] methods that allow you to specify the desired behavior or maybe even a whole new method needs to be created […].

Nobody is saying that the current behavior is absolutely perfect and nobody is saying there's nothing wrong with current behavior. They are, however, saying that such changes can't be made on a whim or go overnight. It needs careful consideration and deliberation.

jakesays-old commented 8 years ago

@varocarbas You are completely missing the valid point @mikedn is trying to make. HOW the code should work in your opinion is irrelevant. What is relevant is the fact that the current behavior is what people expect, and has been what they expect for 15 years now. You cannot make a breaking change like this, not even to make the code 'correct.' If you do you run the risk of potentially breaking thousands of applications.

I really do not understand why you are having such difficulty with this concept.

Davio commented 8 years ago

There is a difference between being parseable (which is always a best-effort kind of thing, trying not to break on things it can deal with) and being strictly valid in a certain context (in this case in a certain default culture), they're not the same; valid is always going to be a subset of parseable. A number like " 1.2 " (note the whitespace) isn't strictly valid, but I'm glad it can be parsed.

Havvy commented 8 years ago

It is a technical issue that making a breaking change breaks programs that already exist. If you don't care about backwards compatibility, you can ignore it, but the CLR cares about backwards compatibility, so it is stuck with the decision made 10 years ago. Because of this, even though it is feasible to write the code to change it, it is not possible to publish it in the mainline CLR.

mgravell commented 8 years ago

To me, the fundamental problem here is that you are asserting a particular definition of "correct", but that definition is highly questionable. You say string stringVal = "1,1.1"; is not a valid decimal. Well... yes, yes it is. It might not be in the most commonly expressed form for that culture, but: it is perfectly well-formed, unambiguous, etc. There is no reason not to accept this (contrast "1.1.1", since . in the stated culture has a very different impact), unless we're using TryParseExact. Now; decimal doesn't have a TryParseExact, but some other types do.

If the conversation were restructured to

Proposal: add TryParseExact to numeric types, enforcing expected formatting rules including group separator positions, etc

then I could totally get behind that proposal. But: the API at the moment is perfectly valid - it just isn't what you expect. That's fine; it doesn't have to be.

varocarbas commented 8 years ago

Thanks again for all your contributions. I insist in the fact that there are many comments, most of them focused on certain ideas about which I don't want to discuss (i.e., if you think in this way, I would accept your decision; trying to convince you is not one of my priorities).

Some of the last comments are highlighting issues with a slight technical essence which might need some clarification (well… not ideally); this is the main reason for writing this new comment.

Note that I use expressions like ".", ",", “thousands separators with certain number of digits in between” just to simplify ideas. I really mean: “decimal separator within the given culture”, “group separator within the given culture” and “size of the group in the given culture”. The original version of the parsing method (i.e., the one which I am referring in my first comment) logically accounts for the given culture (not for "." or ","; but for the decimal/group separator in the given culture); that is: the string is already being parsed on account of the expected format. Curiously, this information is not being fully maximised and that why the error I am referring happens: this method does know the size of the group/thousand separator (even if groups are acceptable/not in the given culture), but it doesn't use it at all. Although it does use the given group separator character; that's why the referred faulty reality exists: "1....." being wrong and "1,,,," being right (logically, I mean the English/Invariant culture; the one which this method takes by default in case that the user doesn't expressly input a different one).

I hope that this point is now clear. I hope that you will continue enjoying your discussion. Again thanks to everyone for sharing your thoughts. Again, no disrespect intended (but why do I have to clarify things which should be evident to the kind of people I want to discuss with, that is: knowledge-enough on this specific issue?). And again, make the decision you wish and contact me only for issues strictly related with the actual implementation.

varocarbas commented 8 years ago

@mgravell is talking to me!!! Ah!!! I am kidding, but you are kind of a legend, you know? :)

Sorry to disagree with you, but "1,1.1" is plainly wrong not subjectively wrong. There is no valid reason explaining why "1..1" is wrong and "1,,1" is right.

On the other hand and as explained many times before, I do understand that there are many more issues involved when deciding to perform certain modification. I am not trying to convince anyone about the main priorities here (i.e., keeping backwards compatibility no matter what vs. being somehow adaptable on this front). I am not even saying that am in a position to know what is best (you, for example, have certainly much more experience than me on big developments and on coordinating different expectations). I have proposed a modification which I consider worthy. Doesn't the .NET community think like me? I would accept any decision (but not necessarily agree with it).

jakesays-old commented 8 years ago

@varocarbas "1,1.1" is 'right' because the ',' HAS NO MEANING when it comes to parsing numbers. the ',' is there SOLELY FOR HUMAN READABILITY and offers NO semantic value. However the '.' IS semantically important, so 1....1 is clearly not a valid number. 1,1.1 is semantically valid because 1,1.1 is the NUMERIC equivalent of 11.1. Can you not understand this?

varocarbas commented 8 years ago

@jakesays Please, don't force me to be involved in a conversation which I don't want to be part of. I have been crystal clear on this front in my previous-to-last message (i.e., with "," I mean group separator in the given culture). Try to make some effort to understand properly, rather than arbitrarily criticising every single word (and/or defending something as pointless as "1,,1" being a number and "1..1" not being a number).

jakesays-old commented 8 years ago

@varocarbas Oh I do understand very well what you are after. And you have already involved yourself in the situation you don't want by refusing to consider the objective opinions of those of us who have a grasp of the bigger picture. Our concern is how your suggestions will potentially affect thousands of developers and systems. I am amazed that you cannot grasp this concept. Do you not think it important that these developers and systems be affected? Does that have no value to you?

mgravell commented 8 years ago

There is no valid reason explaining why "1..1" is wrong and "1,,1" is right.

Yes, there is. You are just ignoring what people are saying about it. Group separators are essentially only padding. 12345678 is identical to 12,345,678 is identical to 12,34,56,78 (clarification: in a culture where groups are comma-separated). You probably wouldn't choose the last form (although various group lengths exist in different cultures, including groups of different length depending on position), but it isn't invalid. The decimal separator is completely different; it can only appear once, and if it appears more than once there is no valid way of interpreting the data. Trying to emphasize "1..1" vs "1,,1" actually weakens your argument.

I have proposed a modification which I consider worthy.

And people are pointing out the many ways in which this modification would be undesirable and unhelpful - and that is before you get to the problems of multi-targeting (it would be very bad if decimal.Parse did completely different things on dnxcore50 vs dnx451). And hence also why the "perhaps add a new TryParseExact etc" could be a viable alternative that achieves what you want, without hitting all the bumps in the road.

varocarbas commented 8 years ago

@jakesays I think that you are the one not getting the idea. I haven't doubted even a single second about my limited expertise regarding so big implementations. That's why I said: "discuss for as long as you wish and come to your conclusions". I want to continue focusing my activity on mere implementation aspects (i.e., "you have the go ahead; do that") and not having to worried about all these issues.

When I wrote this proposal, my intention was triggering the process to perform the correction; but being exclusively involved in the correction itself (not in all the required discussions driving to the final decision). As said since a while ago: no disrespect intended; just don't want to participate in certain discussions; because I don't like them, do recognise my limited expertise on certain aspects and do want to continue having my less-bigger-picture ideas.

jakesays-old commented 8 years ago

Ok then you have communicated your position on the implementation aspects of your idea. However you cannot propose such an idea and not involve yourself in the larger picture because of its importance. I would suggest you attempt to learn from the comments of others so that you may be able to fully participate in the process.

varocarbas commented 8 years ago

@mgravell Someone proposed the TryParseExact alternative before and I thought the same than now: it seems much more complicated. But I insist in my limited experience on certain issues. If you think that this would be a better solution, you might be right (although still not too sure why something like parsing "1,," as a valid number should be supported at all).

Davio commented 8 years ago

Still I insist you try to see the difference between "something that can be parsed" (can be turned into a decimal number with no ambiguity) and "something which is strictly valid given a certain culture". You seem to think these are one and the same while they're in fact not.

And who even decides what's strictly valid? Certainly not you, is there an official RFC for number representation or an ISO standard? I can't seem to find one. So all we have to go on are some real world examples and in that case it's very helpful if the parser accepts a wide range of possible values (as long as there's no ambiguity about the number it represents) instead of trying to be limited. 1,,,,,1 is just as straightforward for the parser as 11, there's really no harm in it being able to parse it.

varocarbas commented 8 years ago

@Davio The problem is that accepting "1,,,1" and rejecting "1...1" is ambiguous and arbitrary. And this behaviour responds to the mere fact that the algorithm taking care of string parsing didn't account properly for the more-complex group/thousands separators reality (i.e., what, IMHO, represents a bug); even despite of having all the required information available.

Anyway, thanks for your inputs and please understand that I don't want to get involved in certain discussions. I will accept whatever decision the community (I mean, Microsoft) will make.

MartinJohns commented 8 years ago

The problem is that accepting "1,,,1" and rejecting "1...1" is ambiguous and arbitrary.

It was mentioned multiple times already that this is not ambiguous or arbitrary in any way. The one is a padding character ignored by the parsing algorithm, the other is a separation character which identifies when the floating-point part begins.

Davio commented 8 years ago

No it isn't. 1,,,1 isn't ambiguous, it can only be 11. 1...1 just isn't a number (would it be 1.1? or 1.001? 1.0.0.1, that's not even a number. something else? this is why a number may only have 1 decimal separator)

You seem to have a problem with the group size, especially when it's 0, but the group sizes can be really awkward (for instance in India you can have something like 1,00,00,000), now do you choose group size 2 or 3 here? Both won't work. The option which works best (is able to parse most numbers) is to just disregard the group size and group separator altogether and that's what's currently happening and there's no need to change that. If you can't be convinced, well, I'll stop wasting my time talking to a wall.

mgravell commented 8 years ago

Someone proposed the TryParseExact alternative before and I thought the same than now: it seems much more complicated.

I fail to see how. It is a pre-existing pattern that expresses an intent to follow strict formatting rules, rather than lenient ones (the default). It seems to be exactly what you are after.

Additionally, to highlight the impact: from my perspective (as a library author targeting multiple platforms), anything other than this approach can basically be rephrased as "you can't use Parse or TryParse any more, and need to remove it from any existing libraries". In the real world, you can't fundamentally alter what a method means. What you propose is not a bug fix, it is not a feature request (additional support): it is a major breaking "this method now means something completely different". And simply: that is never going to fly.

varocarbas commented 8 years ago

@Davio "well, I'll stop wasting my time talking to a wall." -> thanks for giving me a break and allowing me to stop answering what I consider irrelevant concerns from a most-likely unreasonable person (yes, you are the wall in my book).

doomchild commented 8 years ago

Fundamentally altering the use case for any method older than about six months is technical suicide. Changing what a method assumes about its inputs falls directly into this zone.

dotnet / runtime

Problems with thousands separators when parsing strings to numbers #4762