angularsen / UnitsNet

Makes life working with units of measurement just a little bit better.
https://www.nuget.org/packages/UnitsNet/
MIT No Attribution

"Length.TryParse" parses invalid values #343

Status: Closed (nimishmistry closed 5 years ago)

nimishmistry commented 6 years ago

Length.TryParse("2m,,,1", out length) succeeds and sets length to "2m". Is this intended?

I'm trying to use TryParse for validation.

angularsen commented 6 years ago

Hi, thanks for reporting. This is not intentional. We try to be lenient about whitespace and about combining multiple value/abbreviation pairs with different delimiters, but this is way beyond those use cases :-) I'm sure it's a matter of tweaking the regex a little bit. Would you care to try to fix it? A pull request would be most welcome.

https://github.com/angularsen/UnitsNet/blob/master/UnitsNet/CustomCode/QuantityParser.cs
https://github.com/angularsen/UnitsNet/blob/master/UnitsNet.Tests/CustomCode/ParseTests.cs

nimishmistry commented 6 years ago

Thanks for the offer, I will definitely attempt to fix this.

angularsen commented 6 years ago

Awesome!

bplubell commented 6 years ago

I took a look at this and found the problem is where the QuantityParser is trying to capture trailing, invalid data. It is only checking for alpha characters and, since comma is not an alpha character, it does not capture it. https://github.com/angularsen/UnitsNet/blob/45c747db693d1f2e8a38651b7a37f695d7450d95/UnitsNet/CustomCode/QuantityParser.cs#L71

Trying to make this capture more characters (like comma) could be problematic with the current design, where the string can have multiple sets of valid units (such as "1m and 200mm"). I think it would simplify things greatly, and be more in keeping with framework parse methods, if we did not allow multiple sets of units. For example, double.TryParse("1+2") returns false.
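To make the comparison concrete, here is a minimal sketch of what "fail on anything but a single value/unit pair" could look like. The `SingleQuantity` pattern and the `IsSingleQuantity` helper are hypothetical and written only for illustration; they are not the regex or API in QuantityParser.cs, and the sketch assumes invariant-culture numbers.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class StrictParseSketch
{
    // Hypothetical pattern (NOT the one in QuantityParser.cs): exactly one
    // number followed by one unit abbreviation, anchored at both ends so
    // that any trailing data makes the match fail.
    private static readonly Regex SingleQuantity = new Regex(
        @"^\s*(?<value>[-+]?\d+(\.\d+)?)\s*(?<unit>[^\s\d,]+)\s*$");

    public static bool IsSingleQuantity(string input)
    {
        return SingleQuantity.IsMatch(input);
    }

    public static void Main()
    {
        double parsed;

        // Framework behaviour: an expression is not a number, so TryParse returns false.
        Console.WriteLine(double.TryParse(
            "1+2", NumberStyles.Float, CultureInfo.InvariantCulture, out parsed)); // False

        Console.WriteLine(IsSingleQuantity("2m"));           // True
        Console.WriteLine(IsSingleQuantity("2m,,,1"));       // False: trailing ",,,1"
        Console.WriteLine(IsSingleQuantity("1m and 200mm")); // False: second value/unit pair
    }
}
```

Anchoring with `^` and `$` means the pattern does not need to enumerate which trailing characters count as invalid; anything left over fails the match. Under this sketch a feet/inches string such as 1' 5" would also be rejected, so it would need separate handling.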

Proposal: Fail when parsing a string with multiple sets of units. The expectation would be that consumer code pre-parses such input, since maths more complex than addition could be required and is better handled by the consumer.

Thoughts?

angularsen commented 6 years ago

I agree it may be unintuitive that we support parsing strings like 1 kg 500 g. If I recall, the original reasoning was that since we needed to support parsing 1' 5" for feet/inches, it was better to have consistent parsing logic instead of special cases. In hindsight, I don't know of any other cases besides feet/inches where combining value/unit pairs like that actually makes much sense. If no one can come up with other use cases for this, then yes, I think we should consider removing it in favor of special-case parsing of feet/inches only, to spare everyone's sanity. It would be a breaking change and would have to be added to our #180 wishlist for a major version bump, but that is always something we can do. The list is significant already.

Also, I don't understand why we allow , separators between value/unit pairs?

@"(and)?,?", // allow "and" & "," separators between quantities

Whitespace or adding the word "and" in between, sure, but when is it natural to parse a string like 1',1" or 1kg,500g? It seems contrived to me, but maybe I've just forgotten the reason it was added in the first place. And it should not have been valid to have empty value/unit entries, such as in your example with multiple commas in a row. There are simply a lot of odd things going on here, and I think we are trying to be way too lenient on the input, which is ultimately just going to cause problems when people expect it to handle all sorts of trash input.
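As a rough illustration of the stricter separator handling described above, the sketch below accepts whitespace (optionally with the word "and") between value/unit pairs and rejects commas and empty entries. The `Pair` and `Separator` patterns and the `IsValid` helper are hypothetical names made up for this example; they are not the patterns used in QuantityParser.cs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorSketch
{
    // Hypothetical building blocks (NOT the patterns in QuantityParser.cs).
    // A pair is a number followed by a unit abbreviation without digits, spaces or commas.
    private const string Pair = @"[-+]?\d+(\.\d+)?\s*[^\s\d,]+";

    // Whitespace is required between pairs, "and" is optional, commas are not accepted.
    private const string Separator = @"\s+(and\s+)?";

    private static readonly Regex Pairs = new Regex(
        @"^\s*" + Pair + "(" + Separator + Pair + @")*\s*$");

    public static bool IsValid(string input)
    {
        return Pairs.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("1 kg 500 g"));   // True
        Console.WriteLine(IsValid("1kg and 500g")); // True
        Console.WriteLine(IsValid("1' 5\""));       // True
        Console.WriteLine(IsValid("1kg,500g"));     // False: comma separator
        Console.WriteLine(IsValid("2m,,,1"));       // False: empty entries and a value without a unit
    }
}
```

Because the separator requires at least one whitespace character, it can never match an empty string, which is what rules out runs of commas and back-to-back pairs in this sketch.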

Related PRs: https://github.com/angularsen/UnitsNet/pull/64 https://github.com/angularsen/UnitsNet/pull/81 https://github.com/angularsen/UnitsNet/pull/254

I have not yet read through these, so my recollection is still poor on design choices.

@maherkassim was involved in much of this. Do you have any comments or recollection of why we did these things?

maherkassim commented 6 years ago

@angularsen I believe that the main use case was feet & inches (e.g. 1' 5"), but that we decided to keep the regex broad to avoid needing to add other special cases in the future (all of the relevant discussion can be seen in #81).

To clarify, I'm ok with having the regex be more restrictive and handling feet & inches as a special case. The only other special case I'm aware of is stone & pounds, but I'm not sure how often that's used (especially within the context of UnitsNet).

Also, maybe @gojanpaolo can shed more light on any related changes in #254?

angularsen commented 6 years ago

Thanks for the heads-up about stone and pounds, @maherkassim.

@nimishmistry Is there anything I can help with to move a PR forward on this?

As noted above, this will be a breaking change and can only be merged as part of v4. It will likely take some time in prerelease to get all the other changes in #180 included as well, but this is a good start at any rate. I just pushed a new v4 branch that we will target with pull requests for breaking changes like this one and those in #180.

angularsen commented 5 years ago

This issue will be resolved in the v4 release; see #487.