eppye-bots / bots

Automatically exported from code.google.com/p/bots

Bots numeric precision limits #393

Open mrkazoodle opened 3 years ago

mrkazoodle commented 3 years ago

https://github.com/eppye-bots/bots/blob/3277cc10b8a91c8a3dc0722d95dd1c4da8e7696a/bots/inmessage.py#L185-L186

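Bots has numeric precision limits because of code such as the lines linked above. To validate fixed numeric input, the value is cast from a string to a float. If that fails, the error is logged and processing stops. Otherwise the float is formatted back into a string with the original number of digits. A float is a double-precision value with a 53-bit significand, so integers above 2^53 (9,007,199,254,740,992) cannot all be represented exactly, and an 18-digit value such as an SSCC can come back with different trailing digits.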
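A minimal sketch of the round trip, assuming the validation behaves like `float()` followed by fixed-point formatting to the original width. `round_trip` is a hypothetical helper for illustration, not the code at the linked lines.

```python
def round_trip(value: str) -> str:
    """Hypothetical helper: parse as float, then format back with the
    original number of characters (zero-padded, no decimals)."""
    number = float(value)  # raises ValueError for non-numeric input
    return "%0*.0f" % (len(value), number)


original = "123456789012345678"          # arbitrary 18-digit value
print(round_trip(original))              # 123456789012345680: digits changed
print(round_trip("000123"))              # 000123: short values survive
print(float(2**53) == float(2**53 + 1))  # True: 2**53 + 1 has no exact double
```

Short values pass through unchanged, which is why the problem only shows up with long identifiers.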

Impact:

eppye-bots commented 3 years ago

To have an SSCC as numeric... well, even the GS1 people advise storing it as an alphanumeric thing. Yes, it is true that an SSCC has only numeric characters, but this is considered a different thing. So: bots interprets 'numeric' as integer or real, which is IMHO a fairly normal interpretation. Yes, one can think an SSCC is numeric... do not, it is a mistake. It just has numeric characters. A 'real' has numeric characters plus a minus sign and a decimal point, and those are not numerical. Get the point? Different concepts. So: do not store an SSCC as numeric in a database; nobody does. I do know it is confusing.
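A minimal sketch of the distinction drawn above: an identifier such as an SSCC is validated as a fixed-length string of digits and is never converted to a number. `is_sscc_like` is a hypothetical helper, not part of bots, and it does not verify the check digit.

```python
def is_sscc_like(value: str) -> bool:
    """Hypothetical check: exactly 18 ASCII digits, kept as a string."""
    # str.isdigit() alone also accepts non-ASCII digits such as '²',
    # so require ASCII as well (str.isascii() needs Python 3.7+).
    return len(value) == 18 and value.isascii() and value.isdigit()


print(is_sscc_like("001234567890123457"))   # True, value stays untouched
print(is_sscc_like("-1234567890123456.7"))  # False: sign and point are not digits
```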

mrkazoodle commented 3 years ago

Good morning Henk-Jan,

I would definitely try to store SSCC codes as 64-bit integers in a database: the maximum signed value of a Java long, for example, is 9,223,372,036,854,775,807. An 18-digit integer fits perfectly, and we have had 64-bit hardware for a long time. A string version would need 18 bytes/chars, which is more than double the 8 bytes a long requires.
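A quick check of the size arithmetic, sketched in Python with an arbitrary example value. `struct.pack(">q", ...)` packs a big-endian signed 64-bit integer, the same width as a Java `long`.

```python
import struct

sscc = "001234567890123457"    # arbitrary 18-digit example value
as_int = int(sscc)

INT64_MAX = 2**63 - 1          # same value as Java's Long.MAX_VALUE
largest_18_digit = 10**18 - 1  # 999,999,999,999,999,999

print(largest_18_digit <= INT64_MAX)   # True: every 18-digit value fits
print(len(struct.pack(">q", as_int)))  # 8 bytes as a signed 64-bit integer
print(len(sscc.encode("ascii")))       # 18 bytes as an ASCII string

# int() drops leading zeros, so zero-pad to rebuild the 18-digit form.
print(f"{as_int:018d}" == sscc)        # True
```

Storing the number loses leading zeros, so the 18-digit form has to be rebuilt with zero padding whenever the value is written out again.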

Also, string comparison normally tries to optimise by comparing the lengths of the strings first, but an SSCC is always 18 characters long, so no luck there. An SSCC starts with a prefix followed by a company prefix of at least 7 digits, so more than 99% of the SSCC strings you would expect to find will share the same first 8 to 12 digits. This means that string comparison is again not optimal: even if you can compare 8 digits/characters in the same CPU cycle, you would expect never to encounter a difference in the first cycle.
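A small illustration of the common-prefix argument, using made-up values rather than real SSCCs. `os.path.commonprefix` just returns the longest shared leading substring of a list of strings; it is used here only to measure how much of each value is identical.

```python
import os

# Hypothetical SSCC-like values from one company: the same 8-digit lead-in
# (prefix plus company prefix) followed by a zero-padded serial part.
lead_in = "01234567"
ssccs = [lead_in + f"{serial:010d}" for serial in (19, 26, 2)]

# os.path.commonprefix compares the strings character by character.
shared = os.path.commonprefix(ssccs)
print(len(shared))  # 16 of the 18 characters are identical

# For equal-length digit strings, string order equals numeric order,
# so storing them as integers does not change how they sort.
print(sorted(ssccs) == sorted(ssccs, key=int))  # True
```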

But that is not the point.

The following (very similar) code is said to be efficient:

```python
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
```

From https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float
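The `is_number` check from the Stack Overflow answer, repeated here so the sketch runs on its own, never modifies its input. It is looser than a digits-only check, though, because `float()` also accepts surrounding whitespace, exponents and the special values `nan` and `inf`.

```python
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


# The input string is only inspected, never rewritten.
for candidate in ["123456789012345678", " 42 ", "1e3", "nan", "-inf", "12a"]:
    print(repr(candidate), is_number(candidate))
```

Only `"12a"` is rejected in this list.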

So maybe cast to Decimal as an alternative? Or just check whether it is a number without using the float to build the string again, so that the original 'value' is not overwritten?
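A minimal sketch of the Decimal alternative, assuming the caller only needs to know whether the string is a valid finite number. `check_numeric` is a hypothetical name, not an existing bots function. `decimal.Decimal` parses a string exactly (the context precision only applies to arithmetic), and the function hands back the untouched input instead of a re-formatted number.

```python
from decimal import Decimal, InvalidOperation


def check_numeric(value: str) -> str:
    """Hypothetical validator: raise ValueError if `value` is not a finite
    number, otherwise return the original string unchanged."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}") from None
    # Decimal("NaN") and Decimal("Infinity") parse fine, so reject them here.
    if not number.is_finite():
        raise ValueError(f"not a finite number: {value!r}")
    return value  # nothing is re-formatted, so no digits can change


print(check_numeric("123456789012345678"))
print(Decimal("123456789012345678") == 123456789012345678)  # True
```

Returning the input rather than a re-formatted number is the part that matters: whichever parser does the validation, the original digits are never rewritten.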