23andMe / Yamale

A schema and validator for YAML.
MIT License

Questions about representation of numbers #185

Closed ilia-wolke7 closed 2 years ago

ilia-wolke7 commented 2 years ago

Dear devs,

Example schema:
number1: int()
number2: float()

1. Why are 1e4 or 1.1e4 not recognized as a valid int? It should be possible, imho.

2. Why does the built-in float validator raise an error on numbers without a decimal point? I.e., 10.0 is valid but 10 is not a valid float. That seems illogical, since people rarely add "." to such numbers.

I have created my own validators, which did the job. But maybe this is supported natively? Thank you very much for the great tool!

```python
import numpy as np
from yamale.validators import Validator


# Custom type int64
class int64_type_validator(Validator):
    """ Custom Int64 validator """
    tag = 'int64'

    def _is_valid(self, value):
        try:
            f = float(value)
            i = np.int64(f)
            # Notes:
            # TBD: Maybe add epsilon?
            # Is 10.1e9 valid input of int64? So far yes. Or do we want to limit AeB?
            return (f - i) == 0
        except Exception:
            return False


class float_type_validator(Validator):
    """ Custom float validator """
    tag = 'float'

    def _is_valid(self, value):
        try:
            # Accept anything whose string form parses as a float
            float(str(value))
            return True
        except Exception:
            return False
```

mildebrandt commented 2 years ago

Hi, thanks for using Yamale.

Yamale validates based on what the Python parser returns. The pyyaml parser has an open issue around exponential notation: https://github.com/yaml/pyyaml/issues/173
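To see what the parser hands to the validator, here is a minimal sketch, assuming PyYAML is installed. The key names are made up for illustration. It prints the Python type PyYAML assigns to each value. Under PyYAML's float rules, exponential notation needs both a decimal point and a signed exponent, so the first two values come back as strings.

```python
import yaml  # PyYAML

# Hypothetical keys, chosen only to label each notation
doc = """
plain_exp: 1e4
dotted_exp: 1.1e4
signed_exp: 1.1e+4
plain_int: 10
"""

parsed = yaml.safe_load(doc)

# Show which Python type the parser assigned to each value
for key, value in parsed.items():
    print(key, type(value).__name__, repr(value))
```

Whatever type shows up here is the type the int() or float() validator will test against.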

You can switch to ruamel to support that use case: https://github.com/23andMe/Yamale#command-line

The ruamel parser will convert the exponential notation into a float, so be sure you validate with float.

For your other concern, integers and floats are different types. Because an integer can be cast or converted to a float does not make it a float. So in your case 10 is an integer, it's not a float. If you don't care if it's a float or an int, you can use the num() validator: https://github.com/23andMe/Yamale#command-line

So...if I'm reading your inquiry correctly, it sounds like moving to ruamel and using the num() validator would mirror your custom validator.

ilia-wolke7 commented 2 years ago

Thank you very much, what is the proper validator for int64? int() is 32 only, right?

mildebrandt commented 2 years ago

For any Python version in the past 20 years or so, int is the same as long....so you don't need to worry about it: https://www.python.org/dev/peps/pep-0237/
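A quick way to check this yourself is the sketch below. It uses only built-in Python 3 behavior, where int has arbitrary precision, so values beyond the signed 64-bit range are still plain int objects.

```python
# One past the largest signed 64-bit value (2**63 - 1)
big = 2**63

# Still an ordinary int; there is no separate long type in Python 3
print(type(big).__name__, big)

# A value that would overflow a 32-bit integer is also just an int
link_speed = 10_000_000_000
print(isinstance(link_speed, int))
```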

ilia-wolke7 commented 2 years ago

How do I set the parser to ruamel if I use the Python interface, i.e. data = yamale.make_data(file_path) followed by yamale.validate(schema, data)?

mildebrandt commented 2 years ago

Both make_data() and make_schema() have a parser parameter that you can set. There's an example in the README here: https://github.com/23andMe/Yamale#api

```python
# Import Yamale and make a schema object, make sure ruamel.yaml is installed already.
import yamale
schema = yamale.make_schema('./schema.yaml', parser='ruamel')

# Create a Data object
data = yamale.make_data('./data.yaml', parser='ruamel')

# Validate data against the schema same as before.
yamale.validate(schema, data)
```

ilia-wolke7 commented 2 years ago

Even with ruamel, an int in scientific notation is a problem. The following element:
internal_link_speed: 10e9
produces the error: nodes.internal_link_speed: '10000000000.0' is not a int. Why can't an int be written in scientific notation? Is this what you would expect?

ilia-wolke7 commented 2 years ago

Dear @mildebrandt, the following empty list seems to fail. Why?

list: list(include('id_type'), min=0)
I also tried:
list: list(include('id_type'), none=True)

But a list without elements,
list:
produces the error: list: 'None' is not a list.

How do I solve this?

mildebrandt commented 2 years ago

Right, I mentioned the conversion to float above:

The ruamel parser will convert the exponential notation into a float, so be sure you validate with float.

As for the list, the parser doesn't know your empty value is supposed to map to an empty list. It just sees a missing value...which in Python is best represented by None. If you want an empty list in yaml, you can write it like this:

empty_list: []
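
The difference is easy to confirm with the parser alone. This is a minimal sketch assuming PyYAML is installed, with made-up key names. A key with no value parses to None, while an explicit [] parses to an empty Python list.

```python
import yaml  # PyYAML

# Hypothetical keys: one left without a value, one given an explicit empty list
doc = "missing_value:\nempty_list: []\n"

parsed = yaml.safe_load(doc)

print(repr(parsed["missing_value"]))  # the parser reports no value here
print(repr(parsed["empty_list"]))     # an actual list with zero elements
```
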
ilia-wolke7 commented 2 years ago

Thank you very much for your help. Regarding int and exponential notation, imho it is a weakness of the tool (or the parser?). The int validator should support exponential representation, because it is a normal case: integers can be written in scientific notation too. I have created my own validator for now (see above). Thank you for your great help!

mildebrandt commented 2 years ago

Yamale leaves the parsing up to the Python library being used. Yamale validates the values based on the Python types which the parser assigned. I don't see it as a weakness of either the parser (ruamel) or the validator (yamale). Both try to be explicit in what they do. The ruamel parser assigns 1e4 to a float, and yamale correctly says a float is not an integer. For example, I'd also expect 1.0 to fail a validation if an integer was required since 1.0 is a float.

I can see why someone would think the other way. You're saying that if I can cast/convert a float to an integer successfully, then yamale should validate as an integer. I could also cast 1.0 as a string....but one would expect the validator to fail if the user provided 1.0 and the schema expected a string instead. For that reason, yamale validates against the actual types that the parser gives it instead of what it can be cast/converted into.
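
The two positions can be put side by side in plain Python. This is only an illustrative sketch. The helper names is_strict_int and is_integral_number are hypothetical and are not Yamale's actual implementation. The first checks the type the parser assigned. The second accepts anything that converts to an integer without loss.

```python
def is_strict_int(value):
    """Type check: accept only real ints (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_integral_number(value):
    """Lenient check: also accept floats that have no fractional part."""
    if is_strict_int(value):
        return True
    return isinstance(value, float) and value.is_integer()


# Compare both checks on values a YAML parser might produce
for sample in (10, 1e4, 1.0, 10.5, "10"):
    print(repr(sample), is_strict_int(sample), is_integral_number(sample))
```

The strict check matches the behavior described above, where 1e4 and 1.0 are floats and therefore not ints. The lenient check is closer to what the custom int64 validator earlier in the thread does.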