23andMe / Yamale

A schema and validator for YAML.
MIT License
666 stars 88 forks source link
python schema yaml

Yamale (ya·ma·lē)

:warning: Ensure that your schema definitions come from internal or trusted sources. Yamale does not protect against intentionally malicious schemas.
Yamale

A schema and validator for YAML.

What's YAML? See the current spec here and an introduction to the syntax here.

Build Status PyPI

Requirements

Install

pip

$ pip install yamale

NOTE: Some platforms, e.g., Mac OS, may ship with only Python 2 and may not have pip installed. Installation of Python 3 should also install pip. To preserve any system dependencies on default software, consider installing Python 3 as a local package. Please note replacing system-provided Python may disrupt other software. Mac OS users may wish to investigate MacPorts, homebrew, or building Python 3 from source; in all three cases, Apple's Command Line Tools (CLT) for Xcode may be required. See also developers, below.

Manual

  1. Download Yamale from: https://github.com/23andMe/Yamale/archive/master.zip
  2. Unzip somewhere temporary
  3. Run python setup.py install (may have to prepend sudo)

Usage

Command line

Yamale can be run from the command line to validate one or many YAML files. Yamale will search the directory you supply (current directory is default) for YAML files. Each YAML file it finds it will look in the same directory as that file for its schema, if there is no schema Yamale will keep looking up the directory tree until it finds one. If Yamale can not find a schema it will tell you.

Usage:

usage: yamale [-h] [-s SCHEMA] [-n CPU_NUM] [-p PARSER] [--no-strict] [PATH]

Validate yaml files.

positional arguments:
  PATH                  folder to validate. Default is current directory.

optional arguments:
  -h, --help            show this help message and exit
  -s SCHEMA, --schema SCHEMA
                        filename of schema. Default is schema.yaml.
  -n CPU_NUM, --cpu-num CPU_NUM
                        number of CPUs to use. Default is 4.
  -p PARSER, --parser PARSER
                        YAML library to load files. Choices are "ruamel" or
                        "pyyaml" (default).
  --no-strict           Disable strict mode, unexpected elements in the data
                        will be accepted.

API

There are several ways to feed Yamale schema and data files. The simplest way is to let Yamale take care of reading and parsing your YAML files.

All you need to do is supply the files' path:

# Import Yamale and make a schema object:
import yamale
schema = yamale.make_schema('./schema.yaml')

# Create a Data object
data = yamale.make_data('./data.yaml')

# Validate data against the schema. Throws a ValueError if data is invalid.
yamale.validate(schema, data)

You can pass a string of YAML to make_schema() and make_data() instead of passing a file path by using the content= parameter:

data = yamale.make_data(content="""
name: Bill
age: 26
height: 6.2
awesome: True
""")

If data is valid, nothing will happen. However, if data is invalid Yamale will throw a YamaleError with a message containing all the invalid nodes:

try:
    yamale.validate(schema, data)
    print('Validation success! 👍')
except ValueError as e:
    print('Validation failed!\n%s' % str(e))
    exit(1)

and an array of ValidationResult.

try:
    yamale.validate(schema, data)
    print('Validation success! 👍')
except YamaleError as e:
    print('Validation failed!\n')
    for result in e.results:
        print("Error validating data '%s' with '%s'\n\t" % (result.data, result.schema))
        for error in result.errors:
            print('\t%s' % error)
    exit(1)

You can also specify an optional parser if you'd like to use the ruamel.yaml (YAML 1.2 support) instead:

# Import Yamale and make a schema object, make sure ruamel.yaml is installed already.
import yamale
schema = yamale.make_schema('./schema.yaml', parser='ruamel')

# Create a Data object
data = yamale.make_data('./data.yaml', parser='ruamel')

# Validate data against the schema same as before.
yamale.validate(schema, data)

Schema

:warning: Ensure that your schema definitions come from internal or trusted sources. Yamale does not protect against intentionally malicious schemas.

To use Yamale you must make a schema. A schema is a valid YAML file with one or more documents inside. Each node terminates in a string which contains valid Yamale syntax. For example, str() represents a String validator.

A basic schema:

name: str()
age: int(max=200)
height: num()
awesome: bool()

And some YAML that validates:

name: Bill
age: 26
height: 6.2
awesome: True

Take a look at the Examples section for more complex schema ideas.

Includes

Schema files may contain more than one YAML document (nodes separated by ---). The first document found will be the base schema. Any additional documents will be treated as Includes. Includes allow you to define a valid structure once and use it several times. They also allow you to do recursion.

A schema with an Include validator:

person1: include('person')
person2: include('person')
---
person:
    name: str()
    age: int()

Some valid YAML:

person1:
    name: Bill
    age: 70

person2:
    name: Jill
    age: 20

Every root node not in the first YAML document will be treated like an include:

person: include('friend')
group: include('family')
---
friend:
    name: str()
family:
    name: str()

Is equivalent to:

person: include('friend')
group: include('family')
---
friend:
    name: str()
---
family:
    name: str()
Recursion

You can get recursion using the Include validator.

This schema:

person: include('human')
---
human:
    name: str()
    age: int()
    friend: include('human', required=False)

Will validate this data:

person:
    name: Bill
    age: 50
    friend:
        name: Jill
        age: 20
        friend:
            name: Will
            age: 10
Adding external includes

After you construct a schema you can add extra, external include definitions by calling schema.add_include(dict). This method takes a dictionary and adds each key as another include.

Strict mode

By default Yamale will provide errors for extra elements present in lists and maps that are not covered by the schema. With strict mode disabled (using the --no-strict command line option), additional elements will not cause any errors. In the API, strict mode can be toggled by passing the strict=True/False flag to the validate function.

It is possible to mix strict and non-strict mode by setting the strict=True/False flag in the include validator, setting the option only for the included validators.

Validators

Here are all the validators Yamale knows about. Every validator takes a required keyword telling Yamale whether or not that node must exist. By default every node is required. Example: str(required=False)

You can also require that an optional value is not None by using the none keyword. By default Yamale will accept None as a valid value for a key that's not required. Reject None values with none=False in any validator. Example: str(required=False, none=False).

Some validators take keywords and some take arguments, some take both. For instance the enum() validator takes one or more constants as arguments and the required keyword: enum('a string', 1, False, required=False)

String - str(min=int, max=int, equals=string, starts_with=string, ends_with=string, matches=regex, exclude=string, ignore_case=False, multiline=False, dotall=False)

Validates strings.

Examples:

Regex - regex([patterns], name=string, ignore_case=False, multiline=False, dotall=False)

Validates strings against one or more regular expressions.

Examples:

Integer - int(min=int, max=int)

Validates integers.

Number - num(min=float, max=float)

Validates integers and floats.

Boolean - bool()

Validates booleans.

Null - null()

Validates null values.

Enum - enum([primitives])

Validates from a list of constants.

Examples:

Day - day(min=date, max=date)

Validates a date in the form of YYYY-MM-DD.

Examples:

Timestamp - timestamp(min=time, max=time)

Validates a timestamp in the form of YYYY-MM-DD HH:MM:SS.

Examples:

List - list([validators], min=int, max=int)

Validates lists. If one or more validators are passed to list() only nodes that pass at least one of those validators will be accepted.

Examples:

Map - map([validators], key=validator, min=int, max=int)

Validates maps. Use when you want a node to contain freeform data. Similar to List, Map takes one or more validators to run against the values of its nodes, and only nodes that pass at least one of those validators will be accepted. By default, only the values of nodes are validated and the keys aren't checked.

Examples:

IP Address - ip()

Validates IPv4 and IPv6 addresses.

Examples:

MAC Address - mac()

Validates MAC addresses.

Examples:

SemVer (Semantic Versioning) - semver()

Validates Semantic Versioning strings.

Examples:

Any - any([validators])

Validates against a union of types. Use when a node must contain one and only one of several types. It is valid if at least one of the listed validators is valid. If no validators are given, accept any value.

Examples:

Subset - subset([validators], allow_empty=False)

Validates against a subset of types. Unlike the Any validator, this validators allows one or more of several types. As such, it automatically validates against a list. It is valid if all values can be validated against at least one validator.

Examples:

Include - include(include_name)

Validates included structures. Must supply the name of a valid include.

Examples:

Custom validators

It is also possible to add your own custom validators. This is an advanced topic, but here is an example of adding a Date validator and using it in a schema as date()

import yamale
import datetime
from yamale.validators import DefaultValidators, Validator

class Date(Validator):
    """ Custom Date validator """
    tag = 'date'

    def _is_valid(self, value):
        return isinstance(value, datetime.date)

validators = DefaultValidators.copy()  # This is a dictionary
validators[Date.tag] = Date
schema = yamale.make_schema('./schema.yaml', validators=validators)
# Then use `schema` as normal

Examples

:warning: Ensure that your schema definitions come from internal or trusted sources. Yamale does not protect against intentionally malicious schemas.

Using keywords

Schema:

optional: str(required=False)
optional_min: int(min=1, required=False)
min: num(min=1.5)
max: int(max=100)

Valid Data:

optional_min: 10
min: 1.6
max: 100

Includes and recursion

Schema:

customerA: include('customer')
customerB: include('customer')
recursion: include('recurse')
---
customer:
    name: str()
    age: int()
    custom: include('custom_type')

custom_type:
    integer: int()

recurse:
    level: int()
    again: include('recurse', required=False)

Valid Data:

customerA:
    name: bob
    age: 900
    custom:
        integer: 1
customerB:
    name: jill
    age: 1
    custom:
        integer: 3
recursion:
    level: 1
    again:
        level: 2
        again:
            level: 3
            again:
                level: 4

Lists

Schema:

list_with_two_types: list(str(), include('variant'))
questions: list(include('question'))
---
variant:
  rsid: str()
  name: str()

question:
  choices: list(include('choices'))
  questions: list(include('question'), required=False)

choices:
  id: str()

Valid Data:

list_with_two_types:
  - 'some'
  - rsid: 'rs123'
    name: 'some SNP'
  - 'thing'
  - rsid: 'rs312'
    name: 'another SNP'
questions:
  - choices:
      - id: 'id_str'
      - id: 'id_str1'
    questions:
      - choices:
        - id: 'id_str'
        - id: 'id_str1'

The data is a list of items without a keyword at the top level

Schema:

list(include('human'), min=2, max=2)

---
human:
  name: str()
  age: int(max=200)
  height: num()
  awesome: bool()

Valid Data:

- name: Bill
  age: 26
  height: 6.2
  awesome: True

- name: Adrian
  age: 23
  height: 6.3
  awesome: True

Writing Tests

To validate YAML files when you run your program's tests use Yamale's YamaleTestCase

Example:

class TestYaml(YamaleTestCase):
    base_dir = os.path.dirname(os.path.realpath(__file__))
    schema = 'schema.yaml'
    yaml = 'data.yaml'
    # or yaml = ['data-*.yaml', 'some_data.yaml']

    def runTest(self):
        self.assertTrue(self.validate())

base_dir: String path to prepend to all other paths. This is optional.

schema: String of path to the schema file to use. One schema file per test case.

yaml: String or list of yaml files to validate. Accepts globs.

Developers

Linting + Formatting

Yamale is formatted with ruff. There is a github action enforcing ruff formatting and linting rules. You can run this locally via make lint or by installing the pre-commit hooks via make install-hooks

Testing

Yamale uses Tox to run its tests against multiple Python versions. To run tests, first checkout Yamale, install Tox, then run make test in Yamale's root directory. You may also have to install the correct Python versions to test with as well.

NOTE on Python versions: tox.ini specifies the lowest and highest versions of Python supported by Yamale. Unless your development environment is configured to support testing against multiple Python versions, one or more of the test branches may fail. One method of enabling testing against multiple versions of Python is to install pyenv and tox-pyenv and to use pyenv install and pyenv local to ensure that tox is able to locate appropriate Pythons.

Releasing

Yamale uses Github Actions to upload new tags to PyPi. To release a new version:

  1. Make a commit with the new version number in yamale/VERSION.
  2. Run tests for good luck.
  3. Run make release.

Github Actions will take care of the rest.