23andMe / Yamale

A schema and validator for YAML.
MIT License

Error when dealing with duplicate anchors #200

Open ghost opened 2 years ago

ghost commented 2 years ago

Hello and thank you for this tool.

Consider the following sample YAML file:

test:
  - name: &name foo
    properties:
      name: *name

  - name: &name bar
    properties:
      name: *name

Note the two anchors with the same name, which is fine according to the YAML specification. Here is a schema to validate this file:

test: list(include('value'))
value:
  name: str()
  properties:
    name: str()

Running Yamale with default options produces the following error:

yaml.composer.ComposerError: found duplicate anchor; first occurrence
  in "test.yaml", line 3, column 11
second occurrence
  in "test.yaml", line 7, column 11

The behavior is the same with --parser ruamel, except that the exception raised is ruamel.yaml.composer.ComposerError.

As described here, ruamel.yaml can parse this correctly, but pure=True must be specified when initializing it:

from ruamel.yaml import YAML
yaml = YAML(typ='safe', pure=True)

Can you please consider adding a flag to toggle the pure option in ruamel so Yamale doesn't error out in this scenario?

mildebrandt commented 2 years ago

Hi, thanks for your interest in Yamale.

I think I'd like to handle this by allowing users to pass in their own parser. Take a look at this file: https://github.com/23andMe/Yamale/blob/master/yamale/readers/yaml_reader.py

In your case, you'd create a file/module/package/whatever that has this function:

def ruamel_pure(f):
    from ruamel.yaml import YAML
    yaml = YAML(typ='safe', pure=True)
    return list(yaml.load_all(f))

Perhaps that's in a package called romain-depres.parsers. Then the command line would look like:

yamale -p romain-depres.parsers.ruamel_pure -s schema.yaml data.yaml

Then in here:

def parse_yaml(path=None, parser='pyyaml', content=None):
    try:
        parse = _parsers[parser.lower()]
    except KeyError:
        raise NameError('Parser "' + parser + '" is not supported\nAvailable parsers are listed below:\nPyYAML\nruamel')

Before raising the NameError, we can try to load the module/function that is passed into the parser argument by the -p option. If it exists, then use that as the parser.
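
To make the idea concrete, here is a minimal sketch of that fallback, not the actual Yamale implementation. The function name resolve_parser and its builtin_parsers argument are hypothetical; builtin_parsers stands in for the _parsers dict used by parse_yaml. The sketch assumes the value given to -p is a dotted path whose last component is the function name.

```python
import importlib


def resolve_parser(name, builtin_parsers):
    """Return a parser callable for `name`.

    Hypothetical helper: `builtin_parsers` stands in for Yamale's
    `_parsers` dict (lower-cased parser name -> callable).
    """
    # First try the built-in parsers, case-insensitively, as parse_yaml does.
    try:
        return builtin_parsers[name.lower()]
    except KeyError:
        pass

    # Otherwise treat `name` as "package.module.function" (assumption).
    module_path, _, func_name = name.rpartition('.')
    if module_path:
        try:
            module = importlib.import_module(module_path)
            candidate = getattr(module, func_name)
        except (ImportError, AttributeError):
            candidate = None
        if callable(candidate):
            return candidate

    # Nothing matched: keep the same exception type as the existing code.
    raise NameError('Parser "' + name + '" is not supported')
```

With something like this, parse_yaml could call the helper instead of indexing _parsers directly, and the existing NameError would only be raised once both the built-in lookup and the import attempt have failed.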

I'm happy to review any pull request that will implement what I've mentioned above. Let me know if you have any questions.