crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.45k stars 1.62k forks source link

Add strict mode for YAML parser to fail on duplicate attributes #9284

Open straight-shoota opened 4 years ago

straight-shoota commented 4 years ago

Per YAML specification duplicate attributes in a mapping is invalid. Our parser doesn't error on that and instead reads both, which means YAML.parse picks the last one:

require "yaml"
YAML.parse <<-YAML # => {"foo" => "baz"}
  foo: bar
  foo: baz
  YAML

That's certainly an appropriate behaviour for some use cases where you don't want the entire thing to explode. But it can very be unexpected behaviour and can lead to weird bugs. Being more strict would probably be helpful in many situations because it alerts that something needs fixing.

So I'd like to have at least an option to use a strict mode which errors on duplicate attributes.

I'm not sure if it should be configurable (in which should be the default) or strict be the only way.

The JSON specification doesn't say anything about duplicates, so it's safe to assume that they're technically valid. It still doesn't make much sense, especially when a mapping is deserialized into a data structure that only supports unique keys.

/cf https://github.com/crystal-lang/shards/pull/387

asterite commented 4 years ago

I think I tried to implement it before. It worked. Then everything was failing because when you use << you copy some keys from another node and then override some key value pairs.

What does Ruby do? Any YAML library out there that errors on duplicate keys?

asterite commented 4 years ago

In Ruby the last value is returned, and there's no way to error on duplicates.

straight-shoota commented 4 years ago

Yeah, it also won't work on the low parser level because the interpretation of value equality depends on the schema. For example: are foo and "foo" identical?

But at the serialization stage, this is definitely doable. serde for example errors on duplicate keys. Golang has UnmarshalStrict. There are several implementations that error by default (js-yaml, snakeyaml, ruamel.yaml).

naqvis commented 4 years ago

Golang yaml v2 package behaves similar to current crystal implementation (i.e it will silently use the last value read), but v3 triggers error citing the key already defined at line number. I will vote that we should stick to specification and should also trigger error for duplicates.