biojppm / rapidyaml

Rapid YAML - a library to parse and emit YAML, and do it fast.
MIT License

Tree::resolve() leaves some references in place in certain cases #400

Open cbirkhold opened 9 months ago

cbirkhold commented 9 months ago

Consider the following document:

a: &a
  x: 1
b: &b
  ref: *a
c:
  ref: *b

Tree::resolve() will result in:

a:
  x: 1
b:
  ref:
    x: 1
c:
  ref:
    ref: &a
      x: 1

This is correct save for the unintended remaining '&a' anchor. It happens whenever a referenced node itself contains further references. The reference instantiation process copies the entire referenced sub-tree, including any anchors and references inside it. These copies are not part of the list of anchors and references that resolve() builds at the start and uses at the end to remove them, so the 'newly created' ones are left in place.

Workaround: call resolve() again (this will collect the additional 'newly created' references and delete them).
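The mechanism and the workaround can be reproduced with a small toy model. The sketch below is only an illustration under stated assumptions, not rapidyaml's implementation: `Node`, `collect`, `resolve_once`, `to_yaml` and `resolved_doc` are hypothetical names, and the only behavior it mimics is the one described above (snapshot the anchors and references first, instantiate references by deep copy, then strip markers from the snapshotted nodes only).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy node for illustration only; this is not rapidyaml's tree layout.
struct Node {
    std::string key;
    std::string val;             // scalar value, empty for maps
    std::string anchor;          // anchor defined on this node ("&name")
    std::string ref;             // anchor referenced by this node ("*name")
    std::vector<Node> children;
};

// Record every node that currently carries an anchor or a reference.
void collect(Node& n, std::vector<Node*>& out) {
    if (!n.anchor.empty() || !n.ref.empty()) out.push_back(&n);
    for (Node& c : n.children) collect(c, out);
}

// One pass: snapshot, instantiate references by deep copy, then strip
// markers from the snapshotted nodes only. Nodes created by the copies
// are never added to `tracked`, so their anchors survive the pass.
// (The raw pointers stay valid here because no tracked node lives inside
// a node that gets overwritten; a real implementation needs more care.)
void resolve_once(Node& root) {
    std::vector<Node*> tracked;
    collect(root, tracked);
    for (Node* n : tracked) {
        if (n->ref.empty()) continue;
        for (Node* t : tracked) {
            if (t->anchor != n->ref) continue;
            Node copy = *t;        // deep copy, nested anchors included
            copy.key = n->key;     // keep the key of the referencing node
            *n = std::move(copy);
            break;
        }
    }
    for (Node* n : tracked) { n->anchor.clear(); n->ref.clear(); }
}

int count_anchors(const Node& n) {
    int total = n.anchor.empty() ? 0 : 1;
    for (const Node& c : n.children) total += count_anchors(c);
    return total;
}

void emit(const Node& n, std::size_t depth, std::string& out) {
    out += std::string(2 * depth, ' ') + n.key + ":";
    if (!n.anchor.empty()) out += " &" + n.anchor;
    if (!n.ref.empty()) out += " *" + n.ref;
    if (!n.val.empty()) out += " " + n.val;
    out += "\n";
    for (const Node& c : n.children) emit(c, depth + 1, out);
}

std::string to_yaml(const Node& root) {
    std::string out;
    for (const Node& c : root.children) emit(c, 0, out);
    return out;
}

// The document from this report, built by hand.
Node make_doc() {
    Node x{"x", "1", "", "", {}};
    Node a{"a", "", "a", "", {x}};
    Node b{"b", "", "b", "", {Node{"ref", "", "", "a", {}}}};
    Node c{"c", "", "", "", {Node{"ref", "", "", "b", {}}}};
    return Node{"", "", "", "", {a, b, c}};
}

// Build the document and run the given number of resolve passes on it.
Node resolved_doc(int passes) {
    Node root = make_doc();
    for (int i = 0; i < passes; ++i) resolve_once(root);
    return root;
}
```

In this model, `to_yaml(resolved_doc(1))` gives the output shown above, including the stray `&a`, while `count_anchors(resolved_doc(2))` is zero because the second pass snapshots the copied node and strips its anchor.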

biojppm commented 6 months ago

Thanks for reporting. References are notably unsafe; implementing an iterative resolve would open an attack vector for exploits such as the billion laughs attack (https://en.wikipedia.org/wiki/Billion_laughs_attack).
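For a sense of scale, here is a minimal arithmetic sketch in plain C++; it does not call rapidyaml. The hypothetical helper `expanded_leaves` assumes the classic shape of that attack, where each nesting level repeats the previous one `fanout` times, and counts the leaf scalars that a full expansion would produce.

```cpp
#include <cassert>
#include <cstdint>

// Leaf scalars produced by fully expanding `levels` nested aliases when
// each level repeats the previous one `fanout` times (fanout^levels).
// Saturates at UINT64_MAX instead of overflowing.
std::uint64_t expanded_leaves(std::uint64_t fanout, unsigned levels) {
    std::uint64_t total = 1;
    for (unsigned i = 0; i < levels; ++i) {
        if (fanout != 0 && total > UINT64_MAX / fanout) return UINT64_MAX;
        total *= fanout;
    }
    return total;
}
```

With a fanout of 10 and 9 levels, which is the textbook form of the attack, `expanded_leaves(10, 9)` is 1000000000: a document of a few hundred bytes expands into a billion nodes.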

I am considering adding a parameter that specifies the maximum number of resolve levels (defaulting to 1, i.e. equivalent to the current behavior). With this, the user will be responsible for picking the appropriate risk level.
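One way such a cap could look, as a rough sketch only: nothing below exists in rapidyaml, and `resolve_bounded`, `resolve_pass`, `has_leftovers` and `max_levels` are hypothetical names. The pass and the leftover check are left abstract as callables so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not part of rapidyaml: run `resolve_pass` until
// `has_leftovers` reports a clean tree or `max_levels` passes were made.
// Returns the number of passes executed. The default of 1 matches a
// single resolve() call.
int resolve_bounded(const std::function<void()>& resolve_pass,
                    const std::function<bool()>& has_leftovers,
                    int max_levels = 1) {
    int passes = 0;
    while (passes < max_levels) {
        resolve_pass();
        ++passes;
        if (!has_leftovers()) break;   // nothing left to clean up
    }
    return passes;
}
```

A caller would wrap a single `tree.resolve()` call in `resolve_pass` and supply a check that walks the tree looking for remaining anchors or references. The loop stops early once the tree is clean, so a generous limit only costs extra passes on documents that actually need them.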