Closed pjfanning closed 1 year ago
Hmmh. Whether Avro Schemas allow unlimited nesting... I am not 100% sure.
I think it's a good idea to see if a test can be created; and if not maybe postpone until someone actually reports the problem.
https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-76652/year-2022/Apache-Avro.html -- seems that Avro's own impls have had CVEs reported. May not be related to problematic inputs.
@pjfanning Too bad descriptions do not really show much about actual issue, wrt implementation. Seems more likely some validation of things like encoded lengths is not done.
Ok, I think Avro does allow recursive types; f.ex this
https://kb.databricks.com/en_US/data-sources/recursive-references-in-avro-schema-are-not-allowed
suggest that simple self-references are allowed. I think same goes for Protobuf as well. But being schema-based, I think that potential for DoS is lower because attacker must target specific schema type. Still, I assume tree-style structures are fairly common so maybe it makes sense to add checks, given that it is not a ton of code (more so for Avro just to expose IOException but otherwise).
@cowtowncoder I'm happy to merge this as is but I'm not sure I have the Avro expertise to produce a test case. I think you are right that having sensible AvroSchemas should protect users anyway.
The gain from adding this PR is that the exceptions are exposed. The implementation can be changed if anyone can provide a real use case that the code doesn't protect against. The values are configurable, so if someone gets a StreamConstaintsException and they still think the input is valid - they can increase the limits by tuning the StreamReadConstraints.
The potential loss is that this PR could turn out to have bugs in it that mean StreamReadConstraints tuning won't help.