FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

Ensure `IonReader` instances created within `IonFactory` are always resource-managed #325

Closed tgregg closed 2 years ago

tgregg commented 2 years ago

This is intended to fix the memory leak issue described in #306, superseding that pull request.

This solution ensures that IonReader instances created by IonFactory are marked as "resource-managed", meaning that they are closed during IonParser.close(). Closing an IonReader causes it to close any resources it creates (e.g. a GZIPInputStream, as was the case in the original PR).

Note: IonFactory/IonParser currently ignores the StreamReadFeature.AUTO_CLOSE_SOURCE feature, which allows users to opt-in to disabling closure of resources (e.g. an InputStream) provided by the user. Before this change, user-provided InputStreams and Readers would not be closed by IonParser.close() because the IonReader created to wrap the resources was never closed. Now that that IonReader is closed, those user-provided resources will be closed transitively via IonReader.close(). Although this is technically a behavioral change, it seems unlikely to be harmful, as resuming parsing of a partially-consumed stream of Ion data without any context is only possible in specific circumstances. It seems far more likely that current users are either 1) manually closing these resources, in which case this change leads to a redundant close (not harmful in the InputStream/Reader implementations I'm aware of), or 2) forgetting to close these resources, which may leak memory, depending on the implementation. Option 2) is even more likely for users used to JsonFactory, where AUTO_CLOSE_SOURCE is (as far as I can tell) on by default. Therefore, I conclude that the benefits of this change outweigh the risks; however, I'm open to other opinions.

This solution isn't very aesthetically-pleasing, as it discards and replaces the IOContext passed in by the JsonFactory superclass. This amounts to one redundant IOContext instantiation per createParser call. This could probably be avoided, but it would require a larger refactor.

The included unit test verifies that IonReaders are always marked as resource-managed except when provided directly by the user. The test assumes correct behavior of IonParser.close(); namely, that resource-managed resources that are Closeable will be closed. Before this fix, the BYTE_ARRAY, CHAR_ARRAY, READER, and INPUT_STREAM cases failed.

cowtowncoder commented 2 years ago

Sounds reasonable to me.

But I'd like to get a +1 from Ion maintainers. @mcliedtke @jobarr-amzn WDYT? (also note to self: need to add actual active maintainer list somewhere, maybe pom.xml and README(s))

cowtowncoder commented 2 years ago

@jobarr-amzn Thanks! While we are discussing maintainers -- I'd be happy to add links to devs you think would qualify as co-authors for the Ion format module, from release-notes/VERSION-2.x. Both for others and just for myself so I can check who to cc (with so many modules and relatively low volume here I tend to forget this over time :) ).

cowtowncoder commented 2 years ago

@tgregg Just let me know when you are happy with this and I will merge it -- timing is great since we still have time until 2.14.

tgregg commented 2 years ago

@cowtowncoder I incorporated @jobarr-amzn's feedback, so this is ready to merge from our perspective. Thanks for your help.

Feel free to add me to the list of contributors you're compiling.

cowtowncoder commented 2 years ago

Thank you @tgregg ! I did add you as a maintainer and happy to add anyone else who wants to volunteer. There isn't anything specific you need to do of course, it's more information so I know who to occasionally refer from issues et.

Will now merge this.

cowtowncoder commented 2 years ago

Hmmh. I should have read this more closely... it does expose a problem b/w problem of closing (or not) of the underlying input source and IonReader, and general Jackson contract -- wherein InputStream is not managed (even if IonReader should be since caller has no access). But at least there are unit tests to now codify expectation.

Also, need to figure out how to merge to master/3.0 as there's significant refactoring.

cowtowncoder commented 2 years ago

Was able to merge fine to master/3.0, with some tweaks, test passing.