BAMWelDX / weldx

The welding data exchange format
https://www.bam.de/weldx
BSD 3-Clause "New" or "Revised" License
19 stars 9 forks source link

`weldx.asdf.util.read_buffer` relies on reading from a 'closed' buffer #873

Closed braingram closed 1 year ago

braingram commented 1 year ago

ASDF was allowing lazily loaded arrays to be read from an AsdfFile created by reading an io.BytesIO instance. This results in unexpected changes to the position of the read/write pointer within the stream.

See https://github.com/asdf-format/asdf/issues/1539 for the relevant ASDF issue.

It appears that weldx relies upon this behavior for any tests using weldx.asdf.util.read_buffer. The relevant code is as follows: https://github.com/BAMWelDX/weldx/blob/c2ee9b1f7cb8df8b2f4723884341e9e20dcd0d50/weldx/asdf/util.py#L171-L189 Note that data is returned outside of the asdf.open context (which closes the underlying 'file' when the context exits). As lazy_load=True was not provided to asdf.open the arrays within the AsdfFile are 'lazy loaded' (their data is not read from the file until the array is sliced or in other ways accessed). One example of a test that will fail when the above issue is fixed in ASDF is test_shape_validator: https://github.com/BAMWelDX/weldx/blob/c2ee9b1f7cb8df8b2f4723884341e9e20dcd0d50/weldx/tests/asdf_tests/test_asdf_validators.py#L150-L155 Here result contains references to 'lazy loaded' arrays that hold references to the now closed file. The call to compare_nested will then attempt to load the arrays which when the issue is fixed in ASDF will result in an error.

I see one usage of read_buffer outside of tests in weldx.asdf.cli.welding_schema: https://github.com/BAMWelDX/weldx/blob/c2ee9b1f7cb8df8b2f4723884341e9e20dcd0d50/weldx/asdf/cli/welding_schema.py#L303-L306 However this does not appear to use the result of write_read_buffer.

I would imagine that one of two solutions might work:

  1. used lazy_load=True with asdf.open to force loading of all arrays when the file is opened
  2. convert read_buffer (and write_read_buffer) to context managers to allow the file to stay open while the results are used

The first option seems like the easiest but will result in all arrays being loaded into memory (which might not be necessary for every call to read_buffer).

CagtayFabry commented 1 year ago

thank you for reporting this, I like your suggestion using the context manager.

As you mentioned in your comments, the function is mostly used in our testing pipelines. I will try to look into the best way to prevent running into this for now.

braingram commented 1 year ago

Thanks! I opened a PR with updates that should be compatible with a fix to https://github.com/asdf-format/asdf/issues/1539

I looked a bit at the ASDF issue yesterday and it affects more than just BytesIO files and may require more extensive changes to fix. However I do expect that a fix for the ASDF issue should be included in the next release (currently planned as ASDF 3.0, with no estimate for when that will be released).