frejanordsiek / hdf5storage

Python package to read and write a wide range of Python types to/from HDF5 formatted files. Can read/write data to the HDF5 based Matlab v7.3 MAT files.
BSD 2-Clause "Simplified" License

Adding support for circular objects #108

Open frejanordsiek opened 3 years ago

frejanordsiek commented 3 years ago

At present, circular objects (e.g. a list that contains itself as an element) are not detected. The package keeps recursing deeper into the circular structure until the recursion limit is reached, the stack overflows, RAM is exhausted, or (when writing) disk space runs out. This lack of handling is a bug, because it is input that the package can't handle in any way, not even by rejecting it.
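
For illustration, a minimal sketch of the failure mode. The `naive_count` walker below is a hypothetical stand-in for a single-pass recursive traversal, not code from this package:

```python
def naive_count(obj):
    """Count leaf values, recursing into lists and dicts with no cycle check."""
    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return 1  # anything that is not a container counts as one leaf
    total = 0
    for child in children:
        total += naive_count(child)
    return total


circular = [1, 2]
circular.append(circular)  # the third element is the list itself

try:
    naive_count(circular)
    outcome = "finished"
except RecursionError:
    # The traversal never terminates on its own; Python stops it here.
    outcome = "recursion limit reached"
```

In pure Python the recursion limit is what stops the traversal; the other failure modes listed above would depend on what the traversal does at each level.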

Circular object detection and handling would prevent this problem, though the prevention does cost some CPU time and RAM to keep track of the already written/read objects.
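
A rough sketch of what such detection could look like, assuming only lists, tuples, and dicts count as containers. The name `contains_cycle` is made up for this example and is not part of the package:

```python
def contains_cycle(obj, _path=None):
    """Return True if a container is reachable from itself."""
    if _path is None:
        _path = set()
    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple)):
        children = list(obj)
    else:
        return False  # leaf values cannot be part of a cycle in this sketch
    marker = id(obj)
    if marker in _path:
        return True  # obj is already being visited further up the path
    _path.add(marker)  # this bookkeeping is the extra RAM and CPU cost
    try:
        return any(contains_cycle(child, _path) for child in children)
    finally:
        # Forget obj once it is fully visited, so that an object referenced
        # twice without any cycle is not reported as circular.
        _path.discard(marker)


circular = [1, 2]
circular.append(circular)

shared = [1, 2]
not_circular = [shared, shared]  # repeated reference, but no cycle
```

Here `contains_cycle(circular)` is true and `contains_cycle(not_circular)` is false. Only the containers on the current path are tracked, so the extra memory grows with nesting depth rather than with the total number of objects. A real implementation would also need to cover every other container type that gets marshalled.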

frejanordsiek commented 3 years ago

After an initial failed attempt, it looks like this is going to require a massive re-architecting of the Marshaller API. Marshalling will have to be, to some degree, two-pass instead of single-pass.

frejanordsiek commented 3 years ago

I've been thinking about this more, and I now think it is actually a bad idea to have write support for circular objects: earlier versions of this software would crash on them, and it is highly likely that similar packages in other ecosystems, as well as Matlab, would crash on them too.

However, read support is a good thing, and the write system still needs to detect circular objects so that it can reject them instead of crashing.