Closed ejolly closed 8 months ago
Seems to be working for me. let's go ahead and merge. (also not sure why my previous comments aren't showing up to do the same thing).
Ok @ljchang I did some formal testing about trying to read in h5 files made with the current main
version of nltools and this new proposed h5 fix, and it doesn't work 🙁 So if we merge this in then any h5 files created with previous versions of nltools will need old versions of nltools to be loaded, rendering them basically useless over time.
The reason seems to be that the way the deepdish
does saving/loading is very different and more complicated from how h5py
does which is pretty much like opening any other file: with H5File('brain.h5', 'r'):
As a result this new PR isn't backwards compatible. How do you want to proceed? A few thoughts:
That's a bummer @ejolly . What specifically isn't working? I have tried loading older hdf5 files from the FNL dataset and they seem to be loading okay for me. Though I don't have any .X or .Y attributes stored in them.
Yea it's primarily the X and Y attributes. We can't load those easily.
ok, I think that is because they are using pytables to deal with the pandas stuff.
I have a couple of thoughts.
1) how hard does it look to try and emulate this aspect of deepdish? 2) how big of a deal is it if we don't support older hdf5 files that contain .X or .Y info? pickle has not done this (super annoying BTW). I think the pure numpy ones should be fine. We could just throw an error or warning if .X or .Y info is detected on the newer version that you will need to load this using an older version of nltools. I have no idea how often people even use this functionality. I personally never do because I'm usually just trying to make a faster version of a nifti file and those don't store this type of metadata anyway.
@ljchang I was able to solve this now and verified it works across different nltools version! Brain_Data
now accepts a legacy_h5
kwarg that when set to True
will use pytables
to mimic how deepdish was loading files. Otherwise it will use h5py
which is the new default going forward.
The only thing we lose when loading h5 files from older versions of nltools is the column-type data for the X
and Y
attributes. Everything becomes at object
type but I think that's probably acceptable and the best we can do.
I added a test for this and the last thing left to do when I get a chance is add support for legacy Adjacency
hdf5 files.
awesome! nice work!
Legacy h5 support for Adjacency
is now implemented. One thing to note is that new hdf5 files (for brain or adjacency) created with these updates are a bit larger than those created with deepdish
because of the default compression (zlib
) deepdish
was using. h5py
supports gzip
natively which I have implemented as the default, but it might be worth looking into this helper library for further compression support: https://pypi.org/project/hdf5plugin/
Tests are only failing on python 3.7 which is fine because we're deprecating support for that version (and adding up to 3.11)
Ok @ljchang I took a crack at replacing deepdish for saving and loading
.h5
Brain_Data
andAdjacency
objects, which addresses #415. Feel free to play around with this PR and if nothing too strange is happening we can marge it in!