balbasty / nitorch

Neuroimaging in PyTorch

IO module #23

Closed · balbasty closed this 3 years ago

balbasty commented 3 years ago

This is still a work in progress (hence the "draft" mode). The idea is to merge the different I/O tools into a consistent API.

The main class (MappedArray) is very much inspired by nibabel.SpatialImage: it links to a file on disk and allows its data and metadata to be loaded (although the data is not loaded in memory by default). In nibabel.SpatialImage, the array can be symbolically sliced (before it is even loaded), and the same principles are implemented in MappedArray, with a few additional features.

A typical use case would be:

nii = BabelArray('path/to/file.nii')   # map the file; no data loaded yet
z = 5
slice = nii[:, :, z]                   # symbolic slicing: still nothing loaded
dat = nii.fdata()                      # load the scaled data as a tensor
aff = nii.affine                       # voxel-to-world affine matrix

There are two ways of loading the data:

dat = nii.data()   # load the raw data (default dtype: same as on-disk)
dat = nii.fdata()  # load the *scaled* data (default dtype: float32)

Both these functions have a number of options that implement JA's tricks (adding random noise, clipping the data below/above some percentiles, casting to another data type, etc.).
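
For illustration, a call might look something like this (continuing the snippet above; the keyword names are assumptions for illustration, not a documented API):

import torch

dat = nii.fdata(dtype=torch.float64,      # hypothetical kwarg: cast to another data type
                rand=True,                # hypothetical kwarg: dither within quantization bins
                cutoff=(0.0005, 0.9995))  # hypothetical kwarg: clip at these percentiles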

I will also implement a TiffArray class that will use the tifffile package under the hood, because I need it for my current project (microscopy data is stored as TIFF).

Finally, I would like to implement in-place / partial writing as well: either to modify the header (affine) in place, or to write sub-blocks of data as was possible in SPM. The partial-reading side was already more or less implemented in nibabel, so it was easy. Partial writing will be trickier, although nibabel has plenty of utilities that should make it easier than starting from scratch. A sketch of the idea follows.
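
To make the block-writing idea concrete, here is a minimal sketch against an uncompressed file using a numpy memmap; the offset/shape/dtype arguments stand in for values that would come from the header, and none of this is nitorch code:

import numpy as np

def write_block(path, offset, shape, dtype, index, block):
    """Write `block` into the sub-region `index` of the on-disk array.
    Assumes C-ordered, uncompressed storage starting at byte `offset`."""
    mm = np.memmap(path, dtype=dtype, mode='r+', offset=offset, shape=shape)
    mm[index] = block   # only the touched region is modified on disk
    mm.flush()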

I would also like a common API for the metadata. Maybe the user could say which fields/info they want, and None would be returned if that field does not exist in a given data format. We'd need some sort of dictionary that maps related fields between formats; see the strawman below.
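
As a strawman, that dictionary could look like this (every field and format name below is hypothetical; the header is treated as a plain dict):

FIELD_TABLE = {
    # common key      NIfTI header key     MGH header key
    'affine':       {'nifti': 'sform',     'mgh': 'vox2ras'},
    'voxel_size':   {'nifti': 'pixdim',    'mgh': 'delta'},
}

def get_metadata(header, fmt, keys):
    """Return the requested common fields, with None for any field
    that has no equivalent in the given format."""
    out = {}
    for key in keys:
        native = FIELD_TABLE.get(key, {}).get(fmt)
        out[key] = header.get(native) if native else None
    return out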

balbasty commented 3 years ago

It's becoming more and more ugly but the features are coming together.

Here's how to partially read/write a block of data (this only works on non-compressed files for now):

nii = BabelArray('path/to/file.nii')
slice = nii[:, 40:50, 5]                         # symbolic view into the mapped file
scaled_dat = slice.fdata()                       # load only that block (scaled)
scaled_dat = my_processing_function(scaled_dat)
slice.set_fdata(scaled_dat)                      # write the block back in place

It will work on compressed files in the near future (although under the hood it will just read all the data and write it all again).
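
A rough sketch of that fallback (illustrative only; it assumes the whole C-ordered array sits at a fixed offset with nothing after it):

import gzip
import numpy as np

def write_block_gz(path, offset, shape, dtype, index, block):
    # Decompress everything, edit the block in memory, recompress everything.
    with gzip.open(path, 'rb') as f:
        raw = f.read()
    dat = np.frombuffer(raw, dtype=dtype, offset=offset,
                        count=int(np.prod(shape))).reshape(shape).copy()
    dat[index] = block
    with gzip.open(path, 'wb') as f:
        f.write(raw[:offset] + dat.tobytes())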

And here's how to change an affine in place (for the moment, it is written to the sform only):

nii.set_metadata(dict(affine=torch.eye(4)))

The fields that are currently handled are listed here. The idea is to find a set of metadata fields that (1) we are likely to overwrite (i.e., not the datatype or shape, although maybe we'd want those in some cases) and (2) we can find in most of the data formats we work with. Metadata are currently extracted using:

meta_dict = nii.metadata()

I am just thinking that maybe, for writing, I should use kwargs instead of giving a dictionary, i.e.,

nii.set_metadata(affine=torch.eye(4))
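
Both styles could coexist; here is a minimal sketch of a signature that accepts either form (not the actual implementation, and _write_field is a hypothetical helper):

def set_metadata(self, meta=None, **fields):
    # Merge an optional dict with keyword arguments (kwargs win on conflict).
    fields = dict(meta or {}, **fields)
    for key, value in fields.items():
        self._write_field(key, value)   # hypothetical per-field writer
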
balbasty commented 3 years ago

I think it works for NIfTI and MGH files. I would need a million more tests to be sure it works (especially when slicings and permutations are composed). The question is: do I merge this monster now (and we'll probably have some debugging to do on master later), or do I wait until I have more tests?

balbasty commented 3 years ago

I am merging it now. But I'll keep the branch open for further fixes/features/etc.