MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0

Open discussion RE fixel data handling #2551

Open Lestropie opened 1 year ago

Lestropie commented 1 year ago

This is intended as a centralizing thread in which I can show how a few different proposed capabilities fit together into my vision of how the handling of fixel data should change going forward, and how that may be relevant to external projects.


1. .mif is sub-optimal for fixel data

In the development of the fixel data directory format and its utilisation in the FBA implementation, the .mif format was used for 1D fixel data files, since it:

However:

Edit: Some of the annoyances here are discussed in #1664, but the focus there is on improving the use of the .mif format for fixel data rather than superseding it.

In #2437 I implemented back-end support for Python .npy files. This to me is a good candidate for storage of fixel data.

Further, in #2435 I discuss how, in contexts such as fixel data handling, there could be an abstraction within MRtrix3 whereby the 1D / 2D data being manipulated could be .mif, .npy, or .txt / .csv / .tsv. This would mean that 1D / 2D fixel data could use any of these formats and would still be valid under the fixel directory format conventions: retrospective fixel data would remain valid, but prospectively alternative file formats would be acceptable (and IMO preferable).
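To illustrate the kind of abstraction described above, here is a minimal Python sketch of a loader that dispatches on file extension. The function name `load_fixel_data` and the delimiter choices are assumptions made for illustration, not anything in MRtrix3; `.mif` is deliberately left out because reading it requires MRtrix3's own image handling.

```python
from pathlib import Path

import numpy as np

# Assumed delimiter per text extension (None means any whitespace)
_TEXT_DELIMITERS = {".txt": None, ".csv": ",", ".tsv": "\t"}


def load_fixel_data(path):
    """Hypothetical helper: load 1D / 2D numerical data from .npy or delimited text."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".npy":
        return np.load(path)
    if suffix in _TEXT_DELIMITERS:
        # ndmin=1 keeps a single-value file as a 1D array rather than a scalar
        return np.loadtxt(path, delimiter=_TEXT_DELIMITERS[suffix], ndmin=1)
    raise ValueError(f"Unsupported fixel data format: {suffix!r}")
```

The caller receives a NumPy array regardless of which on-disk format was used, which is the property that would let a fixel directory mix formats.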

2. Memory representation of GLM data

Couple of separate points in this one:

2.1. Scratch allocation of all fixel data

fixelcfestats shares much of its command-line interface and internal code structure with other MRtrix3 statistical inference commands. This includes:

A disadvantage here is that if the experimental design is exceptionally large (e.g. many fixels, many inputs), the scratch storage required for those data may become non-negligible. It takes a very big experiment for this to become a problem, but it's nevertheless feasible.

It would therefore be preferable in this instance to have the input data to fixelcfestats be in a form that it can be immediately memory-mapped.
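As a rough illustration of what "immediately memory-mapped" could mean for `.npy` input, the Python sketch below uses NumPy's `mmap_mode` argument. The file name and array sizes are arbitrary stand-ins, and this says nothing about how fixelcfestats itself would implement it.

```python
import os
import tempfile

import numpy as np

# Stand-in data: 1,000 fixels x 5 inputs (sizes are arbitrary)
data = np.arange(5000, dtype=np.float32).reshape(1000, 5)
path = os.path.join(tempfile.mkdtemp(), "fixel_data.npy")
np.save(path, data)

# mmap_mode="r" returns a read-only numpy.memmap: nothing is copied into
# scratch memory up front, and pages are read from disk only when accessed
mapped = np.load(path, mmap_mode="r")

# Touching one fixel reads only the pages backing that row
row = np.asarray(mapped[42])
```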

2.2. Natural extra dimension of statistical inference data

For fixel data, which are represented as 1D per model input, the totality of the model data is 2D.

Similarly, for voxel data you have a 3D image per input and therefore the totality of the input is 4D, and for connectome data you have a 2D matrix per input and so the totality of the data is 3D. Now in both of these cases the data are further manipulated in preparation for the GLM (each is vectorized into a 1D stripe of data per input), and so that will likely need to be done in RAM regardless, so I'll focus here on fixel data exclusively.
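For concreteness, the vectorization mentioned for voxel and connectome data might look like the following Python sketch. The function names, the axis ordering (inputs on the last axis), and the choice to keep the upper triangle including the diagonal are all assumptions for illustration.

```python
import numpy as np


def vectorize_voxel_data(images_4d, mask_3d):
    """(X, Y, Z, inputs) array + boolean (X, Y, Z) mask -> (voxels_in_mask, inputs)."""
    # Boolean indexing over the three spatial axes leaves the input axis intact
    return images_4d[mask_3d]


def vectorize_connectome_data(matrices_3d):
    """(nodes, nodes, inputs) array -> (edges, inputs), upper triangle incl. diagonal."""
    rows, cols = np.triu_indices(matrices_3d.shape[0])
    return matrices_3d[rows, cols]
```

In both cases the result has the same 2D shape (elements by inputs) that fixel data have without any rearrangement.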

2.3. Possible implementation

If one wanted to fully encapsulate all GLM data, particularly for fixel data, this is just concatenation of 1D fixel data across the second dimension.

(Note that all of the above applies to element-wise regressors in addition to the main GLM input.)
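A minimal Python sketch of that concatenation, assuming each input is a 1D `.npy` file with one value per fixel: the helper name `concatenate_fixel_files` is hypothetical, and the fixels-by-inputs layout is one reading of "across the second dimension". It writes the 2D result straight into a memory-mapped `.npy` file so the full matrix never has to sit in RAM.

```python
import numpy as np
from numpy.lib.format import open_memmap


def concatenate_fixel_files(input_paths, output_path):
    """Hypothetical helper: stack 1D .npy files into a (fixels, inputs) .npy file."""
    first = np.load(input_paths[0], mmap_mode="r")
    n_fixels = first.shape[0]
    # open_memmap creates a valid .npy file backed by a writable memory map
    out = open_memmap(
        output_path,
        mode="w+",
        dtype=first.dtype,
        shape=(n_fixels, len(input_paths)),
    )
    for column, path in enumerate(input_paths):
        values = np.load(path, mmap_mode="r")
        if values.shape != (n_fixels,):
            raise ValueError(f"{path}: expected shape ({n_fixels},), got {values.shape}")
        out[:, column] = values
    out.flush()
    return out
```

The same approach would apply to an element-wise regressor: one 2D file per regressor, with the same fixel ordering along the first axis.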


Therefore, if one ties together points 1. and 2. above, what I would envisage in the future is that:

This would go quite some way to disentangling the representation of fixel data from the MRtrix3 software and its .mif image format, particularly since the index file can be stored in NIfTI-1.

I am curious to hear thoughts from both @MRtrix3/mrtrix3-devs and external invested parties, in particular those over at https://github.com/PennLINC/ModelArray (tagging listed contributors @zhao-cy @TinasheMTapera @mattcieslak @scovitz).

mattcieslak commented 1 year ago

Very exciting to see this! The npy format would be great. If you go this route it will make reading in Python extremely easy, and for reading in R we would use the RcppCNPy library. Based on this package's documentation I'm not sure how flexible it is in handling Python dictionaries stored in npy files. How would you feel about keeping the metadata in a JSON text file? That would make the data and metadata both very easy to access in Python and R.
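As a sketch of what the JSON suggestion could look like from the Python side: the helper names and the convention of pairing `name.npy` with `name.json` are assumptions for illustration, not an agreed format.

```python
import json
from pathlib import Path

import numpy as np


def save_with_sidecar(path, data, metadata):
    """Hypothetical helper: write data to <name>.npy and metadata to <name>.json."""
    path = Path(path)  # assumed to end in ".npy"
    np.save(path, data)
    path.with_suffix(".json").write_text(json.dumps(metadata, indent=2))


def load_with_sidecar(path):
    """Hypothetical helper: read <name>.npy plus <name>.json if it exists."""
    path = Path(path)
    data = np.load(path)
    sidecar = path.with_suffix(".json")
    metadata = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    return data, metadata
```

Because the numerical file then holds only a plain array, any `.npy` reader can open it, and the metadata can be parsed with any standard JSON library.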

Lestropie commented 1 year ago

Theoretically we could allow users to configure automated read / write of sidecar JSON data for .npy just as we do for NIfTI. It's not as "faithful" to MRtrix3's general approach of embedding sidecar information within headers such that there's typically one file per input / output, but it's not entirely out of the ordinary (e.g. .mih has technically been doing it for a good couple of decades, though that involves explicitly stating the corresponding data file name rather than inferring correspondence from common file basenames).

Alternatively, happy to hear suggestions for other file formats for nD numerical data that ideally: