Open Lestropie opened 1 year ago
Very exciting to see this! The npy format would be great. If you go this route it will make reading in python extremely easy, and for reading in R we would use the RcppCNPy library. Based on this package's documentation I'm not sure how flexible it is in handling python dictionaries stored in npy files. How would you feel about keeping the metadata in a json text file? That would make the data and metadata both very easy to access in python and R.
Theoretically we could allow users to configure automated read / write of sidecar JSON data for `.npy` just as we do for NIfTI. It's not as "faithful" to MRtrix3's general approach of embedding sidecar information within headers such that there's typically one file per input / output, but it's not entirely out of the ordinary (eg. `.mih` has technically been doing it for a good couple of decades, though that involves explicitly stating the corresponding data file name rather than inferring correspondence from common file basenames).
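As a rough illustration of the sidecar idea, here is a minimal Python sketch that writes an array to `.npy` and its metadata to a JSON file sharing the same basename, then reads both back. The function names `save_with_sidecar` / `load_with_sidecar` and the example metadata key are hypothetical, chosen only for this sketch; this is not how MRtrix3 implements it.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_with_sidecar(path, data, metadata):
    """Write `data` to <basename>.npy and `metadata` to <basename>.json.

    Hypothetical helper: correspondence between the two files is inferred
    purely from the common basename. Note that Path.with_suffix() replaces
    any existing extension on `path`.
    """
    npy_path = Path(path).with_suffix(".npy")
    json_path = npy_path.with_suffix(".json")
    np.save(npy_path, data)
    with open(json_path, "w") as f:
        json.dump(metadata, f, indent=2)
    return npy_path, json_path


def load_with_sidecar(path):
    """Load <basename>.npy plus, if present, the matching <basename>.json."""
    npy_path = Path(path).with_suffix(".npy")
    json_path = npy_path.with_suffix(".json")
    data = np.load(npy_path)
    metadata = {}
    if json_path.exists():
        with open(json_path) as f:
            metadata = json.load(f)
    return data, metadata


# Usage: 1D data for five fixels plus an illustrative metadata entry.
with tempfile.TemporaryDirectory() as tmp:
    values = np.linspace(0.0, 1.0, 5, dtype=np.float32)
    save_with_sidecar(Path(tmp) / "fd", values, {"note": "illustrative metadata"})
    loaded, meta = load_with_sidecar(Path(tmp) / "fd.npy")
    print(loaded.shape, meta)
```

Keeping the metadata in plain JSON means that readers which only understand the numerical payload (for instance a minimal `.npy` reader in R) are unaffected by whatever fields are stored alongside it.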
Alternatively, happy to hear suggestions for other file formats for nD numerical data that ideally:
This is intended to be a bit of a centralizing thread where I can demonstrate how a few different proposed capabilities join together to form my vision of how I'd like to see the handling of fixel data change going forward, as well as its potential relevance to external projects.
1. `.mif` is sub-optimal for fixel data
In the development of the fixel data directory format and its utilisation in the FBA implementation, the `.mif` format was used for 1D fixel data files, since it:
However:
`mrview`, which makes no sense
`.mif` data and therefore access fixel data, it's not necessarily "native" to such projects
Edit: Some of the annoyances here are discussed in #1664, but the focus there is on improving the use of the `.mif` format for fixel data rather than superseding it.
In #2437 I implemented back-end support for Python `.npy` files. This to me is a good candidate for storage of fixel data.
`N x 3` directions file)
`command_history`, though typically it's only three compulsory fields in that dictionary and I don't know whether adding other fields into it could bork other software attempting to read such data.
Further, in #2435 I discuss how in contexts such as fixel data handling, within MRtrix3 there could be an abstraction whereby the 1D / 2D data being manipulated could be `.mif`, `.npy`, `.txt` / `.csv` / `.tsv`. This would mean that 1D / 2D fixel data could use any of these formats and would still be valid under the fixel directory format conventions, so retrospective fixel data would still be valid but prospectively alternative file formats would be acceptable (and IMO preferable).
2. Memory representation of GLM data
Couple of separate points in this one:
2.1. Scratch allocation of all fixel data
`fixelcfestats` shares much of its command-line interface and internal code structure with other MRtrix3 statistical inference commands. This includes:
`fixelcfestats`) and load that data into scratch memory. In addition to being generalized across the different commands, this is also used in the scenario of element-wise design matrix factors, which are requested on demand as each element is processed. For each element-wise regressor, an input file is provided with a list of filesystem paths, just as is done for the GLM inputs.
A disadvantage here is that if the experimental design is exceptionally large (eg. many fixels, many inputs), the scratch storage space of those data may become non-negligible. It requires a very big experiment for this to become a problem, but it's nevertheless feasible.
It would therefore be preferable in this instance to have the input data to `fixelcfestats` be in a form that can be immediately memory-mapped.
2.2. Natural extra dimension of statistical inference data
For fixel data, which are represented as 1D per model input, the totality of the model data is 2D.
Similarly, for voxel data you have a 3D image per input and therefore the totality of the input is 4D, and for connectome data you have a 2D matrix per input and so the totality of the data is 3D. Now in both of these cases the data are further manipulated in preparation for the GLM (each is vectorized into a 1D stripe of data per input), and so that will likely need to be done in RAM regardless, so I'll focus here on fixel data exclusively.
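The dimensionality bookkeeping above can be sketched with NumPy arrays. All sizes are arbitrary, stacking inputs along the last axis is an assumption made for illustration, and the vectorization shown (flattening every voxel, taking the upper triangle of a symmetric connectome) is a simplification rather than a description of what the MRtrix3 commands do internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs = 4

# Fixel data: one 1D vector per input, so all inputs together are 2D.
n_fixels = 10
fixel_all = np.stack([rng.random(n_fixels) for _ in range(n_inputs)], axis=-1)

# Voxel data: one 3D image per input, so all inputs together are 4D ...
voxel_all = np.stack([rng.random((2, 3, 4)) for _ in range(n_inputs)], axis=-1)
# ... and each image is vectorized to a 1D stripe before the GLM.
voxel_glm = voxel_all.reshape(-1, n_inputs)

# Connectome data: one 2D matrix per input, so all inputs together are 3D ...
n_nodes = 5
conn_all = np.stack([rng.random((n_nodes, n_nodes)) for _ in range(n_inputs)], axis=-1)
# ... and each matrix is vectorized here via its upper triangle (assumption).
rows, cols = np.triu_indices(n_nodes)
conn_glm = conn_all[rows, cols]

print(fixel_all.shape)                   # (10, 4): already elements x inputs
print(voxel_all.shape, voxel_glm.shape)  # (2, 3, 4, 4) -> (24, 4)
print(conn_all.shape, conn_glm.shape)    # (5, 5, 4) -> (15, 4)
```

Only the fixel case arrives in the elements-by-inputs layout without any reshaping, which is why it is the natural candidate for being stored on disk in exactly the form the GLM consumes.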
2.3. Possible implementation
If one wanted to fully encapsulate all GLM data, particularly for fixel data, this is just concatenation of 1D fixel data across the second dimension.
`mrcat` since the `.mif` format is used. For something like `.npy`, this could be done very easily in Python; or for those `mr*` commands where such manipulations are well-posed for 1D / 2D data (eg. yes for `mrcat`, but not for `mrtransform`), those commands could be abstracted to permit operation on such non-"image" formats.
(Note that all of the above applies to element-wise regressors in addition to the main GLM input)
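To make the Python route concrete, here is a minimal sketch that concatenates per-input 1D `.npy` files across the second dimension into a single 2D `.npy` file, which can then be memory-mapped rather than loaded. The function name `concatenate_fixel_data` and the file names are hypothetical; the `(n_fixels, n_inputs)` layout follows the "second dimension" description above.

```python
import tempfile
from pathlib import Path

import numpy as np


def concatenate_fixel_data(input_paths, output_path):
    """Stack 1D .npy files column-wise into one 2D .npy file.

    Hypothetical helper. The output is created as a memory-mapped .npy so
    the full 2D array never has to be held in RAM at once.
    """
    first = np.load(input_paths[0])
    if first.ndim != 1:
        raise ValueError("expected 1D fixel data per input")
    out = np.lib.format.open_memmap(
        output_path,
        mode="w+",
        dtype=first.dtype,
        shape=(first.size, len(input_paths)),
    )
    for col, path in enumerate(input_paths):
        column = np.load(path)
        if column.shape != first.shape:
            raise ValueError(f"{path}: fixel count differs from first input")
        out[:, col] = column
    out.flush()
    del out  # release the writable map


# Usage: three inputs of six fixels each, stacked and then memory-mapped.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    paths = []
    for i in range(3):
        p = tmp / f"input_{i}.npy"
        np.save(p, np.full(6, float(i), dtype=np.float32))
        paths.append(p)

    stacked_path = tmp / "all_inputs.npy"
    concatenate_fixel_data(paths, stacked_path)

    # mmap_mode="r" maps the file in place instead of reading it into RAM.
    stacked = np.load(stacked_path, mmap_mode="r")
    print(type(stacked).__name__, stacked.shape)
    del stacked  # close the map before the temporary directory is removed
```

With the default C ordering, all inputs for a given fixel are contiguous on disk, so a per-element GLM fit touches one small contiguous stripe at a time.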
Therefore, if one ties together points 1. and 2. above, what I would envisage in the future is that:
`fixelcfestats` would be a 2D `.npy` file, and could be memory-mapped in place
`.npy` format
`.npy` format
This would go quite some way to disentangling the representation of fixel data from the MRtrix3 software and its `.mif` image format, particularly since the index file can be stored in NIfTI-1.
I am curious to hear thoughts from both @MRtrix3/mrtrix3-devs and external invested parties, in particular those over at https://github.com/PennLINC/ModelArray (tagging listed contributors @zhao-cy @TinasheMTapera @mattcieslak @scovitz).