MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0
294 stars 181 forks source link

std::ssize_t audit #2972

Open Lestropie opened 3 months ago

Lestropie commented 3 months ago

Problem identified by @daljit46.

In various parts of the C++ code base, we have erroneously used std::ssize_t as "a signed integer of the same bit width as std::size_t, when in fact by the letter of the specification it is only guaranteed to support the value -1.

daljit46 commented 3 months ago

Determine an appropriate substitute type to use (typedef one if necessary)

Eigen uses std::ptrdiff_t for indices so I guess we could use that or perhaps int64_t for non-indices.

Lestropie commented 3 months ago

std::ptrdiff_t feels wrong from its name but from my first search for a definition is actually right:

std::ptrdiff_t is used for pointer arithmetic and array indexing, if negative values are possible.

daljit46 commented 2 months ago

In the meeting yesterday, we discussed a few possible typedefs for std::ptrdiff_t:

Lestropie commented 2 months ago

Sorry I missed. We've generally stuck to lowercase for typedefs internally. I'm also wondering if typedefs that are intended for more broad use tend to be *_type rather than *_t (eg. default_type). I like "index" and "stride"; is "offset" intended to be just a difference of "index"es, or more / something else?

jdtournier commented 2 months ago

My personal preference is also for all lowercase. Leaning towards index_t, but would be happy enough with index_type. I feel offset & stride both have slightly narrower scopes than what we need here.

daljit46 commented 2 months ago

Another point I would like to add is that adding multiple typedefs for indices, offset (for file offsets) and strides with the same underlying idea is not a bad idea. There are very few downsides and it would make it easier if at some point we want to change the types.

jdtournier commented 2 months ago

Yes, I think that was suggested the last time we discussed it, and I agree there's no real reason to stick with a single typedef. But on the other hand, there may be a few edge cases where it's not clear which one is most appropriate (though it admittedly wouldn't make any difference...). Furthermore, we are very unlikely to ever define these differently, given how often we'll be mixing them up. For example, computing the offset to the current voxel data in an image involves multiplying the strides by the indices and summing them to produce an offset - all 3 types in one expression...

So my preference would be to stick with a single typedef, and try to find one that is sufficiently broad to encompass all 3 uses. Thought I'll freely admit I'm not sure index_type really nails it...

daljit46 commented 1 month ago

Another name that came to my mind was to use isize_t as in Rust where signed integers are named as i32,i64,etc.

Lestropie commented 1 month ago

adding multiple typedefs for indices, offset (for file offsets) and strides with the same underlying idea is not a bad idea.

I'm quite happy to have multiple typedefs that resolve to the same underlying type. I've used that in some places. Often it helps to communicate the intent / appropriate usage of the data at hand.

What it doesn't catch is instances of implicit conversion between them, as there's no corresponding compiler warning for eg. loss of precision. Though perhaps once they are typedef'd we could (optionally temporarily) replace those typedefs with classes with explicit casts?

If something is an "index", I'd expect that to be a ssize_t rather than a signed integer; just an array index but with the scope for catching -1 errors. In the same vein, I suppose that for something like image strides, std::ptrdiff_t is used as it can act as a "difference of indices": idiff_t? Explicitly using std::ptrdiff_t everywhere feels strange because it's typically not pointers on which we are computing differences; this would substitute out just that component.

Other factor here is that STL uses size_t for indexing: https://eigen.tuxfamily.narkive.com/ZhXOa82S/signed-or-unsigned-indexing. There's likely different places in the code that use size_t vs. ssize_t depending on whether they are indexing into an STL container or an Eigen class.

Another name that came to my mind was to use isize_t

I'd be concerned about the subtlety of the distinction between isize_t and ssize_t: too easy to infer from the names alone that ssize_t is "signed" and therefore isize_t must be the opposite of that.