MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0
292 stars 180 forks source link

"gateway" cmdline or may be more unique commands prefix? #1445

Open yarikoptic opened 6 years ago

yarikoptic commented 6 years ago

I am updating mrtrix package to 3.x in Debian (will come up with mrtrix3 to make it possible to install 0.2.x series or this brand new mrtrix3). Nearly all toolkits (FSL, AFNI, ...) in neuroimaging which provide a number of cmdline tools are hindered by proliferation of commands with not unique enough names. E.g. dirsplit already would conflict with the one provided by genisoimage. And that is why such commands with as generic name as blend provided by special toolkits would also not be advisable to be installed system wide. That is why when packaging for (Neuro)Debian we then typically install them under a path not within $PATH and just symlink only a handful of main ones and then demand users to adjust their PATH one way or another . And that is why you have ./set_path helper to tune those PATH variables user-wide, which could also overload the other cmdline tools. Since 3.x is still RC I wondered if you might consider in the long(er) run to either adopt the strategy

Just ideas ;) for now I guess I would symlink similarly to before the mr* ones

thijsdhollander commented 6 years ago

Hi @yarikoptic,

Thanks for the feedback; I've been wondering about the same issue at times. I do think most MRtrix3 commands are quite unique, due to the use of a prefix strategy we have in place (but the prefix each time refers to an action, or a "kind" of data, .e.g "tck" or "dwi", etc... So the problem may be mostly limited to a few particular ones. I knew (and ran into, at times) the dirsplit one. blend indeed seems like another reasonable candidate to clash with other stuff. Are there any other ones you can easily see being troublesome? Here's a complete list:

5tt2gmwmi         dwinormalise         mrdegibbs            tckedit
5tt2vis           dwipreproc           mrdump               tckgen
5ttcheck          dwishellmath         mredit               tckglobal
5ttedit           fixel2sh             mrfilter             tckinfo
5ttgen            fixel2tsf            mrhistmatch          tckmap
afdconnectivity   fixel2voxel          mrhistogram          tckresample
amp2response      fixelcfestats        mrinfo               tcksample
amp2sh            fixelconvert         mrmath               tcksift
average_response  fixelcorrespondence  mrmetric             tcksift2
blend             fixelcrop            mrpad                tckstats
connectome2tck    fixelreorient        mrregister           tcktransform
connectomestats   fod2dec              mrresize             tensor2metric
convert_bruker    fod2fixel            mrstats              transformcalc
dcmedit           foreach              mrthreshold          transformcompose
dcminfo           gen_scheme           mrtransform          transformconvert
dirflip           label2colour         mrview               tsfdivide
dirgen            label2mesh           mtnormalise          tsfinfo
dirmerge          labelconvert         notfound             tsfmult
dirorder          labelsgmfix          peaks2amp            tsfsmooth
dirsplit          maskdump             population_template  tsfthreshold
dirstat           maskfilter           sh2amp               tsfvalidate
dwi2adc           mesh2voxel           sh2peaks             vectorstats
dwi2fod           meshconvert          sh2power             voxel2fixel
dwi2mask          meshfilter           sh2response          voxel2mesh
dwi2response      mraverageheader      shbasis              warp2metric
dwi2tensor        mrcalc               shconv               warpconvert
dwibiascorrect    mrcat                shview               warpcorrect
dwidenoise        mrcheckerboardmask   tck2connectome       warpinit
dwiextract        mrclusterstats       tck2fixel            warpinvert
dwigradcheck      mrconvert            tckconvert
dwiintensitynorm  mrcrop               tckdfc

In general, I'd think the dir prefix may be the most at risk, due to an obvious clash with its usage to refer to a directory in certain contexts. Most of our current main pipelines and peoples script probably don't make heavy use of the dir... commands, I reckon... so it may be wise for us to reconsider a rename for those... but not sure what that would be (and clearly, it would probably have to be a prefix that itself doesn't start with dir..., which makes it quite hard to come up with good ideas for).

@MRtrix3/mrtrix3-devs , what do you reckon?

yarikoptic commented 6 years ago

well, if you think about it, even mr is kinda suboptimal since (Mister) I could envision some project like "Mr.Math" providing "mrmath" or "Mr.Stats" to come about where it would provide a single cmdline "mrmath" which would become in clash with just one of the dozens (may be eventually a hundred or two ;-) ) mrtrix provides. It is a somewhat anecdotal argument, but not 100% a joke ;) that is why I was looking for a "more" unique prefix than mr. Otherwise foreach, notfound, convert_bruker (might collide with a hypothetical dedicated util for converting from bruker format), warpconvert, maskdump, vectorstats and may be others sound too generic and likely to collide sooner or later. Also, if just dumped under /usr/bin/ without unique prefix, becomes hard(er) for users to discover "mrtrix" commands from all the available thousands

jdtournier commented 6 years ago

First off, pretty much all of @thijsdhollander's comments are spot on. We used to have a page detailing our naming conventions, but I can't find it now (I think it was on the old wiki...).

On neurodebian, a while back, I recall some packages being prefixed with their packages (e.g. Fsl5_flirt or something). Is that no longer how things are done?

Otherwise, yes, I've often thought it might be a good idea to have a 'dispatcher' command like git does to avoid conflicts like you mention. It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...

So there's two options I can think of. One is to ensure users only bring in the packages they need on their session. That's basically the module load approach used on many clusters. It doesn't guarantee no conflicts, but it helps... It would be relatively easy to provide convenience calls to achieve that - I think you mentioned you already provide that kind of functionality?

The second option is a small proxy script that acts as the gateway, and calls the relevant command. This would be relatively trivial to do, something like this should do the trick:

#!/bin/bash

cmd="$1"
shift
"$cmd" "$@"

Happy with either option, but keen to allow direct access to the commands as they are for those who want it...

jdtournier commented 6 years ago

I just dug out our old naming conventions page from the wiki - included below. Worth including in the main docs again...?

Naming conventions

To make it easier to identify the commands you need, we've put in a bit of thought into the conventions used to name each command. These are based on short mnemonics for the different types of data that a particular command might operate on or produce, and for the types of actions the command might perform (see list below). Command names are then selected in one of two ways:

Data type Description
mr generic image
mask binary mask image
dwi 4D image assumed to contain DWI data and associated gradient information (either in header or through external enc or bvecs/bvals files)
sh 4D image assumed to contain SH coefficients
amp 4D image assumed to contain amplitudes sampled on the sphere
fod 4D image assumed to contain SH coefficients for the fibre orientation distribution
dcm DICOM data
tck track file
tsf track scalar file
peaks 4D image assumed to contain 3xN peak vectors
fixel sparse image assumed to contain 3xN_i fixel vectors
5tt 4D image assumed to contain 5 x volume fractions
label 3D integer image assumed to contain anatomical labels
warp 4D image assumed to contain deformation field (3 volumes)
dir a text file containing directions as [ az el ] pairs (one per line)
jdtournier commented 6 years ago

Also, I might add that many of the commands that @yarikoptic raises as potentially most problematic are also probably those scripts that have been included mostly as an accident, or provided as a convenience for users. This includes:

So this might be the time to revisit the idea of providing additional non-core commands & scripts as separate standalone gists or repos, and centralise the list somewhere. It's already been raised a few times (though I can't find that conversation anywhere now). That would help keep the main repo clean, containing only functionality that is genuinely MRtrix3-specific and 'officially' supported. What do you reckon? The simplest thing here would be to push them to their own gists or repos, and post a topic in the FAQ category with the label 'add-on' or something equally explicit, which would make it easy to list these extra commands.

Thoughts?

yarikoptic commented 6 years ago

On neurodebian, a while back, I recall done packages being prefixed with their packages (e.g. Fsl5_flirt or something). Is that no longer how things are done?

We still do it, but since it was not adopted by upstream anyhow, I do not think it is used per se, because it would be inferior to breed neurodebian-specific scripts.

module load

I can provide a basic module for mrtrix3 - I already do that for ANTs. The cons is that as far as I see it, module support is not loaded by default in the shell (at least not in my case with zsh as the default shell).

small proxy script ... It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...

well, it could be something to aim for in the longer run. I (and you) could provide such a script within your ultimate non-RC mrtrix release and slowly migrate to use it "more". Ideally, like git/datalad/cmtk et all, that script --help could provide a nice structured listing of the available commands, with the cited above naming convention etc, to orient the users

thijsdhollander commented 6 years ago

those scripts that have been included mostly as an accident, or provided as a convenience for users

Thoughts?

Agreed for blend and convert_bruker, but I'm strongly in favour of keeping foreach in though. The latter is a key enabler for many users in running group analyses, most notably the FBA pipeline. I know that this can in principle be scripted or pulled off differently, but foreach is very convenient and very usable for this task. Putting it elsewhere would only make things more involved, and confusing in particular for those users that benefit of it the most. The notfound functionality is indeed barely used, but I reckon it could actually be incorporated in foreach (most of its use is in that context, and even its standalone functionality can work from "within" foreach. I note that @Lestropie has already done some work in #1449 to integrate it better with the other scripts and the manual.

Lestropie commented 6 years ago

Are there any other ones you can easily see being troublesome?

I know that mrinfo.exe exists on Windows :sweat_smile: Caught me out a couple of times...

I recall done packages being prefixed with their packages (e.g. Fsl5_flirt or something)

fsl5.0-*. Our Python scripts that invoke FSL commands have to explicitly check for these, as some systems have binaries / symlinks within PATH to FSL binaries only with this prefix.

Otherwise, yes, I've often thought it might be a good idea to have a 'dispatcher' command like git does to avoid conflicts like you mention. It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...

It could definitely be done, but it would only be a tool provided for admins like yourself and would almost certainly not be the "default" usage of MRtrix3. The brevity of command names, combined with the encapsulation of their expected data / operation within the command name, especially when combined with piping, is simply too powerful; so command name clashes would need to be highly problematic for users to justify moving away from it.

I'm thinking that a 'dispatcher' command might not be too difficult, as long as an environment variable is set that contains the location of MRtrix3 executables only. The dispatcher would simply provide a list of commands (& the synopsis of each) when invoked in isolation, and translate command names to absolute filesystem paths to be executed. Somewhere like /usr/bin/ could still contain symlinks to MRtrix3 commands with/without some identifier prefix.

I just dug out our old naming conventions page from the wiki - included below. Worth including in the main docs again...?

Yeah probably. Given we do still change command names from time to time, such a page would provide explicit justification for such (since command renamings are frequently to improve consistency with the concept).

So this might be the time to revisit the idea of providing additional non-core commands & scripts as separate standalone gists or repos, and centralise the list somewhere.

It will always seem slightly unusual to remove capabilities from the repo; but I do think that anything that's a completely stand-alone file - particularly if it's bash and hence doesn't appear in the online documentation - can be moved to gists. Though if there's any kind of 'list' I would encourage having a community forum thread where individuals can provide links to gists, rather than a list of "MRtrix3 official gists", since the latter would discourage external participation.

I'm strongly in favour of keeping foreach in though.

Me too; it's just too powerful to leave out. If there are specific name clashes for this script then the name could potentially be negotiated.

The notfound functionality is indeed barely used, but I reckon it could actually be incorporated in foreach

Indeed I suspect the principal justification for the existence of notfound is the fact that foreach runs that miss a subject or two would be hard to diagnose with the current version, whereas with #1449 such issues will be more clearly & explicitly reported. So that script could potentially disappear entirely; notfound is additionally only applicable to a subset of use cases of foreach, so relying on foreach errors is in fact more general.