Open yarikoptic opened 6 years ago
Hi @yarikoptic,
Thanks for the feedback; I've been wondering about the same issue at times. I do think most MRtrix3 commands are quite unique, due to the use of a prefix strategy we have in place (but the prefix each time refers to an action, or a "kind" of data, .e.g "tck
" or "dwi
", etc... So the problem may be mostly limited to a few particular ones. I knew (and ran into, at times) the dirsplit
one. blend
indeed seems like another reasonable candidate to clash with other stuff. Are there any other ones you can easily see being troublesome? Here's a complete list:
5tt2gmwmi dwinormalise mrdegibbs tckedit
5tt2vis dwipreproc mrdump tckgen
5ttcheck dwishellmath mredit tckglobal
5ttedit fixel2sh mrfilter tckinfo
5ttgen fixel2tsf mrhistmatch tckmap
afdconnectivity fixel2voxel mrhistogram tckresample
amp2response fixelcfestats mrinfo tcksample
amp2sh fixelconvert mrmath tcksift
average_response fixelcorrespondence mrmetric tcksift2
blend fixelcrop mrpad tckstats
connectome2tck fixelreorient mrregister tcktransform
connectomestats fod2dec mrresize tensor2metric
convert_bruker fod2fixel mrstats transformcalc
dcmedit foreach mrthreshold transformcompose
dcminfo gen_scheme mrtransform transformconvert
dirflip label2colour mrview tsfdivide
dirgen label2mesh mtnormalise tsfinfo
dirmerge labelconvert notfound tsfmult
dirorder labelsgmfix peaks2amp tsfsmooth
dirsplit maskdump population_template tsfthreshold
dirstat maskfilter sh2amp tsfvalidate
dwi2adc mesh2voxel sh2peaks vectorstats
dwi2fod meshconvert sh2power voxel2fixel
dwi2mask meshfilter sh2response voxel2mesh
dwi2response mraverageheader shbasis warp2metric
dwi2tensor mrcalc shconv warpconvert
dwibiascorrect mrcat shview warpcorrect
dwidenoise mrcheckerboardmask tck2connectome warpinit
dwiextract mrclusterstats tck2fixel warpinvert
dwigradcheck mrconvert tckconvert
dwiintensitynorm mrcrop tckdfc
In general, I'd think the dir
prefix may be the most at risk, due to an obvious clash with its usage to refer to a directory in certain contexts. Most of our current main pipelines and peoples script probably don't make heavy use of the dir...
commands, I reckon... so it may be wise for us to reconsider a rename for those... but not sure what that would be (and clearly, it would probably have to be a prefix that itself doesn't start with dir...
, which makes it quite hard to come up with good ideas for).
@MRtrix3/mrtrix3-devs , what do you reckon?
well, if you think about it, even mr
is kinda suboptimal since (Mister) I could envision some project like "Mr.Math" providing "mrmath" or "Mr.Stats" to come about where it would provide a single cmdline "mrmath" which would become in clash with just one of the dozens (may be eventually a hundred or two ;-) ) mrtrix provides. It is a somewhat anecdotal argument, but not 100% a joke ;) that is why I was looking for a "more" unique prefix than mr.
Otherwise foreach
, notfound
, convert_bruker
(might collide with a hypothetical dedicated util for converting from bruker format), warpconvert
, maskdump
, vectorstats
and may be others sound too generic and likely to collide sooner or later.
Also, if just dumped under /usr/bin/ without unique prefix, becomes hard(er) for users to discover "mrtrix" commands from all the available thousands
First off, pretty much all of @thijsdhollander's comments are spot on. We used to have a page detailing our naming conventions, but I can't find it now (I think it was on the old wiki...).
On neurodebian, a while back, I recall some packages being prefixed with their packages (e.g. Fsl5_flirt
or something). Is that no longer how things are done?
Otherwise, yes, I've often thought it might be a good idea to have a 'dispatcher' command like git
does to avoid conflicts like you mention. It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...
So there's two options I can think of. One is to ensure users only bring in the packages they need on their session. That's basically the module load
approach used on many clusters. It doesn't guarantee no conflicts, but it helps... It would be relatively easy to provide convenience calls to achieve that - I think you mentioned you already provide that kind of functionality?
The second option is a small proxy script that acts as the gateway, and calls the relevant command. This would be relatively trivial to do, something like this should do the trick:
#!/bin/bash
cmd="$1"
shift
"$cmd" "$@"
Happy with either option, but keen to allow direct access to the commands as they are for those who want it...
I just dug out our old naming conventions page from the wiki - included below. Worth including in the main docs again...?
To make it easier to identify the commands you need, we've put in a bit of thought into the conventions used to name each command. These are based on short mnemonics for the different types of data that a particular command might operate on or produce, and for the types of actions the command might perform (see list below). Command names are then selected in one of two ways:
two data type mnemonics can be tacked together with a '2' to indicate some form of conversion, if the action taken as part of that conversion is obvious (e.g. dwi2fod
)
otherwise, a data type mnemonic can be suffixed with an action, such as view, info, edit, calc, etc. (e.g. tckgen
).
Data type | Description |
---|---|
mr | generic image |
mask | binary mask image |
dwi | 4D image assumed to contain DWI data and associated gradient information (either in header or through external enc or bvecs/bvals files) |
sh | 4D image assumed to contain SH coefficients |
amp | 4D image assumed to contain amplitudes sampled on the sphere |
fod | 4D image assumed to contain SH coefficients for the fibre orientation distribution |
dcm | DICOM data |
tck | track file |
tsf | track scalar file |
peaks | 4D image assumed to contain 3xN peak vectors |
fixel | sparse image assumed to contain 3xN_i fixel vectors |
5tt | 4D image assumed to contain 5 x volume fractions |
label | 3D integer image assumed to contain anatomical labels |
warp | 4D image assumed to contain deformation field (3 volumes) |
dir | a text file containing directions as [ az el ] pairs (one per line) |
Also, I might add that many of the commands that @yarikoptic raises as potentially most problematic are also probably those scripts that have been included mostly as an accident, or provided as a convenience for users. This includes:
blend
: included by @draffelt to help making movies. Shouldn't be a part of MRtrix3 as such, I don't think. But would be useful as an add-on.
foreach
: a script I'd originally cobbled together to run parallel processing, subsequently added to the repo by @draffelt. Again, I don't think it should be included in the main repo: it's a convenience call for general-purpose parallel processing. But again, would be useful as an add-on.
convert_bruker
: another script I'd cobbled together to help load Bruker data. Ideally, Bruker data would be supported by the C++ backend, but there seems to be no simple convention for their format. So this was always provided as a kind of half-hearted attempt at helping our Bruker sites (including my own). It would require a bit of work and dedication to make it work well, for now I reckon it would again best be made available as an add-on.
notfound
: didn't even know about that one... Again, a convenience call for general-purpose batch processing. Not something that should be included in the main MRtrix3 repo, but might be useful as an add-on (although I've never come across anyone using it...?).
So this might be the time to revisit the idea of providing additional non-core commands & scripts as separate standalone gists or repos, and centralise the list somewhere. It's already been raised a few times (though I can't find that conversation anywhere now). That would help keep the main repo clean, containing only functionality that is genuinely MRtrix3-specific and 'officially' supported. What do you reckon? The simplest thing here would be to push them to their own gists or repos, and post a topic in the FAQ category with the label 'add-on' or something equally explicit, which would make it easy to list these extra commands.
Thoughts?
On neurodebian, a while back, I recall done packages being prefixed with their packages (e.g. Fsl5_flirt or something). Is that no longer how things are done?
We still do it, but since it was not adopted by upstream anyhow, I do not think it is used per se, because it would be inferior to breed neurodebian-specific scripts.
module load
I can provide a basic module for mrtrix3
- I already do that for ANTs. The cons is that as far as I see it, module
support is not loaded by default in the shell (at least not in my case with zsh as the default shell).
small proxy script ... It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...
well, it could be something to aim for in the longer run. I (and you) could provide such a script within your ultimate non-RC mrtrix release and slowly migrate to use it "more". Ideally, like git/datalad/cmtk et all, that script --help could provide a nice structured listing of the available commands, with the cited above naming convention etc, to orient the users
those scripts that have been included mostly as an accident, or provided as a convenience for users
Thoughts?
Agreed for blend
and convert_bruker
, but I'm strongly in favour of keeping foreach
in though. The latter is a key enabler for many users in running group analyses, most notably the FBA pipeline. I know that this can in principle be scripted or pulled off differently, but foreach
is very convenient and very usable for this task. Putting it elsewhere would only make things more involved, and confusing in particular for those users that benefit of it the most. The notfound
functionality is indeed barely used, but I reckon it could actually be incorporated in foreach
(most of its use is in that context, and even its standalone functionality can work from "within" foreach. I note that @Lestropie has already done some work in #1449 to integrate it better with the other scripts and the manual.
Are there any other ones you can easily see being troublesome?
I know that mrinfo.exe
exists on Windows :sweat_smile: Caught me out a couple of times...
I recall done packages being prefixed with their packages (e.g. Fsl5_flirt or something)
fsl5.0-*
. Our Python scripts that invoke FSL commands have to explicitly check for these, as some systems have binaries / symlinks within PATH
to FSL binaries only with this prefix.
Otherwise, yes, I've often thought it might be a good idea to have a 'dispatcher' command like git does to avoid conflicts like you mention. It's not trivial to retro-fit though, and would certainly be quite a radical change when we're hoping to finalise and release soon...
It could definitely be done, but it would only be a tool provided for admins like yourself and would almost certainly not be the "default" usage of MRtrix3. The brevity of command names, combined with the encapsulation of their expected data / operation within the command name, especially when combined with piping, is simply too powerful; so command name clashes would need to be highly problematic for users to justify moving away from it.
I'm thinking that a 'dispatcher' command might not be too difficult, as long as an environment variable is set that contains the location of MRtrix3 executables only. The dispatcher would simply provide a list of commands (& the synopsis of each) when invoked in isolation, and translate command names to absolute filesystem paths to be executed. Somewhere like /usr/bin/
could still contain symlinks to MRtrix3 commands with/without some identifier prefix.
I just dug out our old naming conventions page from the wiki - included below. Worth including in the main docs again...?
Yeah probably. Given we do still change command names from time to time, such a page would provide explicit justification for such (since command renamings are frequently to improve consistency with the concept).
So this might be the time to revisit the idea of providing additional non-core commands & scripts as separate standalone gists or repos, and centralise the list somewhere.
It will always seem slightly unusual to remove capabilities from the repo; but I do think that anything that's a completely stand-alone file - particularly if it's bash
and hence doesn't appear in the online documentation - can be moved to gists. Though if there's any kind of 'list' I would encourage having a community forum thread where individuals can provide links to gists, rather than a list of "MRtrix3 official gists", since the latter would discourage external participation.
I'm strongly in favour of keeping
foreach
in though.
Me too; it's just too powerful to leave out. If there are specific name clashes for this script then the name could potentially be negotiated.
The
notfound
functionality is indeed barely used, but I reckon it could actually be incorporated inforeach
Indeed I suspect the principal justification for the existence of notfound
is the fact that foreach
runs that miss a subject or two would be hard to diagnose with the current version, whereas with #1449 such issues will be more clearly & explicitly reported. So that script could potentially disappear entirely; notfound
is additionally only applicable to a subset of use cases of foreach
, so relying on foreach
errors is in fact more general.
I am updating mrtrix package to 3.x in Debian (will come up with mrtrix3 to make it possible to install 0.2.x series or this brand new mrtrix3). Nearly all toolkits (FSL, AFNI, ...) in neuroimaging which provide a number of cmdline tools are hindered by proliferation of commands with not unique enough names. E.g.
dirsplit
already would conflict with the one provided by genisoimage. And that is why such commands with as generic name asblend
provided by special toolkits would also not be advisable to be installed system wide. That is why when packaging for (Neuro)Debian we then typically install them under a path not within $PATH and just symlink only a handful of main ones and then demand users to adjust their PATH one way or another . And that is why you have./set_path
helper to tune those PATH variables user-wide, which could also overload the other cmdline tools. Since 3.x is still RC I wondered if you might consider in the long(er) run to either adopt the strategymrtrix calc
(instead or currentmrcalc
).mr
prefix, which already makes it less likely to overlap with some other tools (although there is anmr
tool on its own).mrt
ormrx
would've been even more 'unique' but that would be too much of a changeJust ideas ;) for now I guess I would symlink similarly to before the
mr*
ones