MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0

mrtrix3 needs a real build system #864

Closed gdevenyi closed 7 years ago

gdevenyi commented 7 years ago

As of right now, packaging, or even installing (with things like --prefix) isn't possible.

This makes supporting on shared systems such as HPC exceptionally hard.

gdevenyi commented 7 years ago

One possible way to provide PATHs without modifying a user's environment: https://github.com/BIC-MNI/minc-toolkit-v2/blob/master/CMakeLists.txt#L554-L565

Templates of sourceable bash and tcsh export scripts.
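For instance, a minimal sourceable script might look like this (just a sketch - the install location and the variable name are illustrative, not anything MRtrix3 currently provides):

# mrtrix3_env.sh - to be sourced from a user's shell startup file or by a wrapper
export MRTRIX3_HOME=/opt/mrtrix3
export PATH=${MRTRIX3_HOME}/bin:${PATH}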

jdtournier commented 7 years ago

As of right now, packaging, or even installing (with things like --prefix) isn't possible.

Well, maybe it's not documented, but it's certainly possible. That said, the main reason we don't currently provide overt shortcuts for this is that development currently proceeds at such a pace that I would generally recommend individual users maintain their own version, so that they can update without fear of conflict. But I appreciate this can place an undue burden on users. Also, we are still officially in beta; once we hit a proper release cycle, a lot of these issues will be addressed.

There is, however, a script to automate some of this, called package_mrtrix; it's probably a bit outdated at the moment, although I think @maxpietsch has been bringing it up to date recently.

This makes supporting on shared systems such as HPC exceptionally hard.

I have to admit, installation on HPC can be very difficult, but in my experience not because of the issues you raise here - the problem has always been getting up-to-date versions of the dependencies, a compiler with proper C++11 support, and a Python version >= 2.7. The other issue that's very problematic is remote display (see this page in the documentation). Personally, I think packaging itself is relatively easy in comparison...

So, for packaging, one thing to bear in mind is that the current configure script will compile with -march=native, which will most likely cause issues if building on a different system from the deployment target. We use this by default to ensure optimal performance on the current system. That's not a problem if you're building on the HPC itself, and the HPC does not contain a mix of CPUs (we recently had issues with a cluster containing a mix of old & new nodes). However, if you need to specify a given minimum CPU architecture, you can pick the most appropriate from the list in the GCC docs and pass that through to configure via the ARCH environment variable. So this might look like:

ARCH=sandybridge ./configure

Once this is done, ./build will compile everything and place everything in the release folder. You can happily tar this up (all you need is the contents of bin/ and the lib/libmrtrix*.so library) and extract where you need it, and then set the PATH for users that want access to it. You could set the PATH system-wide if you need to, but generally I find it cleaner to avoid conflicts and let users add that location to their PATH if they plan on using MRtrix - but that is obviously up to how you manage your installation. The executables and library are designed to be relocatable by the way, as long as the libmrtrix.so file remains in the ../lib folder relative to the executables, everything should work wherever you copy the folder to.
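If you want to sanity-check that relocatability assumption on a given build, the executables are linked with a relative runpath, so something along these lines should confirm it (readelf is part of binutils, and the exact wording of the output may differ):

# inspect the dynamic section of one of the executables:
readelf -d release/bin/mrconvert | grep -iE 'rpath|runpath'
# should report a runpath of $ORIGIN/../lib, i.e. the library is found relative to the executable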

You will also most likely want the scripts too. The simplest way to make sure they get included is to copy them across to the release/bin folder before packaging things up.

So if you put it all together, these commands should allow you to do what you need:

ARCH=sandybridge ./configure
./build
cp -r scripts/* release/bin
cd release/
tar cvfj mrtrix3.tar.bz2 bin lib/libmrtrix-*.so

At which point you can pick up the release/mrtrix3.tar.bz2 file and extract it where you want, e.g.:

mkdir /opt/mrtrix3
cd /opt/mrtrix3
tar vxfj mrtrix3.tar.bz2

And get users to add:

export PATH=/opt/mrtrix3/bin:$PATH 

to their shell startup file (or add it to your system-wide version).

So realistically, packaging only requires an additional 3 lines beyond the usual ./build, and installation should only be a matter of extracting that to a suitable location. I appreciate we could make it a touch easier, and I've no doubt we will at some point soon - but I don't think it's too complicated as things stand (although documentation would probably help quite a bit in that regard...).

One possible way to provide PATHs without modifying a user's environment: https://github.com/BIC-MNI/minc-toolkit-v2/blob/master/CMakeLists.txt#L554-L565

Templates of sourceable bash and tcsh export scripts.

So I'm not sure what you're referring to here. Is it simply a means of setting the PATH...? There is a set_path script that is supposed to do this for a user-local install, but really, the only configuration required for MRtrix3 is setting the PATH as above. Even if we were to provide a sourceable script, it would still require adding the line sourcing this script to be placed in the startup script, so I'm not sure it would necessarily help all that much. Unless I'm missing the point...?

maxpietsch commented 7 years ago

package_mrtrix should work now, let us know if not.

Copying the scripts into the bin folder, however, will not work as they expect to find the mrtrix binaries in ../release/bin (which could be changed in lib/runCommand.py and lib/getHeaderInfo.py).

If you need the scripts to be in the same folder as the binaries, you could symlink them there, as I do in the homebrew formula.
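Roughly what that amounts to, as a sketch (run from the top-level MRtrix3 folder; treating only the executable entries at the top of scripts/ as link targets is an assumption on my part):

for script in scripts/*; do
  # only link the executable entry points, not the scripts' lib/, src/ or data/ subfolders
  if [ -f "$script" ] && [ -x "$script" ]; then
    ln -sf "$(pwd)/$script" release/bin/
  fi
done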

Cheers, Max

jdtournier commented 7 years ago

Copying the scripts into the bin folder, however, will not work as they expect to find the mrtrix binaries in ../release/bin (which could be changed in lib/runCommand.py and lib/getHeaderInfo.py).

Ah, yes, I forgot about that... We might need to find a more appropriate way to deal with this; ideally everything would be self-contained in the release folder, so it can be renamed and moved without affecting proper operation. We may need to think about this a bit more...

gdevenyi commented 7 years ago

I'm at a bit of a loss as to why the scripts need to know where the binaries are, as long as they're in the PATH, that should be all that's needed.

Anywho, the ANTs guys handle this by defining an ANTSPATH environment variable to point at the install.
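For reference, the ANTs convention looks roughly like this (the install path is just an example):

export ANTSPATH=/opt/ANTs/bin
export PATH=${ANTSPATH}:${PATH}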

jdtournier commented 7 years ago

Yes, FSL also uses a similar strategy. The problem we're trying to address here is side-by-side installs with different versions: we need to make sure the scripts match the binaries, since incompatibilities sometimes get introduced - it's caught us out a few times... With the current approach, as long as the folder structure is maintained, we can guarantee that the scripts will use the matching executables.

Lestropie commented 7 years ago

I'm at a bit of a loss as to why the scripts need to know where the binaries are, as long as they're in the PATH, that should be all that's needed.

This is for developers who have multiple MRtrix3 installs in multiple locations. If you run a particular script (or specific version of a script) from within a particular installation, it may be dependent on a specific version of a binary that resides within that particular installation; but relying on PATH will result in the binary within the developer's 'default' MRtrix3 installation being used, which may not be compatible with the script being run.

(ack; beat me to it)

Having said that, the scripts/ directory has been built on the premise of a code repository-based installation, without any consideration thus far of how they might be provided as part of a pre-built release. I've been meaning to create an issue for this for a while now.

gdevenyi commented 7 years ago

Okay, I'm getting the "developer-centric" design of this; however, I'm trying to install this in an HPC-like modules quarantine system, so I'm trying to wrap my head around packaging it into that system.

I maintain versions based on prefixes and environment-modules ( http://modules.sourceforge.net/ ).

Is mrtrix3 stable enough for me to be even providing versions for my users or is the code churn too high right now?

jdtournier commented 7 years ago

I maintain versions based on prefixes and environment-modules ( http://modules.sourceforge.net/ ).

Yes, that shouldn't be a problem: the only environment variable that needs to be set is the PATH. You will need to ignore (or at least modify) my previous suggestions though; the folder structure to copy over, as things stand, consists of:

I'd also follow @maxpietsch's advice and symlink the scripts from the release/bin folder, which means you only need to worry about setting one PATH entry.

Is mrtrix3 stable enough for me to be even providing versions for my users or is the code churn too high right now?

It's stable enough in terms of runtime operation, but the code churn is indeed very high. In any case, it's best to stick with tagged versions if you need to maintain some kind of stability. We try to make sure the master branch is always stable, and only merge changes in there when we feel they're ready. But it's probably best to install actual tagged versions if you need to keep things manageable.

jdtournier commented 7 years ago

So while we're on the topic: for the previous version of MRtrix 0.2, I had provided a ./build install target that would take care of things like this. It might be time to implement this again - it's a trivial addition, and does remove the need for users and sysadmins to wrap their heads around our strange ways...

However, to do this right, we're going to have to sort out the location of the scripts. I think it's still important that we ensure the matching executables are invoked, to avoid mix-ups when directly invoking a specific version of the script that isn't in the PATH, or just not listed first in the PATH. For instance, users may find that version X.Z of our dostuff script doesn't work, when the earlier version X.Y worked fine. So they might be tempted to issue something like:

/usr/local/mrtrix3-X.Y/scripts/dostuff ... args ...

and find that things fall over in yet more weird and wonderful ways... Not likely to happen a lot, I admit, but this is definitely something that I'd happily try occasionally. Since we have the infrastructure already in place to deal with this, I think we should keep it, but just adjust the locations of these scripts to make things a little bit easier to manage.

So the simplest fix, in my opinion, is to modify this line to simply look in the same folder as the script itself. We should also disable the os.path.realpath() symbolic link resolution strategy (we'd added that to allow links to the scripts to be properly resolved, so that we could then look in the ../release/bin folder relative to the actual scripts...). Finally, we should have the build script copy the full contents of the scripts/ folder over to the target bin folder - remember, it's release/bin by default, but we might have different configs installed. So if you want to run a script with a debug/assert build, which you might have set up with a ./configure -assert -debug mydebug to reside in the mydebug folder, there's currently no way to do this - you have to re-configure and re-build the main release folder itself.

This would then simplify the process of installation, and also only require a single entry in the PATH. I think we could adopt the strategy used in the previous version, which was:

This kept things cleanly separated, but by default directly usable with no further configuration required. Sysadmins could choose where to locate the main install, and also elect to symlink from somewhere else (maybe /usr/local/bin) or not at all if they were going to set user's PATHs directly.

This also allowed for relatively clean uninstalls: there was also an uninstall target in the build script to facilitate this, but maybe it should be provided as a proper script in the scripts/ folder and called mrtrix3_uninstall or something, which would guarantee it was available even if the original build folder was subsequently deleted. The way that worked was:

Another similar (and probably better / cleaner) strategy might be to locate mrinfo (or better, mrconvert - I think there is an mrinfo command on Windows systems, might clash), figure out its true location if a symlink, then go through the symlink's enclosing folder and identify any other symlinks pointing to a target within the same folder as the first symlink. As a quick bash script, this would look like:

cmd=$(which mrconvert)
# check if it is a symbolic link:
if [ -h "$cmd" ]; then
  cmdfolder=$(dirname "$cmd")
  target=$(readlink "$cmd")
  targetfolder=$(dirname "$target")
  # remove any sibling symlinks that point into the same target folder:
  for entry in "$cmdfolder"/*; do
    if [ -h "$entry" ]; then
      entrytarget=$(readlink "$entry")
      if [ "$(dirname "$entrytarget")" == "$targetfolder" ]; then
        rm -f "$entry"
      fi
    fi
  done
fi

Could be cleaned up quite a bit in python, no doubt, but you get the gist...

Thoughts...?

Lestropie commented 7 years ago

With regards to the scripts: I was planning on looking into something like PyInstaller, i.e. compile each compatible script in the scripts/ directory into its own single file. That way they can reside wherever (most likely the same directory as the binaries, so only one PATH entry is needed for installation), there are no dangling src/ / lib/ directories, and it should still be possible to test & override the paths of the MRtrix binaries for version compatibility. Haven't tried yet though.
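For the record, the sort of invocation this would involve (PyInstaller itself is installed via pip; the choice of dwi2response is just an example, and extra options would almost certainly be needed to pull in the scripts' lib/ modules and data/ files):

pip install pyinstaller
pyinstaller --onefile scripts/dwi2response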

Lestropie commented 7 years ago

So the simplest fix, in my opinion, is to modify this line to simply look in the same folder as the script itself. We should also disable the os.path.realpath() symbolic link resolution strategy (we'd added that to allow links to the scripts to be properly resolved, so that we could then look in the ../release/bin folder relative to the actual scripts...). Finally, we should have the build script copy the full contents of the scripts/ folder over to the target bin folder - remember, it's release/bin by default, but we might have different configs installed. So if you want to run a script with a debug/assert build, which you might have set up with a ./configure -assert -debug mydebug to reside in the mydebug folder, there's currently no way to do this - you have to re-configure and re-build the main release folder itself.

This works for packaging, but makes it harder for devs. Modifications made in the scripts/ directory won't be reflected in the scripts being executed, and modifications to the script copies in e.g. release/bin won't be detected by Git. Personally I'd rather keep the code repository design development-centric (i.e. as it is), deal with having the scripts call different MRtrix3 binary configurations using a standard command-line option, and deal with how to package the scripts only when packaging MRtrix3 as a whole.

Even if the packaging process were to copy the contents of scripts/ into e.g. /opt/mrtrix/bin, you could still have runCommand() check both its own real location and ../release/bin for the MRtrix3 binaries, and that way the same Python code would happily work when the files are arranged in both 'repository' and 'package' configurations. Though personally even when packaging I'd still have /opt/mrtrix/bin and /opt/mrtrix/scripts: If a user / manager has to add a location to PATH, adding two locations probably isn't beyond them; alternatively, if you're adding symlinks to e.g. /opt/mrtrix/bin/* in e.g. /usr/bin, just add symlinks to /opt/mrtrix/scripts/* as well.

jdtournier commented 7 years ago

I agree with all of the above, there's many ways to make this work. My only concern here is to make sure our approach conforms with typically accepted conventions as much as possible to avoid surprises and frustration from users. So that means that anything executable should reside in a bin/ folder (including the scripts), and the dynamic library should be in the lib/ folder next to it. I don't have a problem with this being resolved at package time though, so we don't necessarily need to change the layout we currently have - I agree anything else is going to be unwieldy.

I was going to suggest symlinking the scripts folder into release/, which would have dealt with this issue (essentially leave the scripts in place, create a symlink to the scripts/ folder within the release/ folder, and symlink all the scripts in that location from bin/). But that remains pretty messy, and won't work on Windows given that it doesn't support symbolic links.

Even if the packaging process were to copy the contents of scripts/ into e.g. /opt/mrtrix/bin, you could still have runCommand() check both its own real location and ../release/bin for the MRtrix3 binaries, and that way the same Python code would happily work when the files are arranged in both 'repository' and 'package' configurations.

That may be the cleanest solution. In general, if we want this to work on Windows, we need to allow for a symlink-free setup. If we can dump everything within the same folder (binaries, DLL and scripts), then the Windows install is straightforward - as it would be on all other platforms. It would for instance obviate the need for @maxpietsch's symlink resolution strategy to get things working in HomeBrew.

However, having just had a look into the scripts/ folder, there's a few things that might prove problematic, or at least unexpected. The first one is the data/ folder: this contains text files that would probably be better suited to living in share/mrtrix3, if we were to stick with the Filesystem Hierarchy Standard (FHS). The second is the existence of both a lib/ and a src/ folder. I get the subtle distinction between them, but really there is nothing to stop them from being merged into a single lib/ folder - in many ways, that is much more in line with what I would expect: I'd expect the src/ folder to contain human-readable source code that is unusable without compilation (as per the FHS).

Anyway, none of this really ticks all the boxes. I'm not sure what the best way to proceed is going to be. Maybe the whole multi-config build setup is too bulky and gets in the way here - things would be a bit easier if we were to build directly into single top-level bin/ and lib/ folders, rather than placing them in the release/ folder. I don't know how many of you devs ever even use that feature...? I'm going to think about this a bit, maybe there's a simple solution to this that will keep everything simple...

Lestropie commented 7 years ago

However, having just had a look into the scripts/ folder, there's a few things that might prove problematic, or at least unexpected.

The first one is the data/ folder: this contains text files that would probably be better suited to living in share/mrtrix3, if we were to stick with the Filesystem Hierarchy Standard (FHS).

The contents in here aren't ever actually 'shared'; it's simply text data that is used by e.g. labelconvert in one of the scripts, and therefore they were added as separate text files both for separation's sake and because it needs to be in a stand-alone text file for the sake of the invoked binary. And I figured having a data/ directory would be more generic for the sake of any future additions. But if this proves to be problematic, a better solution might be to encapsulate these data within the relevant script, and write that text to a file in the temporary directory as part of the script execution.

The second is the existence of both a lib/ and a src/ folder. I get the subtle distinction between them, but really there is nothing to stop them from being merged into a single lib/ folder ...

I'm not sure it's all that subtle: The point of that design is that it's possible to design your own specific algorithm for performing a particular process (e.g. response function estimation), add it to the src/ directory of the relevant master script, and that algorithm becomes available for calling from within that master script and appears within the help page without any modification to MRtrix3 files. The contents of lib/ are usable by any and all scripts.

... - in many ways, that is much more in line with what I would expect: I'd expect the src/ folder to contain human-readable source code that is unusable without compilation (as per the FHS).

The code in scripts/src/ is unusable 'without compilation' (well, stand-alone at least): It only works if the master script provides the requisite files in the temporary directory, and calls the relevant functions within that file at the appropriate time. These individual scripts/src/ files don't execute stand-alone.

Maybe the whole multi-config build setup is too bulky and gets in the way here - things would be a bit easier if we were to build directly into single top-level bin/ and lib/ folders, rather than placing them in the release/ folder. I don't know how many of you devs ever even use that feature...?

I use it constantly. But like I said, the issue of calling binaries corresponding to a particular config when invoked from the scripts could be solved using a standard command-line option, and still properly handle MRtrix version matching. Packaging will only ever deal with release. I'd rather focus on finding a solution for packaging that doesn't require bastardising what we already have: Even once pre-compiled packages become available, I'd still endorse native compilation.

So that means that anything executable should reside in a bin/ folder (including the scripts), and the dynamic library should be in the lib/ folder next to it.

I'm not against scripts going into bin/ as part of the packaging process. But it would seem highly unusual to me to have further data/ / lib/ / src/ sub-directories within the bin/ directory providing un-compiled source code for Python scripts. Hence why I wanted to look into possibilities for compiling those scripts; preferably into a single executable file per script. Then having them sit in bin/ makes a lot of sense.

jdtournier commented 7 years ago

OK, I totally get the rationale behind the decisions that were made, and they're all very sensible. The problem is that the Unix community has come up with a set of standards around these things, which have been codified into the Filesystem Hierarchy Standard (FHS). And while I agree that some of the naming used doesn't always match the implied purpose, it would nonetheless be sensible to abide by these conventions if possible - it's likely to raise the fewest eyebrows. I'll go through some of the specifics at the end of my comment.

And I also think native compilation is the better solution for users and developers alike, but I'd like to find some solution that allows us to maintain a structure that is fully compatible with both an in-place native-compiled install and a packaged install, in that the packaged install would just be the native file structure with all of the unnecessary bits stripped out. I'm not saying I've figured out a good solution yet though...

Also great to hear you use the side-by-side configs feature - it's reassuring to hear that wasn't a wasted effort... :wink:


The following is just to address some of your specific comments about the file structure. Note that I'm not necessarily saying we need to change things now - I don't think we've nutted this out properly yet. I do want to keep things simple for our current 'native' installs, and preserve the multi-config aspects. Also, like I said, there's nothing wrong with the current logic of how things are organised; it's just not a great match to established conventions. This is actually trivial to fix: it only involves moving things around between folders, and changes nothing about the way things work (and also solves none of the real problems we're going to face with multi-config and equivalent native/packaged installs).

The first one is the data/ folder: this contains text files that would probably be better suited to living in share/mrtrix3, if we were to stick with the Filesystem Hierarchy Standard (FHS).

The contents in here aren't ever actually 'shared'; it's simply text data that is used by e.g. labelconvert in one of the scripts, and therefore they were added as separate text files both for separation's sake and because it needs to be in a stand-alone text file for the sake of the invoked binary.

Yep, that's pretty much exactly how the share folder is used by most other packages: it's just data that isn't executable (whether directly or as a library), and needs to reside in a known location for use by the corresponding software. Most of that is never actually shared between packages, and a lot of that is probably only ever used by a single executable. In fact, if you look through your own /usr/share/ folder, you'll probably find a bunch of subfolders in there, one per package (I've got a /usr/share/git/, a /usr/share/gdb/, a /usr/share/emacs/, etc). There's no real expectation that whatever is in these folders will ever be 'shared' as such...

So the only thing is that to keep things clean and separate, we'd need to place the contents of this scripts/data/ folder within a share/mrtrix3/ folder - just to avoid potential conflicts with other packages that may eventually be installed alongside in the same 'share' folder.

But if this proves to be problematic, a better solution might be to encapsulate these data within the relevant script, and write that text to a file in the temporary directory as part of the script execution.

No, it's not problematic at all, and that would be a needless hack. The only question is about what that location should be.

The second is the existence of both a lib/ and a src/ folder. I get the subtle distinction between them, but really there is nothing to stop them from being merged into a single lib/ folder ...

I'm not sure it's all that subtle: The point of that design is that it's possible to design your own specific algorithm for performing a particular process (e.g. response function estimation), add it to the src/ directory of the relevant master script, and that algorithm becomes available for calling from within that master script and appears within the help page without any modification to MRtrix3 files. The contents of lib/ are usable by any and all scripts.

I get the point, but again, that's exactly how the lib folder is used in practice. If you look through your /usr/lib/ folder, you'll again find loads of package-specific folders, some of them containing things that genuinely class as libraries (whether binary libraries or script modules, e.g. within the python folders), and others containing genuine plugins like what you've got in scripts/src/. The only difference between your plugins and 'normal' libraries is that the plugins are not explicitly designed to be shared, and are discoverable at runtime. But lib/ is still the most appropriate place for them. For example, Qt stores its plugins in /usr/lib/qt/plugins/, and GIMP stores its plugins in /usr/lib/gimp/2.0/plug-ins/. Mind you, it also stores a bunch of scripted plugins in /usr/share/gimp/2.0/scripts/, so things are not that clear cut. You could make the argument that lib/ should contain binaries and share/ anything else, but Python certainly doesn't do that (it's all .py stuff in /usr/lib/python3.5/...). So all in all, I think the most sensible arrangement in accordance with the FHS is for the contents of scripts/lib/ and scripts/src/ to be merged and placed in lib/mrtrix3/. The stuff in scripts/src/ is separated out into distinct folders named after their corresponding master script anyway, so I don't see that this would make things any messier, really.

  • in many ways, that is much more in line with what I would expect: I'd expect the src/ folder to contain human-readable source code that is unusable without compilation (as per the FHS).

The code in scripts/src/ is unusable 'without compilation' (well, stand-alone at least): It only works if the master script provides the requisite files in the temporary directory, and calls the relevant functions within that file at the appropriate time. These individual scripts/src/ files don't execute stand-alone.

OK, I expressed myself badly. On my system, the only thing in /usr/src/ is the source C code for the VirtualBox kernel module, placed there so that DKMS can rebuild it after a kernel update. The point is, it's not somewhere you'd place files that are supposed to be runtime-usable. You could also say that none of the libraries in lib/ execute stand-alone, but they are executable nonetheless (literally for binary .so files - they genuinely do need to have execute permissions to be usable, the kernel won't allow the code in them to run at all if they're not). Even scripts (including your plugins) contain code that is designed to be directly executed - they don't need to be pre-compiled to make them usable.

Anyway, like I said, none of this is all that critical, and renaming a bunch of folders isn't going to solve the bigger issues we're trying to address. I'll think about this some more and try to come up with a plan, we can discuss whether that would work for everyone at that point...

Lestropie commented 7 years ago

Just updating in case your head's churning on this: I'm now confident that I can get the Python scripts compiled into a single executable file each (even 5ttgen and dwi2response), without influencing the current repo arrangement. This includes packaging the relevant contents of scripts/data/ into the script file requiring it. Only condition is that package_mrtrix would preferably be in Python to allow me to do the requisite gymnastics.

jdtournier commented 7 years ago

Just updating in case your head's churning on this: I'm now confident that I can get the Python scripts compiled into a single executable file each (even 5ttgen and dwi2response), without influencing the current repo arrangement. This includes packaging the relevant contents of scripts/data/ into the script file requiring it. Only condition is that package_mrtrix would preferably be in Python to allow me to do the requisite gymnastics.

Just had a few more thoughts on this. I think there are a lot of deeper issues with the folder hierarchy that would need to be sorted out if we are to fix everything properly. I'm not against shipping these precompiled python executables, but I am slightly reticent to introduce more dependencies if they're not strictly necessary. I don't think this is currently a problem at all: we have no trouble locating the data based on where the scripts are; we just need to figure out the best place to store both the scripts and the data. But as a solution for packaging up the scripts when we get around to supplying pre-compiled binaries, this is definitely worth considering.

Anyway, I've been thinking about a potential way to address all of this over the last few weeks, and I think I have a solution that might tick all the boxes. But it would require quite a few changes... I'll post my ideas in a comment below, if only to record my thoughts for future reference - I anticipate there will be some reticence to changing things around yet again...

Lestropie commented 7 years ago

I'm not against shipping these precompiled python executables, but I am slightly reticent to introduce more dependencies if they're not strictly necessary.

Only requirement is that you have to install PyInstaller if you want to perform the packaging process - and that's a one-liner through pip. No additional dependency for running the compiled script.

I don't think this is currently a problem at all, we have no trouble locating the data based on where the scripts are, we just need to figure out the best place to store both the scripts and the data. But as a solution for packaging up the scripts when we get around to supplying pre-compiled binaries, this is definitely worth considering.

Just to make sure it got through: The additional data required by certain scripts gets packaged into the relevant compiled script. So for a packaged MRtrix install, there won't be "data required for scripts": there's just one file for each script. The only requirement in terms of directory structure is knowing where the MRtrix binaries are relative to the script, in order to guarantee that MRtrix binaries from the corresponding MRtrix version are invoked; this relative path can be different between the repository and the installed locations (and given each script will be a single file in the packaged version it makes sense to me to just assume they'll all be in the same directory).

See branch package_scripts to see how far I got; in particular the file package_scripts and scripts/lib/package.py.

If you're talking about changing the directory structure of the scripts/ directory within the repository, that's a separate discussion to packaging.

jdtournier commented 7 years ago

OK, as I mentioned above, I've had a think and a play around with this, I think I've got a solution that ticks all of the boxes, but it will require quite a bit of rejigging. Most of it is moving files about, some of it will require deeper modifications. I'm just going to document things here, in case we eventually want to move forward with this, or something similar in the future. Maybe after tag 0.3.16 is merged, or maybe as part of that merge if people think it's worth going for straight away.

There's quite a few changes, so I'll try to go through them and their rationale in some semblance of order, please grab yourselves a brew and bear with me...


multi-build support with seamless native/packaged installation

The idea here is to maintain full compatibility between packaged and native installs - what I mean by that is that the packaged install is a stripped-down version of the full native install, with all the remaining files otherwise staying in the same exact place. There are a few challenges here:

So what I'm proposing is to have a system where configs can be swapped out, with one of them designated as active, and the others stashed elsewhere for later use. More specifically, what I have in mind is a folder structure like this:

.
├── bin/                 # executables (and invokable scripts)
├── release.build        # contains a non-active config and all its associated files
├── debug.build.active   # this is the currently active config
├── assert.build         # another stashed config
├── cmd/                 # same as currently.
├── config               # the current config file.
├── lib/                 # the shared library would be in here.
│   └── mrtrix3/         # python modules and other dependencies would be in here.
├── core/                # what is currently lib/ - see below for details.
├── share/           
│   └── mrtrix3/         # data files required by scripts, etc. would be in here
├── src/                 # as currently (although might rename to e.g. ext/ - see later)
└── tmp/                 # used to dump any temporaries (object files, etc). 

In detail:

So this would allow multiple builds, which the user could easily switch between. Since the new build would be directly in the PATH, this would allow the scripts to use the different configs directly. And with a bit more work, we can have a folder hierarchy that will match exactly what the packaged install would require - more on that in the next section.


FHS-compliant folder structure

So this one is a simple matter of moving and renaming a few things, with a few added minor complications that can be easily dealt with. So:

Since the lib/ folder would now be used for different purposes than currently, we'd need to rename the current lib/ folder to something else. I propose to rename it to core/, but we may decide something else here. I'd also potentially rename src/ to ext/ (or similar), since really the only reason it's not in the lib/ folder currently is because that code won't be included in the main MRtrix3 shared library. I don't think src/ currently really conveys that sense - but that's a minor point.

The folders listed above would (I think) make up everything included in a packaged install. And as I mentioned in a previous post, I think this structure would be much more in line with expectations and the FHS conventions. It certainly matches my experience of how packages are organised on Linux. The point is these could be dumped/merged into /usr/ or /usr/local/ and work out of the box with minimal chance of conflict, and integrate fairly seamlessly with any other packages installed in those locations. Of course, there's no actual need for them to be placed in any particular location, the only configuration required would be to set the PATH accordingly - whether the code remains in place or is packaged up and installed somewhere else.


Getting the scripts to find the required modules and executables

If the Python modules that used to be in scripts/lib now reside in lib/mrtrix3/, while the scripts themselves are in bin/, we need to have a system for the scripts to locate these modules. Thankfully, this seems to be pretty simple: we can add this line after the import sys, os line, and before importing anything MRtrix-specific:

sys.path.append (os.path.abspath (os.path.join (os.path.dirname (os.path.realpath(__file__)), os.pardir, 'lib')));

at which point this will work:

from mrtrix3.runCommand import runCommand

Note that this implies that the MRtrix3 modules are now found in mrtrix3 rather than lib, which I think is probably more sensible long-term, lib is too generic a name, it's better to clarify where the code is coming from. Note that this is simply matching the name of the corresponding folder in lib/, where the code will be located - this would be different if we chose a different name for that folder.

By the way, I gave this a shot for dwi2response, and it seemed to work just fine. I had to change all occurrences of lib to mrtrix3, add the sys.path.append() call to the main dwi2response script, and modify the line in getAlgorithmList.py that dealt with locating the various algorithms, and it then seemed to work as expected. I expect the other scripts would be equally trivial to modify.

The final change would be to look for the executables in the same folder as the script, but that's relatively trivial. Also, we'd need to modify the location where the scripts look for any data files, but again I don't expect any issues there.


Switching between configs

We'd need to add facilities to manage the configs, particularly:

We could have a little script called ./select (or something more explicit if you want), which could be used something like this:

$ ./select
active config is 'release'
$ ./select debug
storing current config 'release'
creating and activating new config 'debug'
$ ./configure -debug -assert && ./build
...
$ ./select release
storing current config 'debug'
activating config 'release'

The simplest way to store configs is to create an archive (potentially compressed) of the relevant files. That way they can reside in the toplevel folder, giving the user direct visibility into what configs are available and which one is active. A command like this seems to do the trick:

# to stash the data for a config:
tar cf assert.build --remove-files bin/* tmp/* lib/libmrtrix-*.so config

# to unstash it:
tar xf assert.build

We could also compress the data, simply by adding the z flag, but it'll probably be too slow (takes ~10s on my system to tar xzf a fully-built release folder, but only 0.8s without compression). We could then label the active config by adding .active to the filename or something like that. Note we'd need to modify the above to avoid stashing scripts from the bin/ folder. Anyway, the point is there's ways to do this relatively easily.

One thing that might be problematic is dealing with external MRtrix3 modules that link to the main install. Currently, the whole config system works transparently with that, but things might get annoying if we change the config on the main install, in that this will either trigger a full rebuild of the module, or not trigger any rebuild at all, depending on the timestamps of the main install relative to the module. I'm not sure how to deal with this. This may require some runtime magic within build to detect changes in the config, and maybe run select within the module itself if necessary. I guess we can deal with that if/when the time comes...


Anyway, food for thought. Nothing in there is particularly difficult to do, but it will change quite a few things, particularly the installation instructions, the docs in various places, the test scripts, etc. So it's not a minor undertaking. But if we do want to change the structure to something that is more Unix-like, then I think this is one option. The other option is of course to leave everything as-is, and supply a script to manage the install proper for those who don't want to run within their own folder.

jdtournier commented 7 years ago

Just to make sure it got through: The additional data required by certain scripts gets packaged into the relevant compiled script. So for a packaged MRtrix install, there won't be "data required for scripts": there's just one file for each script.

No, I completely get the appeal. I just don't think we've had any issues with using these scripts at all so far, either in terms of locating the binaries (which would still need to be dealt with) or locating external data files. So I'm not sure there's any need for this in general.

If you're talking about changing the directory structure of the scripts/ directory within the repository, that's a separate discussion to packaging.

As you can probably tell from my previous comment, that is exactly what I have in mind... But I don't see it as unrelated to packaging: I'm looking for a solution that is seamless between a packaged or native installation, so that there is no difference in how the code runs between what we use as developers and what our users will have on their systems.

But like I said, this may be a sledgehammer to crack a nut...

Lestropie commented 7 years ago

There's definitely some sense to it.

Let me just list concerns / thoughts as they come to me.

Maybe after tag 0.3.16 is merged, or maybe as part of that merge if people think it's worth going for straight away.

I'd rather not rush it into 0.3.16. There's already a lot of stuff being compiled for 0.3.16, and a lot of things being 'slipped in' at the last minute. Given this change would be about simplifying packaging / release for the sake of software release, it makes sense to me for it to go with a tag update that's intended as the 'release'.

scripts currently will always invoke the release version of the executables, making it difficult to run the scripts with debug/assert configs. Since the new build would be directly in the PATH, this would allow the scripts to use the different configs directly.

Keep in mind this is only a limitation currently because I haven't yet written the 5 lines of Python that would be required to make it happen. So don't let trying to achieve this affect decisions.

bin/: contains executables as compiled by the ./build script, and also invokable scripts.

My concern here would be users deleting the bin/ directory manually, thinking that everything would then be re-built, only to have lost the scripts completely. Yes it would clearly be advised against in favour of ./build clean, but you never know what people will do.

share/mrtrix3/: contains any data files that need to be accessible in scripts, but are not themselves to be executed (i.e. what is currently in scripts/data/). It might also be an idea to move the contents of the current src/connectome/tables/ folder to that location - but maybe there's a good reason for them to remain there, I'm not sure.

I've never been sure where to put the contents of src/connectome/tables; it started in src/dwi/tractography/connectomics/example_configs. Being away from source code would be preferable.

Slightly similar to previous point: Grouping this with what's currently in scripts/data would be mixing&matching data that's used by scripts in an automated fashion, and data that are intended for users to interact with. Maybe putting them in sub-directories would be adequate to resolve that though.

Note that this implies that the MRtrix3 modules are now found in mrtrix3 rather than lib

:+1:

The sys.path.append line isn't suited for running scripts that are built against the MRtrix3 Python libraries, but are stored elsewhere. Setting PYTHONPATH is how this is achieved currently. Should still work, but would need to check that the sys.path.append line doesn't cause an error if the path doesn't exist; or just make the compulsory cut&paste 'library search formula' a little more complex.
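Roughly what that looks like in practice (the paths here are illustrative, and whether PYTHONPATH needs to point at scripts/ or scripts/lib/ depends on how the modules end up being imported):

export PYTHONPATH=/home/me/mrtrix3/scripts:$PYTHONPATH
/home/me/external_scripts/my_dev_script ...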

Switching between configs

If compressing & extracting files when changing configurations changes timestamps, then the build script will rebuild everything next time it's invoked. Only reason I use parallel configurations is to be able to rebuild a command quickly in both debug and release mode given a small source code change. If switching configurations in this framework would mean everything would need rebuilding even for a tiny source file change, the whole parallel configurations thing loses its purpose (to me).

I could live without parallel configurations if it would simplify getting this all to work.

jdtournier commented 7 years ago

I'd rather not rush it into 0.3.16.

Agree, it's a bit much to dump on users in one go. But then, tag_0.3.16 brings in quite a few changes in operation too, so in a way introducing a major change in the layout of the repo will at least reinforce to users that things really are different, and that they really don't want to update with this mid-study or something.

Not that I'm adamant now is the time to do this by any means, I'm in two minds about this. You're right that at least with a major release, people can reasonably expect a few changes... Happy either way, to be honest.

scripts currently will always invoke the release version of the executables, making it difficult to run the scripts with debug/assert configs. Since the new build would be directly in the PATH, this would allow the scripts to use the different configs directly.

Keep in mind this is only a limitation currently because I haven't yet written the 5 lines of Python that would be required to make it happen. So don't let trying to achieve this affect decisions.

True, there are other solutions. But the config swap solution is more elegant in my opinion: you can literally run the exact same command after swapping, which I think will reduce the potential for screw-ups. But really the main motivation is to maintain consistency between native and packaged installs: I really think there is value in ensuring we run the code the same way our users do, so that we're likely to encounter issues before they do. Also, packaging then becomes a simple matter of tar cfj mrtrix3.tar.bz2 bin/ lib/ share/ :+1:

bin/: contains executables as compiled by the ./build script, and also invokable scripts.

My concern here would be users deleting the bin/ directory manually, thinking that everything would then be re-built, only to have lost the scripts completely.

OK, but realistically this is a slightly paranoid scenario... I don't expect any of our users would want to delete anything - in my experience, the less savvy users are very reluctant to do anything drastic for fear of messing up their system, and having to delete anything will send them into a cold sweat. The more experienced users will be able to recover with a simple git reset --hard HEAD or a fresh clone, and quickly learn to use ./build clean instead...

Slightly similar to previous point: Grouping this with what's currently in scripts/data would be mixing&matching data that's used by scripts in an automated fashion, and data that are intended for users to interact with. Maybe putting them in sub-directories would be adequate to resolve that though.

That's also just fine. You'll find the /usr/share/ folder contains a lot of documentation in /usr/share/doc (often just as html), and Qt stores their example code in /usr/share/qt5/examples/, etc. Both uses are equally valid here.

The sys.path.append line isn't suited for running scripts that are built against the MRtrix3 Python libraries, but are stored elsewhere. Setting PYTHONPATH is how this is achieved currently. Should still work, but would need to check that the sys.path.append line doesn't cause an error if the path doesn't exist; or just make the compulsory cut&paste 'library search formula' a little more complex.

Good point. But then I'd probably argue that if the path doesn't exist, that should be an error: there's no guarantee that the script will be using the correct version of its modules - if it even runs.

As to which way is better, this relates to a discussion I had with @dchristiaens recently: the problem here is that the MRtrix3 python module as it stands is really not intended to be a regular module as such - there's no expectation at this stage that it would be distributed and installed on users' systems as a standalone package, it needs to have the MRtrix3 commands in a known location to operate. So from that point of view, I don't see that it should be treated like a regular python module. Unless we do decide to really invest some effort into making it more readily usable for python users - but I'd argue that the code required for a regular python module (i.e. one that allows you to interact with the data within python directly) would probably be completely independent from what we have now.

In any case, it looks like there's ways to deal with this, so as long as one of them is acceptable, that's fine.

If compressing & extracting files when changing configurations changes timestamps, then the build script will rebuild everything next time it's invoked. Only reason I use parallel configurations is to be able to rebuild a command quickly in both debug and release mode given a small source code change. If switching configurations in this framework would mean everything would need rebuilding even for a tiny source file change, the whole parallel configurations thing loses its purpose (to me).

Absolutely, that's the whole point. And tar explicitly does preserve timestamps, which is the only reason I've been contemplating it. But it probably doesn't matter anyway - I've had a go at this on my train rides over the last week, and I decided it's just easier (and also a lot quicker) to move the files over to an appropriately named folder. I'll make a pull request for it so you can all take a look, see what you think of the idea...

Lestropie commented 7 years ago

The sys.path.append line isn't suited for running scripts that are built against the MRtrix3 Python libraries, but are stored elsewhere. Setting PYTHONPATH is how this is achieved currently. Should still work, but would need to check that the sys.path.append line doesn't cause an error if the path doesn't exist; or just make the compulsory cut&paste 'library search formula' a little more complex.

Good point. But then I'd probably argue that if the path doesn't exist, that should be an error: there's no guarantee that the script will be using the correct version of its modules - if it even runs.

I use this trick for my own development scripts that sit in a DropBox directory. Really all it should need is a code comment instruction to not use that line if developing a script that isn't guaranteed to reside in the MRtrix3 directory (and you are hence relying on setting PYTHONPATH in order to run it).

Any script developed residing outside of MRtrix3 is without 'version' by design; so if the module version mismatch isn't bad enough to generate an error, any additional issues are purely up to the dev.

As to which way is better, this relates to a discussion I had with @dchristiaens recently: the problem here is that the MRtrix3 python module as it stands is really not intended to be a regular module as such - there's no expectation at this stage that it would be distributed and installed on users' systems as a standalone package, it needs to have the MRtrix3 commands in a known location to operate.

Only because of the particulars of how the script->binary version matching is implemented right now. Previously, or with a small amount of code change, it would only require that the MRtrix3 binaries be in PATH. Nevertheless, no it's not intended for separate distribution, and I have no intention of adding image format handlers etc. natively in Python.

But:

So from that point of view, I don't see that it should be treated like a regular python module.

Define "regular Python module"? Or at least the purpose behind this point?

jdtournier commented 7 years ago

About supporting scripts that reside outside of the main install: sorry, yes I fully agree that PYTHONPATH is the right way to handle this. But I also wouldn't expect these scripts to contain those lines. Note that it's not an issue for our imported modules (i.e. what's currently in the scripts/lib folder), since they don't need to rely on anything like this: any bits they might subsequently need to import will be in the other files (i.e. modules) sitting within the same folder (i.e. package), so python will locate them trivially using its standard methods. Means external scripts really have nothing to worry about, that trick just doesn't affect them - unless they decide to add this line.

As to what I mean by 'regular python module': if you look online into how you're supposed to structure your module, you'll find a lot of rather strong opinions on the matter, and they're pretty much all about how to structure it so that it's compatible with pip, works well as a distributable library, etc. The recommendations are generally to not do anything like what we're talking about here: they really want you to install to a standard location, and/or use PYTHONPATH. My point is that our scripts are not of this nature. We're not planning on shipping them outside of MRtrix3, and the scripts themselves are inherently tied to the specific version of the module they rely on, and often to the specific version of the binary executables too.

So that comment was really just saying, let's not just follow the online advice here, what we're doing just doesn't fit into that paradigm.

jdtournier commented 7 years ago

About supporting scripts that reside outside of the main install: sorry, yes I fully agree that PYTHONPATH is the right way to handle this

I should add: also agree that a short comment about this in our scripts is a good idea.

maedoc commented 7 years ago

Just to add a data point here in case it's useful for other HPC admins: I maintain the mrtrix versions on a local HPC cluster, and they are just git clones with a tag or hash checked out, built as in the docs, and I add a simple activate file of the form:

source /soft/gcc492/activate
export MRT3=/soft/mrtrix3-0.3.15
export PATH=$MRT3/scripts:$MRT3/release/bin:$PATH

and our users do

source /soft/mrtrix3-0.3.15/activate

and it works fine for us and seems more robust than trying to do any sort of install process into a $PREFIX.

jdtournier commented 7 years ago

Just to tie this off:

As of tag 3.0_RC1, all executables (whether scripts or binaries) live in the top-level bin/ folder. The only thing required to make use of MRtrix3 at this point is to add that folder to the PATH - everything else works itself out.

Also, if you want to install into a prefix, while the configure and build scripts don't (yet) offer this functionality, it is now much simpler than it would have been in the past: you just need to copy the bin/, lib/ and share/ folders over to your target system.
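So a manual prefix install now boils down to something like this (the target location is just an example):

./configure && ./build
mkdir -p /opt/mrtrix3
cp -r bin lib share /opt/mrtrix3/
# then users just add the bin/ folder to their PATH:
export PATH=/opt/mrtrix3/bin:$PATH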

Also, we now build for a generic architecture by default - the -march flag is not used (-march=native used to be the default). This should prevent other issues related to illegal instructions when running binaries across different CPU generations.

This is now documented in the docs - although it could do with being highlighted in the HPC section...

I'll close this for now, I think the changes we made address the bulk of the issues raised in this thread.

zeraiiabderrazek commented 5 years ago

I have been trying to install MRtrix3 on Ubuntu 18.04; everything is fine until I get to ./build, which always fails around initialiser_helpers.o:

ERROR: (1/5) [LB] bin/mrclusterstats

g++ tmp/src/stats/tfce.o tmp/src/stats/permtest.o tmp/src/stats/permstack.o tmp/src/stats/cluster.o tmp/src/exec_version.o tmp/src/dwi/directions/predefined.o tmp/cmd/mrclusterstats.o -lmrtrix -Wl,--sort-common,--as-needed -pthread -lz -ltiff -lfftw3 -Wl,-rpath,$ORIGIN/../lib -L./lib -o bin/mrclusterstats

failed with output

tmp/src/stats/cluster.o: fichier non reconnu: Fichier tronqué
collect2: error: ld returned 1 exit status

ERROR: (3/5) [LB] bin/fod2fixel

g++ tmp/src/dwi/directions/predefined.o tmp/cmd/fod2fixel.o tmp/src/exec_version.o tmp/src/dwi/directions/mask.o tmp/src/dwi/directions/set.o tmp/src/dwi/fmls.o -lmrtrix -Wl,--sort-common,--as-needed -pthread -lz -ltiff -lfftw3 -Wl,-rpath,$ORIGIN/../lib -L./lib -o bin/fod2fixel

failed with output

tmp/cmd/fod2fixel.o: fichier non reconnu: Fichier tronqué
collect2: error: ld returned 1 exit status

ERROR: (2/5) [LB] bin/mrview

Help greatly appreciated!

jdtournier commented 5 years ago

Sounds to me like you ran out of storage - at least assuming my interpretation of 'fichier tronqué' as 'file truncated' is correct...
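If that is the cause, something like this should confirm it and get you going again (assuming you run it from the top-level MRtrix3 folder):

# check free space on the partition holding the build:
df -h .
# then clear out the truncated object files and rebuild:
./build clean
./build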

On a different note: it's best to start a different thread if posting a new question unrelated to the original issue. Moreover, this is probably the kind of issue that would best be addressed at the MRtrix community forum - we tend to discuss issues related to code design and development on GitHub. Not a big deal, just pointing it out for future reference... :+1: