MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0
292 stars 180 forks source link

ENH Python external modules: Support for external packages that depend on mrtrix3 #2540

Open badesar1 opened 1 year ago

badesar1 commented 1 year ago

It seems that external python modules that depend on mrtrix3 cant be packaged with pip or conda, since these package managers call the main external module from a wrapper. This results in an error like:

  File "/usr/local/mrtrix3/lib/mrtrix3/__init__.py", line 91, in execute
    app._execute(inspect.getmodule(inspect.stack()[-1][0]))
  File "/usr/local/mrtrix3/lib/mrtrix3/app.py", line 122, in _execute
    module.usage(CMDLINE)
AttributeError: module '__main__' has no attribute 'usage'

Since the module stack now ends with the pip/conda wrapper rather than the main external python script/function.

A simple solution might be to modify mrtrix3/__init__.py to loop over elements of inspect.stack() until it finds the correct module to execute. Something like:

def execute():
  for i in range(len(inspect.stack())):
    try:
      app._execute(inspect.getmodule(inspect.stack()[i][0])) # pylint: disable=protected-access
    except:
      pass

This way, a developer creating an external module that includes the mrtrix3 python library as a dependency can package their module as usual, and after users install the package with pip, they can link the installed package to mrtrix3 using PYTHONPATH or ln -s .../mrtrix3.py .../pip-module-location/bin as usual.

Lestropie commented 1 year ago

Hi @badesar1,

While I'd like for more people to use the MRtrix3 Python API / modules, this is actually beyond my own knowledge of Python / pip / conda. So contribution in this regard from someone getting their hands dirty in that respect would be very much appreciated.

Would you be able to print the full stack as it appears within __init__.execute()? Want to wrap my head around how it differs from the environments in which I've experimented with this calling interface to determine what the most appropriate logic might be.

Also since you mentioned it, I'm curious to know if eg. Conda provides a mechanism by which installation of the mrtrix3 package would register its own lib/ sub-directory to be a part of sys.path, such that installation of an external module depending on such within the same environment would not require explicit use of PYTHONPATH or softlinking mrtrix3.py?

Cheers Rob

badesar1 commented 1 year ago

Hi Rob,

I am very glad to help out in regards to helping package external modules. While I am not an expert with pip / conda, I have built packages using setuptools and setup.py scripts in the past.

Below is the output of print(inspect.stack()) from within __init__.execute(). As you can see, I am calling the mrtrix3 module from a local installation located at /usr/local/mrtrix3/lib. I packaged the external module named designer using pip and setuptools, its executable was copied to /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/bin/designer.py, and this executable is called by a wrapper at /Library/Frameworks/Python.framework/Versions/3.10/bin/designer.

[FrameInfo(frame=<frame at 0x102cd8440,
                  file '/usr/local/mrtrix3/lib/mrtrix3/__init__.py',
                  line 91,
                  code execute>,
           filename='/usr/local/mrtrix3/lib/mrtrix3/__init__.py',
           lineno=91,
           function='execute',
           code_context=['  print(inspect.stack())\n'],
           index=0),
 FrameInfo(frame=<frame at 0x115edc5a0,
                  file '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/bin/designer.py',
                  line 140,
                  code main>, 
           filename='/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/bin/designer.py',
           lineno=140,
           function='main',
           code_context=['    mrtrix3.execute() #pylint: disable=no-member\n'],
           index=0),
 FrameInfo(frame=<frame at 0x102b55a40,
                  file '/Library/Frameworks/Python.framework/Versions/3.10/bin/designer',
                  line 8,
                  code <module>>,
           filename='/Library/Frameworks/Python.framework/Versions/3.10/bin/designer',
           lineno=8,
           function='<module>',
           code_context=['    sys.exit(main())\n'],
           index=0)]

Based on this, the correct call to _execute would be app._execute(inspect.getmodule(inspect.stack()[-2][0])). Of course, hard coding the index to the main external module this way would not be the best practice.

To answer your other question, yes, you can include a setup.py file (https://pythonhosted.org/an_example_pypi_project/setuptools.html) to run upon mrtrix3 installation (python setup.py install), which would compile the python libraries in lib/ and copy them to the local sys.path. Such a script would be very simple. It would be located in mrtrix3/lib/setup.py and could be something like the following:

from setuptools import setup, find_packages

setup(
        name ='mrtrix3-setup-example',
        version ='0.0.0',
        author ='Ben',
        author_email ='email@email',
        url ='https://github.com/mrtrix3/',
        description ='setup example',
        long_description = 'long description',
        long_description_content_type ="text/markdown",
        license ='MIT',
        packages = find_packages(),
        include_package_data=True,
        classifiers = [
            "Programming Language :: Python :: 3",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
        ],
        keywords ='mrtrix3 MRI',
        zip_safe = False
)

Hope this is helpful!

Best, Ben

Lestropie commented 1 year ago

A standard usage of an in-built command looks like:

[FrameInfo(frame=<frame at 0x7f8536adbc40,
                  file '/home/rob/src/mrtrix3/lib/mrtrix3/__init__.py',
                  line 87,
                  code execute>,
           filename='/home/rob/src/mrtrix3/lib/mrtrix3/__init__.py',
           lineno=87,
           function='execute',
           code_context=['  print(inspect.stack())\n'], index=0),
 FrameInfo(frame=<frame at 0x7f8536ad9a40,
                  file '/home/rob/src/mrtrix3/bin/dwi2mask',
                  line 105,
                  code <module>>,
           filename='/home/rob/src/mrtrix3/bin/dwi2mask',
           lineno=105,
           function='<module>',
           code_context=['mrtrix3.execute() #pylint: disable=no-member\n'],
           index=0)]

So it seems to me that the change in #2475 would resolve this also. That was merged to dev, but if that proves to be a "bug fix", we can cherry-pick it to master. I just hope that there aren't any use cases where indexing from the start of the stack breaks things.

PS. I don't know the extent to which this approach completely violates Python norms, or whether there's an established solution to this kind of interface construction. I don't have a Python background prior to constructing this API...


The setup.py approach looks simple enough. Is the setuptools module still the accepted best solution for this sort of thing?

Also, if you're interested, you could propose the addition yourself and get attribution for it?

badesar1 commented 1 year ago

You are totally right,

I switched to the dev branch and the change you referenced does indeed fix the original issue I was having. I was able to package my external module with pip and link it to mrtrix3.lib without an error. I used the PYTHONPATH option, but the mrtrix3.py symbolic link to site-packages where my module was installed works as well.

This approach to handling the python API is unconventional, but I am not sure that an established solution to linking python to a separately compiled package really exists. From what I have seen, the most common approach to creating a python toolkit such as this is to actually repackage the entire distribution to be compiled using setuptools (for example, how the ANTs package is wrapped in ANTsPy (https://github.com/ANTsX/ANTsPy)).

The setuptools module is the current accepted way to package modules for pip. Of course, there are a ton of packaging solutions in python. However, I believe that other package managers such as conda also allow users to specify a setup.py file for installation as well using setuptools.

I took a crack at generating setup.py for mrtrix3.lib, but I am getting held up by the need for an absolute path to the currently installed binaries: BIN_PATH = os.path.abspath(os.path.join(os.path.abspath(os.path.dirname(os.path.abspath(__file__))), os.pardir, os.pardir, 'bin')). This line needs a minor change for compatibility with setuptools. It happens because the directory structure in site-packages where .egg packages are installed is different from the mrtrix3 directory structure and the binaries will no longer be findable with that line.

I'll let you know if I come up with a setup that doesn't require any changes to __init__.py.

Best, Ben

Lestropie commented 1 year ago

for example, how the ANTs package is wrapped in ANTsPy (https://github.com/ANTsX/ANTsPy).

That looks more like Python wrappers for C++ binaries so that a full experiment can be performed within Python using native function calls. With the MRtrix3 python API I knew there would be scripts that would be dependent on non-MRtrix3 functions, hence the more general run.command() subprocess approach. It also means there's no comparatively slow Python-based image data handling.

I am getting held up by the need for an absolute path to the currently installed binaries

Yeah, unsurprising that something like that would require refinement once external package management gets involved.

I'll let you know if I come up with a setup that doesn't require any changes to init.py.

I wholly expect that getting this to work will require changes to __init__.py. The trick will be figuring out the appropriate logic such that the library can find the directory containing just the MRtrix3 binaries, regardless of whether operating from source or through pip. There's no issue with proposing a change that not only adds setup.py but that also involves modification of that file.

badesar1 commented 1 year ago

In that case, I put together a simple solution that will still allow users to run the mrtrix3 python API as they were previously (with PYTHONPATH for example), or alternatively using setup.py install to directly add the API to their python environment. The setup.py file will live in mrtrix3/lib and look like this:

from setuptools import setup, find_packages
import os

long_description = '''
Python setup.py installation for external mrtrix3 modules.
'''

bin_path = [os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), os.path.pardir, 'bin'))]
bin_path_txt = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'bin_path.txt')
open(bin_path_txt, 'w').writelines(bin_path)

setup(
        name ='mrtrix3',
        version ='3.1.0',
        author ='Benjamin Ades-Aron',
        author_email ='benjamin.ades-aron@nyulangone.org',
        url ='https://github.com/MRtrix3/mrtrix3/tree/dev',
        description ='Setup mrtrix3 libs',
        long_description = long_description,
        long_description_content_type ="text/markdown",
        license ='Mozilla Public License 2.0',
        packages = find_packages(),
        data_files = [('.', ['bin_path.txt'])],
        include_package_data=True,
        classifiers = [
            "Programming Language :: Python :: 3",
            "License :: OSI Approved :: Mozilla Public License",
            "Operating System :: OS Independent",
        ],
        keywords ='mrtrix3 MRI python',
        zip_safe = False
)

os.remove(bin_path_txt)

This would grab the path to the local mrtrix3 binaries upon installation so that they can be found once the API is copied to their python environment path. Obviously things like the long_description, license, author info would change according to mrtrix3 standards. It would require the following change to __init.py:

# Location of binaries that belong to the same MRtrix3 installation as the Python library being invoked
try:
  # first try reading the binary path from bin_path.txt
  BIN_PATH = open(os.path.join(os.path.abspath(os.path.dirname(os.path.abspath(__file__))), os.pardir, 'bin_path.txt'),'r').readlines()[0]
except:
  # otherwise read the binaries as usual
  BIN_PATH = os.path.abspath(os.path.join(os.path.abspath(os.path.dirname(os.path.abspath(__file__))), os.pardir, os.pardir, 'bin'))

If you agree with this solution I will go ahead and set up a pull request.

Lestropie commented 1 year ago

Interested in hearing from any other @MRtrix3/mrtrix3-devs who have more experience with Python outside of my own API, which I started having zero experience with Python...

Firstly, I want to note here an alternative that I thought of but is not a comprehensive solution. One could capture at setup time the list of MRtrix3 executables, and store this list to be read at execution time. However this would only identify which commands invoked are MRtrix3 commands, such that eg. command-line flags can be inserted automatically. What this would not provide is matching of MRtrix3 version between the Python API being used and any scripts or binaries being invoked. If that's considered important, then this approach isn't adequate.

Encapsulating just the Python API in this way is already separating the API from the binaries to an even greater extent than what we've encountered previously. What I don't know is whether that means the version matching is more or less important...

That thought process does raise two alternative prospects:

  1. Rather than a text file containing the absolute filesystem path being written to the filesystem, instead generate a softlink within the installed package that points to the location of the binaries corresponding to the MRtrix3 code that's being installed. __init__.py would then spot the softlink and make use of that to set BIN_PATH.

  2. Instead, should all of the binaries (both scripts and compiled C++) be being captured and copied in as part of the installed package?

badesar1 commented 1 year ago

I'd also be interested in hearing what other python devs think the best solution would be.

A softlink pointing to the location of the corresponding mrtrix3 binaries is totally doable, but may require a hacky solution to determine the path to the installed python libraries during setup. something like to a class just to create the softlink after installation finishes:

from setuptools import setup, find_packages
from setuptools.command.install import install
import atexit
import os, sys

class CustomInstall(install):
    def run(self):
        def _post_install():
            def find_module_path():
                for p in sys.path:
                    if os.path.isdir(p) and 'mrtrix3' in os.listdir(p):
                        return os.path.join(p, 'mrtirx3')
            install_path = find_module_path()
            bin_path = [os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), os.path.pardir, 'bin'))]
            os.symlink(bin_path, os.path.join(install_path,'mrtrix3_bin'))

        atexit.register(_post_install)
        install.run(self)

s = setup(
    cdmclass={'install': CustomInstall},
    ...

Capturing and copying all of the scripts and compiled C++ binaries is likely the simplest method to solving this problem. But it comes with the issue of having multiple copies of the binaries on the filesystem. Would you worry about potential conflicts with that approach?

Lestropie commented 1 year ago

may require a hacky solution

From your code, it looks like there's at least some kind of mechanism within setuptools for doing stuff more clever than just providing text strings as function arguments? So don't know the extent to which this is actually a "hack" rather than intended usage.

it comes with the issue of having multiple copies of the binaries on the filesystem. Would you worry about potential conflicts with that approach?

This is actually the very problem that the relevant existing MRtrix3 code is intended to avoid.

Eg. As a developer I have many MRtrix3 installations on any given system, typically pointing at different branches (so that I don't have to recompile everything whenever I need to switch branches). Say I have both master and dev on my system, but the bin/ directory of only one of them is in my PATH. If I then execute one of the Python scripts from the other installation using an absolute path, I can't have that invoking the executables from PATH, since there may be some other incompatibility between the branches.

So if you want an MRtrix "installation" to be completely version-compatible, you would need to capture the binary executables, include them in the package, and have the run module explicitly use them when appropriate. The only case in which this would actually be a problem would be if that location were to be added to PATH, at which point you may be mistaken as to which file is being invoked during your own terminal usage.

If you were to do it using a softlink, then while that would link the Python API to the MRtrix3 binaries at the point of setup, it is entirely possible for you to then check out a different branch in your MRtrix3 code and recompile, breaking the version matching.


It does however seem to me that this solution is straying toward a "complete MRtrix installation" competing with the existing supported Conda installation, rather than being a solution for modules that want to make use of the Python API.

One could instead ask the question: If one is making use of that API, how critical is it that, if your external module code invokes an MRtrix3 command via run.command(), the version of that command invoked is precisely that matching the version of the code from which the Python package was created? Arguably, it's not compulsory. One could be using that API and never invoke an MRtrix3 command.

So key point here that I made above, but needs to be clearer for this discussion. Capturing the MRtrix3 executables location does two things when an MRtrix3 command is invoked:

  1. Ensure that it is the version of that command that exists within the same installation of MRtrix3 as the Python API currently executing (via run.version_match());

  2. Insert command-line options when invoking those commands (currently just -nthreads and -force).

If we were to transition both of these to achieve this purpose (within the Python API) using environment variables rather than command-line options, then technically the proposal here could completely omit identification of MRtrix3 binaries and version matching of such; it would be just the Python API. If an external module script were to execute MRtrix3 commands, it would be the responsibility of either an encapsulating container or a native installation present in PATH to provide those commands.


One other thing I thought of: If the proposed packaging were to be strictly an API for external module development and not encapsulate MRtrix3 commands, then the sub-directories of lib/mrtrix3/ shouldn't be included.

jdtournier commented 1 year ago

Hi all,

Sorry to wade into this so late in the day. I've been trying to wrap my head around the issue, given how involved this discussion is becoming. We have discussed these issues quite a bit over the years (e.g. #1613), and I thought we'd come up with a pretty sensible solution to the issue of ensuring the python code invoked the corresponding executables. I wouldn't go changing that in any way unless that was the only way to get things to work.

Having a look at the original reported problem though, I note that really the only problem is ensuring that the correct stack frame is used when fetching and invoking the corresponding module. The current code assumes that the outermost frame is the relevant one (last in the list), when in fact the relevant frame here should be the second in the list (parent of innermost). So the simplest fix here is to look for inspect.stack()[1], not inspect.stack()[-1] as is done in the code on master currently. Interestingly enough, that's exactly what the code does right now on the dev branch, following #2475 - as proposed by @maxpietsch for very similar reasons to those reported here.

@badesar1, does the simple change in #2475 resolve the issue for you...?

badesar1 commented 1 year ago

Hi Donald,

Yes, the change in #2475 does resolve the original issue. I switched to the dev branch and I am able to compile and install my external python module with pip and conda (I tested both). The only addition step I had to take was to add a _version.py file to the dev branch mrtrix3/lib, since that is currently missing. Therefore my original issue is completely resolved.

Rob and I wound up discussing the mechanism (in this case the python setuptools package) by which installation of the mrtrix3 package could register its own lib/ sub-directory to be a part of sys.path, such that installation of an external module depending on the mrtrix3 python API within the same environment would not require explicit use of PYTHONPATH or softlinking mrtrix3.py. However, I suppose this is a separate issue.

The proposed solution of including a setup.py file that users using the external python API could invoke would provide such a solution, however with this solution comes the issue of ensuring that once the python API is copied to the local sys.path, we must ensure that those copied libraries are linked to the correct version of mrtrix3 binaries. I think could be done pretty easily simply by invoking run.version_match() from the CustomInstall class described above.

Lestropie commented 1 year ago

The only addition step I had to take was to add a _version.py file to the dev branch mrtrix3/lib, since that is currently missing.

File lib/mrtrix3/_version.py is generated by build. Could consider trying to functionalise that for the sake of generation of this package (with the exception of if the MRtrix3 binaries were to be embedded, in which case running build would already be a prerequisite), or just state that build needs to be run before generating the package.

badesar1 commented 1 year ago

Ok that makes sense, I had skipped the build step because I was just testing the installation of the external module. I will add a statement to that effect to the package documentation. Do you think the above fix in #2475 will be added to the main branch at some point? I'm not sure how you schedule updates to master.

Lestropie commented 1 year ago

Do you think the above fix in https://github.com/MRtrix3/mrtrix3/pull/2475 will be added to the main branch at some point?

With the current structure it would only get to master at the point at which all of dev gets merged into master, which is the 3.1.0 update. If however it's considered a bug fix (which I think is the case), I need to explicitly cherry-pick that commit onto master.