Allow splitting C++ commands into separate files

daljit46 commented 3 months ago

We can do this for Python, but not for C++. This wouldn't merely improve consistency, but potentially reduce compilation times by increasing build parallelism and also allow developer to better reason about their code with enhanced readability.

Lestropie commented 3 months ago

Technically it's already possible just by moving code from a cmd/*.cpp file into "somewhere appropriate" in src/. The main topics for this Issue are I think:

Where is suitable to put such code. If there's a plausible chance of a piece of code being applicable to a hypothetical command other than the one for which the code is written, then it should be possible to determine a suitable filesystem location & namespace in (currently) src/ to make it discoverable. What we don't have a precedent for is where to place code that is absolutely only applicable to one command and one command only. I think that our prior design philosophy encourages minimisation of the latter; but that's not to say that it might be applicable in a few scenarios.
Which existing commands warrant refactoring. Sorting cmd/*.cpp by line count, > 500 lines is:
1. mrregister: There's already src/registration/, maybe it just needs more functionality to be moved there.
2. mrcalc: I'd prefer that much of the code here be generalised such that stack-based arbitrary calculations can be performed on other types of data; eg. #339.
3. tckconvert: Much will be moved into src/dwi/tractography/ by #411.
4. mrtransform
5. tckmap: src/dwi/tractography/mapping/ already exists, might just need to move some more functionality there.
6. mtnormalise: At least the fitting of a 3D polynomial to an image could be functionalised and might be useful elsewhere.
7. tensor2metric: Possibly src/dwi/tensor.* should become a src/dwi/tensor/* namespace.
8. fixel2voxel: #2657 could likely move this into core/fixel/.
9. mrconvert: Can see how splitting functions into different files might help here.
10. fixelcfestats
11. mrmath
Whether to allow the core API functions usage() and run() to be split across different files. Not sure if this might require changes to compilation? Or I suppose could just require that cmd/cmdname.cpp define usage() and run() and each could just be a one-liner that calls eg. MR::cmdname::usage() / MR::cmdname::run().

daljit46 commented 3 months ago

Technically it's already possible just by moving code from a cmd/*.cpp file into "somewhere appropriate" in src/.

This is true, but I think the requirement for code to be put in src may be too strong IMO. Suppose for example, that a command contains a function veryLong(), then the developer may want to split this into three separate functions short1(), short2() and short3(). These short functions could possibly reside in three different files (with let's say template instantiations happening in each translation unit). Having those functions abstracted away in src would make very little sense as they're irrelevant to any other commands, but they are purely there for better code structure. An example, where this may be relevant would be mrregister where various types of image registration are performed causing an excessive amount of template instantiations to happen inside the translation unit of mrregister.cpp file.

I think allowing for a folder with a flat structure (no subdirectories) per command may be beneficial. For example, inside mrregister, one may have mrregister.cpp,other1.cpp,other2.cpp,

Lestropie commented 3 months ago

In the case where code is very much specific to one command only (and is strongly expected to remain so in the future), your proposal is:

cmd/
    mrshortcmd.cpp
    mrnotshortcmd/
        function1.cpp
        function2.cpp
        mrnotshortcmd.cpp

?

That would bear certain resemblance to Python in #2920. I'd be happy with that, with the caveat that it would be nice if CMake would figure it out itself on a per-command basis.

Indeed, maximising similarity with #2920 would be:

cmd/
    mrshortcmd.cpp
    mrmediumcmd/
        function1.cpp
        function2.cpp
        mrmediumcmd.cpp
    mrlongcmd/
        functionA.cpp
        functionB.cpp
        functionC.cpp
        functionD.cpp
        usage.cpp
        run.cpp

I don't know if there's adequate motivation for such. With C++ we tend to use option groups and parsing functions that get buried into header files, which keeps usage() fairly short. Conversely, if those option groups are in fact command-specific, maybe they should be in cmd/mrnotshortcmd/ rather than src/, in which case mrlongcmd/ might have some benefit over mrmediumcmd/. With Python, some commands (population_template in particular) can have very long usage() functions, so having usage() and execute() (couldn't use run() in Python) in separate files I deemed beneficial. Of course you could instead just have a population_template.py that has:

def usage():
    return population_template.usage()
def run():
    return population_template.run()

So the question is two-tiered:

Should C++ permit all three structures?
Should Python only permit two of the three structures (ie. permit mrshortcmd & mrmediumcmd, remove mrlongcmd)?

daljit46 commented 3 months ago

That would bear certain resemblance to Python in https://github.com/MRtrix3/mrtrix3/pull/2920. I'd be happy with that, with the caveat that it would be nice if CMake would figure it out itself on a per-command basis.

Yes, this is what I had in mind. On CMake doing this automatically, I believe it would be fairly easy to do (all you need to do is compile the extra sources as a static/object library). I still think that we should consider getting rid of globbing though (although in practice, I have to admit that so far I haven't encountered any issues yet), but that's a separate discussion I guess.

Should C++ permit all three structures?

In my opinion, that would be unnecessary and overkill.

MRtrix3 / mrtrix3

Allow splitting C++ commands into separate files #2964