dwifslpreproc: fault when input is a directory containing DICOM data

SWastling commented 1 year ago

Description

dwifslpreproc does not behave as expected when the input argument is a directory containing DICOM data containing multiple DICOM series.

Platform/Environment/Version

OS: Debian GNU/Linux 10 (buster)
MRtrix3 version: 3.0.4-10-gf633dfd7

Steps To Reproduce Script Hanging

If dcm is a directory containing multiple DICOM series then the command:

dwifslpreproc dcm dwi.mif -rpe_none -pe_dir AP

Initially prompts the user to select a series - the output on the command line is:

dwifslpreproc: 
dwifslpreproc: Note that this script makes use of commands / algorithms that have relevant articles for citation; INCLUDING FROM EXTERNAL SOFTWARE PACKAGES. Please consult the help page (-help option) for more information.
dwifslpreproc: 
mrinfo: [done] scanning folder "dcm" for DICOM data
Select series ('q' to abort):
   0 -   81 MR images 12:42:17 SiemensSMS_b2500_NODDI_DKI (*ep_b2500#22) [20] ORIGINAL PRIMARY DIFFUSION NONE ND MOSAIC
   1 -  216 MR images 12:46:19 SiemensSMS_b0_PA_NODDI_DKI (*ep_b0) [26] ORIGINAL PRIMARY DIFFUSION NONE ND

This prompt results from line 95 of mrtrix3/bin/dwifslpreproc:

image.check_3d_nonunity(path.from_user(app.ARGS.input, False))

If the user inputs 0 then the script proceeds before hanging. The last output is:

dwifslpreproc: Generated scratch directory: dwifslpreproc-tmp-RFJUOL/
Command:  mrconvert dcm dwifslpreproc-tmp-RFJUOL/dwi.mif -json_export dwifslpreproc-tmp-RFJUOL/dwi.json

Digging around a bit; it appears to hang because the outputs to the command line from select_cmdline.cpp are directed to stderr, and these are erroneously captured by run.command()

Note: The prompt for the user to select a series resulting from:

image.check_3d_nonunity(path.from_user(app.ARGS.input, False))

seems to make it to the command line because line 41 in the Header class in mrtrix3/lib/mrtrix3/image.py is:

result = subprocess.call(command, stdout=None, stderr=None)

i.e. subprocess.call is used directly rather than command.run()

Steps To Reproduce Inconsistent DICOM handling

To try to avoid the behavior described above I tried using the -info and/or -debug options. Then if dcm is a directory containing multiple DICOM series the command:

dwifslpreproc dcm dwi.mif -rpe_none -pe_dir AP -info

does not hang, however, the user is prompted to select the input series twice. The first time when line 95 of mrtrix3/bin/dwifslpreproc is run:

image.check_3d_nonunity(path.from_user(app.ARGS.input, False))

and the second time when line 185 of mrtrix3/bin/dwifslpreproc is run:

run.command('mrconvert ' + path.from_user(app.ARGS.input) + ' ' + path.to_scratch('dwi.mif') + grad_import_option + json_import_option + json_export_option)

Therefore the user can select two different series as they are prompted twice i.e. image.check_3d_nonunity() will run on the series selected first by the user, and then run.command('mrconvert ...') will run on the series selected second. So the series processed by dwifslpreproc may never have been checked by image.check_3d_nonunity(). This doesn't seem to me to be behaving as intended.

Related Information and Requests

I haven't checked thoroughly but I think some of the other python scripts in /mrtrix3/bin/ may also not behave as the user may expect when the input is a directory containing DICOM data.

Would it be possible to modify run.command() so that the output from mrconvert is visible to the user on the command line.

bjeurissen commented 1 year ago

Nice catch. I don't think many people have noticed this problem as most pipelines start with mrconvert DICOM ...

I haven't checked thoroughly but I think some of the other python scripts in /mrtrix3/bin/ may also not behave as the user may expect when the input is a directory containing DICOM data.

Any command written with the python API will have the hanging issue when presented with a DICOM folder with multiple series (confirmed).

Also, any command written with the python API, that references the input more than once, will have the second issue (being able to select different series at different steps). In order to fix this, the user input will have to either be remembered and fed correctly to subsequent calls, or repeated references to the input should be avoided by always starting with mrconvert and then running any subsequent steps on the result of that.

Note that the python-API-based commands are currently not 100% interchangeable with the standard commands, e.g., they cannot be used in pipes (#1392), and your example shows they also don't deal well with interactive prompts.

SWastling commented 1 year ago

I first came across the issue when trying to use the python API to write a pipeline that starts with DICOM data.

My initial work-around was to temporarily change the verbosity e.g.:

current_verbosity = app.VERBOSITY
if app.VERBOSITY < 2
    run.shared.set_verbosity(2)
run.command('mrconvert ...')
run.shared.set_verbosity(current_verbosity )

But this seems to be a rather horrible fudge!

Lestropie commented 1 year ago

Any script that reads from an input image more than once, eg. to check header information prior to importing (as this facilitates an early termination of the script if the user input is erroneous / incompatible) will potentially cause issues here; even if the user is appropriately presented with the series selection interface, the user could select two different series at different points in the code. Ie. This problem could be alleviated if all scripts were to just import data into the scratch directory and then perform checks there, but it would result in scratch space bloat, and be slower to provide user feedback.
Providing user accessibility to terminal UI, even if only for mrconvert (which would require an explicit code branch), is still not a complete solution, as scripts aren't guaranteed to access user input images only using the mrconvert command. It'd need to be across all of run.command(), which could cause all sorts of issues, and kind of defeats the design philosophy.

For the sake of "getting the command to work", my advice would be to use the DICOM environment variables to define what DICOM data to read prior to any MRtrix3 binaries being invoked. You can do this temporarily, eg. "$ DICOM_SERIES=xxx dwifslpreproc ...".

More generally, the issue applies to any case where the input to a Python script is a DICOM directory, and there is more than one series to choose from. Achieving this within the Python API would be quite clunky. Cleanest solution I can think of right now is:

Explicitly flag command-line arguments as input images where appropriate (#1392).
Identify cases where the input path is a directory, and will therefore be interpreted as a DICOM series.
In such cases, execute a new C++ binary command that instantiates the DICOM QuickScan capability, including the use of any DICOM-related environment variables, and returns information about patients / studies / studies; the Python API would then determine whether reading those data would invoke the user interface, at which point it could terminate with an appropriate error message. (Alternatively could have the existing dcminfo provide this behaviour if provided with a directory rather than a .dcm file as input)

MRtrix3 / mrtrix3

dwifslpreproc: fault when input is a directory containing DICOM data #2576