Donders-Institute / bidscoin

BIDScoin converts your source-level neuroimaging data to BIDS
https://bidscoin.readthedocs.io
GNU General Public License v3.0
130 stars 35 forks source link

dicomsort could fail more gracefully when SeriesNumber is missing #181

Closed yehman closed 1 year ago

yehman commented 1 year ago

Hi, I'm using bidscoin on a large set of clinical data. Bidscoin is great for this, as I do not have to write each variation of scanning protocol in manually. As it is clinical data, no assumption on the DICOM fields are safe. Some scans in particular, have missing SeriesNumber. dcm2niix deals with this by making up a long sequence of number (from where I don't know) to replace the empty value. dicomsort from bidscoin, on the other hand, aborts prematurely, and skips the all the following files from that subject, even if they are from a different sequence with valid SeriesNumber (in a flat directory structure separated by sessions). I tried replacing SeriesNumber with others such as SeriesTime or AcquisitionTime, but they are missing on other scans. Finally, I hacked dicomsort.py to grab the sequence number from the filename which somehow worked. It will work for now, until they change the DICOM export format...

It would be helpful if dicomsort could either:

  1. Instead of fully aborting for the rest of the subject's scans, skip only the files that had missing values
  2. When adding additional folderschemes, replace empty values. This way, when values are missing, uniqueness can still be attempted via a combination of multiple fields. e.g., {SeriesNumber}-{SeriesTime}-{AcquisitionTime}-{SeriesDescription} can return 3-NONE-NONE-t1_mprage, or something similar. Current behavior is aborting whenever any one is missing a value.
  3. Somehow use the non-random number dcm2niix generates for missing values.
marcelzwiers commented 1 year ago

BIDScoin is developed against standard DICOM data, so I take this as an opportunity to make it more robust against the wild clinical varieties out there :-). I'll try to see what I can do next week

marcelzwiers commented 1 year ago

SeriesNumber is a required field, but I can surely make dicomsort failures more gracefully. Have you been using the --force option? https://dicom.innolitics.com/ciods/mr-image/general-series/00200011

marcelzwiers commented 1 year ago

You can run dicomsort before running bidsmapper/bidscoiner, and then you can use the --force option (which I think does what you want). The downside (or advantage) is that the sorting is done permanently and not in a temp folder, is that ok for you?

marcelzwiers commented 1 year ago

Anyhow, i made a small patch to deal with missing seriesnumber

yehman commented 1 year ago

Thanks for venturing into the Wild West that is clinical data. Here's a bit more information which hopefully can help. I tried both dicomsort with --force and without, both will abort at the first encounter of the problematic series. Running bidsmapper with and without --force also shows the same behavior. The error message is the same in all instances:

INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted ERROR | Missing 'SeriesNumber' DICOM field specified in the '{SeriesNumber:03d}-{SeriesDescription}' folder/naming scheme, cannot find a safe name for: DICOM/sub-00001/ses-00001/00001.Seq17.Ser0.Img88.dcm

ERROR | Cannot create subfolders, aborting dicomsort()... INFO | >> Sorting: DICOM/sub-00001/ses-00001_PET (442 files)

The series in question is a secondary derived image (I think a PET overlaid on MRI). I would tell bidsmapper to ignore this series, so having dicomsort fail at this series is not a major issue. The problem is that all following files in the same session would also be skipped. (Note that sequence number is retained in the filename, which made it possible to extract in my hack.) I also tried sorting with only SeriesDescription, but ended up with multiple images sorted into one folder. In the end, I had to modify dicomsort.py to replace the missing SeriesNumber with the sequence number grabbed from the filename.


Updating with the new patch generates a new error:

INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted Traceback (most recent call last): File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/bin/dicomsort", line 8, in sys.exit(main()) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 241, in main sortsessions(sourcefolder = args.dicomsource, File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 198, in sortsessions sessions += sortsessions(sessionfolder, folderscheme=folderscheme, namescheme=namescheme, pattern=pattern, recursive=recursive, dryrun=dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 208, in sortsessions sortsession(sourcefolder, dicomfiles, folderscheme, namescheme, force, dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 116, in sortsession subfolder = construct_name(folderscheme, dicomfile, force) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 51, in construct_name return scheme.format(**schemevalues) if schemevalues else '' ValueError: Unknown format code 'd' for object of type 'str'

marcelzwiers commented 1 year ago

And does it work now?

yehman commented 1 year ago

There's a small spelling error on line 233 that generates this:

ERROR | Bad naming scheme: {SeriesNumber:03g}-{SeriesDescription}. Only alphanumeric characters could be used for the field names (with the optional number of digits afterwards,e.g. '{InstanceNumber:05d}'), and only alphanumeric characters, dots, and dashes + underscores could be used as separators. ERROR | Wrong scheme input argument(s), aborting dicomsort()...

Correcting this does indeed make dicomsort work when Series Number is missing! 👍

marcelzwiers commented 1 year ago

Oops, sorry for being my software tester :-p. I'll patch it right away

yehman commented 1 year ago

Well, I'm the one with funky data, so it makes sense for me to test.

Latest update yields this:

INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) Traceback (most recent call last): File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/bin/dicomsort", line 8, in sys.exit(main()) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 243, in main sortsessions(sourcefolder = args.dicomsource, File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 200, in sortsessions sessions += sortsessions(sessionfolder, folderscheme=folderscheme, namescheme=namescheme, pattern=pattern, recursive=recursive, dryrun=dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 210, in sortsessions sortsession(sourcefolder, dicomfiles, folderscheme, namescheme, force, dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 118, in sortsession subfolder = construct_name(folderscheme, dicomfile, force) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 53, in construct_name return scheme.format(**schemevalues) if schemevalues else '' KeyError: 'SeriesNumber'

marcelzwiers commented 1 year ago

Aha, there was another place where the new format specifiers were used. Hope it is finally solved now :-)

yehman commented 1 year ago

It does run now without errors. Unfortunately, I'm getting a new unintended effect: Multiple sequences are being placed in the same directory. It looks like when the UID string is too long when converting to int, they get formatted into scientific numeric notations.

INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2478 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/009-DTI 128 res 64 dir 2x2x2 INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/016-SAG REFORMAT POST INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019

I end up getting multiple sequences being sorted into the 1.2826e+51-PET 11-26-2019 folder because they are not unique enough.

marcelzwiers commented 1 year ago

Are you sure you are using the latest code? Because I reverted the scientific notation with this patch: https://github.com/Donders-Institute/bidscoin/commit/e3805e30b856325084d004678703fafe979d3992

pip install --force-reinstall --no-deps git+https://github.com/Donders-Institute/bidscoin
yehman commented 1 year ago

You're right! I was using https://github.com/Donders-Institute/bidscoin/commit/36973a18525e18e2a2d13f23b46e4ca1ab3628e2 instead of https://github.com/Donders-Institute/bidscoin/commit/e3805e30b856325084d004678703fafe979d3992. Updating to the latter does correctly separate the three sequences with missing SeriesNumber into their respective folders. I will close this issue. Thanks for working on this issue so promptly, and also for creating bidscoin! 👍