Closed yehman closed 1 year ago
BIDScoin is developed against standard DICOM data, so I take this as an opportunity to make it more robust against the wild clinical varieties out there :-). I'll try to see what I can do next week
SeriesNumber is a required field, but I can surely make dicomsort failures more gracefully. Have you been using the --force
option?
https://dicom.innolitics.com/ciods/mr-image/general-series/00200011
You can run dicomsort before running bidsmapper/bidscoiner, and then you can use the --force
option (which I think does what you want). The downside (or advantage) is that the sorting is done permanently and not in a temp folder, is that ok for you?
Anyhow, i made a small patch to deal with missing seriesnumber
Thanks for venturing into the Wild West that is clinical data. Here's a bit more information which hopefully can help. I tried both dicomsort with --force and without, both will abort at the first encounter of the problematic series. Running bidsmapper with and without --force also shows the same behavior. The error message is the same in all instances:
INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted ERROR | Missing 'SeriesNumber' DICOM field specified in the '{SeriesNumber:03d}-{SeriesDescription}' folder/naming scheme, cannot find a safe name for: DICOM/sub-00001/ses-00001/00001.Seq17.Ser0.Img88.dcm
ERROR | Cannot create subfolders, aborting dicomsort()... INFO | >> Sorting: DICOM/sub-00001/ses-00001_PET (442 files)
The series in question is a secondary derived image (I think a PET overlaid on MRI). I would tell bidsmapper to ignore this series, so having dicomsort fail at this series is not a major issue. The problem is that all following files in the same session would also be skipped. (Note that sequence number is retained in the filename, which made it possible to extract in my hack.) I also tried sorting with only SeriesDescription, but ended up with multiple images sorted into one folder. In the end, I had to modify dicomsort.py to replace the missing SeriesNumber with the sequence number grabbed from the filename.
Updating with the new patch generates a new error:
INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted Traceback (most recent call last): File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/bin/dicomsort", line 8, in
sys.exit(main()) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 241, in main sortsessions(sourcefolder = args.dicomsource, File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 198, in sortsessions sessions += sortsessions(sessionfolder, folderscheme=folderscheme, namescheme=namescheme, pattern=pattern, recursive=recursive, dryrun=dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 208, in sortsessions sortsession(sourcefolder, dicomfiles, folderscheme, namescheme, force, dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 116, in sortsession subfolder = construct_name(folderscheme, dicomfile, force) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 51, in construct_name return scheme.format(**schemevalues) if schemevalues else '' ValueError: Unknown format code 'd' for object of type 'str'
And does it work now?
There's a small spelling error on line 233 that generates this:
ERROR | Bad naming scheme: {SeriesNumber:03g}-{SeriesDescription}. Only alphanumeric characters could be used for the field names (with the optional number of digits afterwards,e.g. '{InstanceNumber:05d}'), and only alphanumeric characters, dots, and dashes + underscores could be used as separators. ERROR | Wrong scheme input argument(s), aborting dicomsort()...
Correcting this does indeed make dicomsort work when Series Number is missing! 👍
Oops, sorry for being my software tester :-p. I'll patch it right away
Well, I'm the one with funky data, so it makes sense for me to test.
Latest update yields this:
INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2456 files) Traceback (most recent call last): File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/bin/dicomsort", line 8, in
sys.exit(main()) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 243, in main sortsessions(sourcefolder = args.dicomsource, File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 200, in sortsessions sessions += sortsessions(sessionfolder, folderscheme=folderscheme, namescheme=namescheme, pattern=pattern, recursive=recursive, dryrun=dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 210, in sortsessions sortsession(sourcefolder, dicomfiles, folderscheme, namescheme, force, dryrun) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 118, in sortsession subfolder = construct_name(folderscheme, dicomfile, force) File "/home/syeh/tools/fsl/fslpython/envs/bidscoin2/lib/python3.8/site-packages/bidscoin/utilities/dicomsort.py", line 53, in construct_name return scheme.format(**schemevalues) if schemevalues else '' KeyError: 'SeriesNumber'
Aha, there was another place where the new format specifiers were used. Hope it is finally solved now :-)
It does run now without errors. Unfortunately, I'm getting a new unintended effect: Multiple sequences are being placed in the same directory. It looks like when the UID string is too long when converting to int, they get formatted into scientific numeric notations.
INFO | Searching for subject/session folders in: DICOM INFO | >> Sorting: DICOM/sub-00001/ses-00001 (2478 files) INFO | Creating: DICOM/sub-00001/ses-00001/017-Reformatted INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/009-DTI 128 res 64 dir 2x2x2 INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019 INFO | Creating: DICOM/sub-00001/ses-00001/016-SAG REFORMAT POST INFO | Creating: DICOM/sub-00001/ses-00001/1.2826e+51-PET 11-26-2019
I end up getting multiple sequences being sorted into the 1.2826e+51-PET 11-26-2019 folder because they are not unique enough.
Are you sure you are using the latest code? Because I reverted the scientific notation with this patch: https://github.com/Donders-Institute/bidscoin/commit/e3805e30b856325084d004678703fafe979d3992
pip install --force-reinstall --no-deps git+https://github.com/Donders-Institute/bidscoin
You're right! I was using https://github.com/Donders-Institute/bidscoin/commit/36973a18525e18e2a2d13f23b46e4ca1ab3628e2 instead of https://github.com/Donders-Institute/bidscoin/commit/e3805e30b856325084d004678703fafe979d3992. Updating to the latter does correctly separate the three sequences with missing SeriesNumber into their respective folders. I will close this issue. Thanks for working on this issue so promptly, and also for creating bidscoin! 👍
Hi, I'm using bidscoin on a large set of clinical data. Bidscoin is great for this, as I do not have to write each variation of scanning protocol in manually. As it is clinical data, no assumption on the DICOM fields are safe. Some scans in particular, have missing SeriesNumber. dcm2niix deals with this by making up a long sequence of number (from where I don't know) to replace the empty value. dicomsort from bidscoin, on the other hand, aborts prematurely, and skips the all the following files from that subject, even if they are from a different sequence with valid SeriesNumber (in a flat directory structure separated by sessions). I tried replacing SeriesNumber with others such as SeriesTime or AcquisitionTime, but they are missing on other scans. Finally, I hacked dicomsort.py to grab the sequence number from the filename which somehow worked. It will work for now, until they change the DICOM export format...
It would be helpful if dicomsort could either: