Open-Minds-Lab / mrQA

mrQA: tools for quality assurance in medical imaging datasets, including protocol compliance
https://open-minds-lab.github.io/mrQA/
Apache License 2.0

support some kind of `dicom-archives` style? #22

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

In heudiconv, and in the reproin heuristic in particular, we not only convert to BIDS datasets but also "archive" the original DICOMs under sourcedata/, mirroring the hierarchy of the converted BIDS data. See e.g. https://datasets.datalad.org/?dir=/dbic/QA/sourcedata/sub-emmet/ses-20180531/fmap which accompanies the nii.gz files in http://datasets.datalad.org/?dir=/dbic/QA/sub-emmet/ses-20180531/fmap .

Since BIDS-based analytics are limited to the metadata fields that were extracted, I thought it would be cool for mrQA to operate directly on those original DICOMs, which we have. (BTW, heudiconv can convert from DICOMs wrapped in such tarballs, which comes in handy.) But I do not think mrQA supports that as part of the dicom style:

```
bids@rolando:/inbox/BIDS/Wager/Wager/1076_spacetop$ ~/.local/bin/mr_proto_compl --data_root ./sourcedata/sub-0001/ses-01/ --output_dir .heudiconv/mrQA --style dicom
/home/bids/singularity_home/.local/lib/python3.9/site-packages/mrQA/cli.py:84: UserWarning: Expected a unique identifier for caching data. Got NoneType. Using a random name. Use --name flag for persistent metadata
  dataset = import_dataset(data_root=args.data_root,
Traceback (most recent call last):
  File "/home/bids/singularity_home/.local/bin/mr_proto_compl", line 8, in <module>
    sys.exit(main())
  File "/home/bids/singularity_home/.local/lib/python3.9/site-packages/mrQA/cli.py", line 84, in main
    dataset = import_dataset(data_root=args.data_root,
  File "/home/bids/singularity_home/.local/lib/python3.9/site-packages/MRdataset/base.py", line 82, in import_dataset
    dataset = dataset_class(
  File "/home/bids/singularity_home/.local/lib/python3.9/site-packages/MRdataset/dicom_dataset.py", line 70, in __init__
    self.save_dataset()
  File "/home/bids/singularity_home/.local/lib/python3.9/site-packages/MRdataset/base.py", line 351, in save_dataset
    raise EOFError('Dataset is empty!')
EOFError: Dataset is empty!
```

raamana commented 1 year ago

good pointer, Yarik! My guess is that in the best case this is a matter of semantics, e.g., finding the right root folder (./sourcedata/ instead of ./sourcedata/sub-0001/ses-01/); otherwise it needs a subclass of MRdataset for this dataset format, which is easy to do, as it was designed to allow for such customizations

if we can download an example dataset or two, we can easily test and fix this
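
To tell those two cases apart (wrong root folder vs. an unsupported on-disk format), it may help to look at what actually sits under the data root. The snippet below is only a rough sketch using the standard library; `summarize_suffixes` is a hypothetical helper, not part of mrQA or MRdataset, and the `.dicom.tgz` ending mentioned in the comment is an assumption about how the archived tarballs are named.

```python
from collections import Counter
from pathlib import Path


def summarize_suffixes(data_root):
    """Count regular files under data_root, grouped by their full suffix.

    Hypothetical helper for diagnosis only; not part of mrQA or MRdataset.
    """
    counts = Counter()
    for path in Path(data_root).rglob("*"):
        if path.is_file():
            # Joining all suffixes keeps compound endings such as
            # ".dicom.tgz" together (assumed archive naming).
            counts["".join(path.suffixes) or "<none>"] += 1
    return counts


if __name__ == "__main__":
    import sys

    for suffix, n in summarize_suffixes(sys.argv[1]).most_common():
        print(f"{n:8d}  {suffix}")
```

If the summary shows only tarballs and no loose DICOM files, the empty dataset is more likely a format issue than a root-folder issue.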

yarikoptic commented 1 year ago
raamana commented 1 year ago

that reminds me: did we ever fix the issue of our library getting stuck on, or otherwise having problems with, recursive symbolic links inside datalad? Probably not, as I ended up making a hard copy of the dataset for our intern! Actually, if your time permits, helping us support datalad datasets would be a great contribution from your side. We can walk you through our internal setup, and I am confident this is child's play for you :)

raamana commented 1 year ago

more specifically, it would be a minor modification of https://github.com/Open-Minds-Lab/MRdataset/blob/cfd26e38735665d80dbf9e052bf8adf70ed0b0bd/MRdataset/dicom_dataset.py#L77

after creating a class DicomArchiveDataset or ReproInDataset, starting from

```python
from MRdataset.base import Project


class DicomArchiveDataset(Project):
    ...  # archive-specific traversal would go here
```

and some testing on a few examples. We are working on a deadline at the end of this week, but will look into it after.
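
Until such a class exists, one possible stopgap is to unpack the archived tarballs into a scratch directory that mirrors the sourcedata/ layout and point the existing dicom style at that copy. The following is a minimal sketch under stated assumptions, not the MRdataset approach: `unpack_dicom_archives` is a hypothetical helper, the `*.dicom.tgz` pattern is an assumed naming convention, and member files are flattened into one folder per tarball (so identically named members inside one archive would overwrite each other).

```python
import shutil
import tarfile
from pathlib import Path


def unpack_dicom_archives(archive_root, dest_root, pattern="*.dicom.tgz"):
    """Extract matching tarballs into dest_root, mirroring the source layout.

    Hypothetical helper; the default pattern is an assumption about how the
    archived DICOM tarballs are named.
    """
    archive_root, dest_root = Path(archive_root), Path(dest_root)
    unpacked = []
    for archive in sorted(archive_root.rglob(pattern)):
        # One folder per tarball, at the same relative location as the archive.
        stem = archive.name.split(".")[0]
        target = dest_root / archive.relative_to(archive_root).parent / stem
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive) as tar:
            for member in tar:
                if not member.isfile():
                    continue  # skip directories, links, and devices
                # Keep only the base name so a member path cannot escape target.
                out = target / Path(member.name).name
                with tar.extractfile(member) as src, open(out, "wb") as dst:
                    shutil.copyfileobj(src, dst)
        unpacked.append(target)
    return unpacked
```

The resulting directory could then be passed via `--data_root` with `--style dicom`; whether that is enough to get a non-empty dataset has not been verified here.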