bhklab / med-imagetools

Transparent and reproducible medical image processing pipelines in Python.
MIT License

Crawl portion of autopipeline misses CT and RTSTRUCT files #87

Open strixy16 opened 1 year ago

strixy16 commented 1 year ago

Running autopipeline /Users/katyscott/Documents/SARC021/images/ /Users/katyscott/Documents/SARC021/med-imageout/ --n_jobs 1 --update --overwrite doesn't find all of the CT and RTSTRUCT files in the images directory.

My images directory contains four directories in total, two for each of the two samples. Each sample directory contains subdirectories holding CTs and RTSTRUCTs as DICOM files. There are three different CT scans for each sample, and most of them have associated RTSTRUCTs.

The output of the crawl only finds one of the three sets of CT and RTSTRUCT combinations for the first sample and two of the three CTs and one RTSTRUCT set for the second sample.

When I call the crawl_one function on its own, it appears to find all of the files. So somewhere between this and the output, the files are getting lost.

Zhack47 commented 2 months ago

Hello, I have had a similar problem on a dataset, where no file was found. Upon further inspection, it seems that the condition for the recursive search with glob (in src/imgtools/utils/crawl.py, l.17) is too strict. Indeed, it only looks for files ending in ".dcm", which is not always the case for DICOM files :)

I simply changed the condition to "*" to include all files. This allowed the tool to find my patients, and it is actually what is present in the branch for the F1000Research article.

Hope this helps!
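To illustrate the difference between the two patterns, here is a minimal sketch using only the standard library. It is not the code from crawl.py: the helper `find_candidates` and the file names are hypothetical and exist only to show how a "*.dcm" pattern skips files that a "*" pattern picks up.

```python
import tempfile
from pathlib import Path


def find_candidates(root, pattern):
    """Recursively list the names of files under root that match pattern.

    Hypothetical helper for illustration; not part of med-imagetools.
    """
    return sorted(p.name for p in Path(root).rglob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake series directory with three naming styles.
    series = Path(tmp) / "patient1" / "CT"
    series.mkdir(parents=True)
    for name in ("slice1.dcm", "slice2.DCM", "slice3"):
        (series / name).touch()

    strict = find_candidates(tmp, "*.dcm")  # suffix-based search
    loose = find_candidates(tmp, "*")       # every file, whatever its name

    print("*.dcm ->", strict)
    print("*     ->", loose)
```

The suffix-less file never matches "*.dcm", and whether the upper-case ".DCM" file matches depends on the platform's case sensitivity. The "*" pattern returns every file, so anything that is not DICOM has to be filtered out at a later step.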

Zhack47 commented 2 months ago

Overall, this strict matching of only "*.dcm" is a problem in multiple places in the code. For example, further down the line I had the same issue with the conversion of RT Structure Set files.

Sometimes these files will end in .dcm, other times in .DCM, and other times they have no suffix at all!

I think it would be necessary to check whether the files are DICOM in another way, to make this tool agnostic to the filename suffix :)
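One suffix-independent check, sketched here under assumptions, relies on the DICOM Part 10 file format: a conforming file starts with a 128-byte preamble followed by the four bytes "DICM". The functions `looks_like_dicom` and `find_dicom_files` below are hypothetical names for illustration, not part of med-imagetools, and the check would miss older files written without the preamble.

```python
from pathlib import Path


def looks_like_dicom(path):
    """Return True if the file carries the 'DICM' marker at byte offset 128.

    Hypothetical helper: only detects files with a DICOM Part 10 header.
    """
    try:
        with open(path, "rb") as f:
            header = f.read(132)  # 128-byte preamble + 4-byte marker
    except OSError:
        # Unreadable or missing files are treated as non-DICOM.
        return False
    return len(header) == 132 and header[128:132] == b"DICM"


def find_dicom_files(root):
    """Recursively collect files under root that pass the marker check."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and looks_like_dicom(p)
    )
```

Reading only the first 132 bytes keeps the scan cheap on large datasets. For a more thorough check, each candidate could instead be parsed with a DICOM library such as pydicom, at the cost of a slower crawl.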