bjw34032 / oro.dicom

Manipulating DICOM Data in R
https://rigorousanalytics.blogspot.com

duplicated rows in `dicom.dic` #7

Open corybrunson opened 2 months ago

corybrunson commented 2 months ago

Hello, and thank you for developing and sharing this package!

Some colleagues and I have encountered the following error when reading the first '.dcm' file from this collection at the Cancer Imaging Archive:

Error in if (dic$code != "SQ") { : the condition has length > 1

Some tinkering revealed that this is due to a duplicate row in `dicom.dic`. The code below shows that there are in fact two: the first is duplicated within 'dicom.dic.csv', and the second appears both in that file and in 'dicom.dic.thompson.csv'. See the reproducible code chunk below (using the development version installed from GitHub).

library(oro.dicom)
#> oro.dicom 0.5.5
# number of rows
dicom.dic |> nrow()
#> [1] 4188
# number of unique rows
dicom.dic[! duplicated(dicom.dic), ] |> nrow()
#> [1] 4186
# duplicated rows
dicom.dic[duplicated(dicom.dic), ]
#>      group element code offset                               name
#> 959   0019    1456   DS      1            ReceiverFilterFrequency
#> 2745  0012    0064   SQ      1 DeidentificationMethodCodeSequence

Created on 2024-04-30 with reprex v2.1.0
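
To illustrate how a duplicated row can lead to that error, here is a minimal sketch on a toy dictionary. It does not reproduce the package's actual lookup code: the data frame `toy_dic` and the lookup by `group` and `element` are assumptions made for illustration. Only the condition `dic$code != "SQ"` is taken from the error message above.

```r
# Toy dictionary with one duplicated tag (hypothetical data, not dicom.dic itself)
toy_dic <- data.frame(
  group   = c("0012", "0012", "0010"),
  element = c("0064", "0064", "0010"),
  code    = c("SQ", "SQ", "PN"),
  stringsAsFactors = FALSE
)

# Looking up a tag that appears twice returns two rows instead of one
dic <- toy_dic[toy_dic$group == "0012" & toy_dic$element == "0064", ]
n_matches <- nrow(dic)

# The comparison is therefore a logical vector of length 2
cond <- dic$code != "SQ"

# In R >= 4.2.0, if() with a condition of length > 1 is an error;
# tryCatch() captures the message instead of stopping the script
result <- tryCatch(
  if (cond) "not a sequence" else "sequence",
  error = function(e) conditionMessage(e)
)
```

With a single matching row, `cond` has length one and the `if()` evaluates normally.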

I imagine that the problem would be resolved by de-duplicating the combined data set after it is constructed here. I'd be glad to fork the repo and submit a PR, if indeed this seems to be the core issue and an appropriate solution.
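
For concreteness, the de-duplication could look something like the sketch below. It assumes the combined dictionary is an ordinary data frame with the columns shown in the output above; `combined` is a hypothetical stand-in for whatever object the build script actually creates, and the rows are made-up examples apart from the two duplicated entries reported above.

```r
# Hypothetical stand-in for the combined dictionary (assumed structure)
combined <- data.frame(
  group   = c("0019", "0019", "0012", "0012", "0010"),
  element = c("1456", "1456", "0064", "0064", "0010"),
  code    = c("DS", "DS", "SQ", "SQ", "PN"),
  offset  = c(1, 1, 1, 1, 1),
  name    = c("ReceiverFilterFrequency", "ReceiverFilterFrequency",
              "DeidentificationMethodCodeSequence",
              "DeidentificationMethodCodeSequence",
              "PatientsName"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each fully identical row
deduped <- combined[!duplicated(combined), ]
rownames(deduped) <- NULL

# Sanity check: each (group, element) tag should now appear only once.
# anyDuplicated() returns 0 when there are no duplicates.
tag_dups <- anyDuplicated(deduped[, c("group", "element")])
stopifnot(tag_dups == 0)
```

Dropping only fully identical rows leaves any pair of rows that share a tag but differ in another column, which is why the sketch also checks the `group` and `element` columns on their own.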

bjw34032 commented 1 month ago

Thank you for identifying this problem and proposing a solution. Please feel free to submit a pull request and I'll be happy to take a look at it.