OHDSI / ImageWG

Repository for medical image working group
https://ohdsi.github.io/ImageWG/
Apache License 2.0

ETL of DICOM Header: Raw vs Cleansed Data #11

Open kyulee-jeon opened 2 months ago

kyulee-jeon commented 2 months ago

When running ETL on DICOM headers, should we load the raw values as-is or cleanse them before uploading?

Considerations:

  1. Modality Concept ID: Images of the same exam type carry different values in the 'Modality' tag (0008,0060).

| Modality | Captured Values for 'Modality (0008,0060)' Tag |
|---|---|
| Mammography | MG (Mammography), CR (Computed Radiography) |
| Chest X-ray | CR (Computed Radiography), DR (Digital Radiography), DX (Digital X-ray) |

For instance, reviewing sample data from Korea, we observed a mixture of MG and CR in Mammography studies. Similarly, Chest X-ray studies showed a mix of CR, DR, and DX.

  2. Errors in Raw Data: In some Chest X-ray headers, 'Body Part Examined' (0018,0015) was recorded as 'Skull'.
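
To make the two considerations concrete, here is a minimal Python sketch of one possible middle ground: keep the raw header values untouched and attach review flags during ETL, so the cleansing decision can be made later. This is not an agreed ImageWG approach. The function `review_header`, the lookup tables, and the flag names are hypothetical. The Modality sets are taken from the table above, and the expected body-part values (`BREAST`, `CHEST`) are assumptions.

```python
# Hypothetical lookup: Modality codes observed per exam type (from the table above).
EXPECTED_MODALITIES = {
    "Mammography": {"MG", "CR"},
    "Chest X-ray": {"CR", "DR", "DX"},
}

# Hypothetical lookup: plausible 'Body Part Examined' values per exam type (assumed).
EXPECTED_BODY_PARTS = {
    "Mammography": {"BREAST"},
    "Chest X-ray": {"CHEST"},
}


def review_header(exam_type, header):
    """Return the raw header values plus flags for values that look inconsistent.

    `header` is a plain dict keyed by DICOM keyword, e.g.
    {"Modality": "DX", "BodyPartExamined": "SKULL"}.
    Nothing is overwritten; the raw values are passed through unchanged.
    """
    raw_modality = header.get("Modality")
    raw_body_part = header.get("BodyPartExamined")

    # Normalize only for comparison; the stored values stay raw.
    modality_key = (raw_modality or "").strip().upper()
    body_part_key = (raw_body_part or "").strip().upper()

    flags = []
    if modality_key not in EXPECTED_MODALITIES.get(exam_type, set()):
        flags.append("unexpected_modality")
    if body_part_key not in EXPECTED_BODY_PARTS.get(exam_type, set()):
        flags.append("unexpected_body_part")

    return {
        "exam_type": exam_type,
        "raw_modality": raw_modality,
        "raw_body_part": raw_body_part,
        "flags": flags,
    }


if __name__ == "__main__":
    # A Chest X-ray header with the 'Skull' error described in item 2.
    print(review_header("Chest X-ray", {"Modality": "DX", "BodyPartExamined": "SKULL"}))
```

With this shape, the mixed MG/CR and CR/DR/DX values pass without a flag, while a Chest X-ray labeled 'Skull' is flagged for review rather than silently corrected.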