databricks-industry-solutions / pixels

Facilitates simple, large-scale processing of HLS medical images, documents, and zip files. Previously at https://github.com/dmoore247/pixels

Enh: Put the DICOM header extracted metadata `meta` into a `Variant` Delta Lake data type. #62

Open dmoore247 opened 1 month ago

dmoore247 commented 1 month ago

Is your feature request related to a problem? Please describe. Improve the performance of processing the DICOM metadata header.

Describe the solution you'd like Change the data type of the `meta` column in the object_catalog from STRING to VARIANT. Adjust the README and prerequisites accordingly.
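
As a rough illustration of the requested change, the sketch below builds the SQL that would add a VARIANT copy of `meta` next to the existing STRING column. It is not the project's migration: the table name `main.pixels_solacc.object_catalog`, the new column name `meta_variant`, and the helper `variant_migration_sql` are assumptions made for this example. `try_parse_json` is used so that malformed JSON becomes NULL rather than failing the statement. The statements are only printed here; on a cluster with VARIANT support they could be passed to `spark.sql`.

```python
def variant_migration_sql(table: str, column: str = "meta") -> list[str]:
    """Return SQL statements that add a VARIANT copy of a JSON STRING column."""
    # Hypothetical name for the new column; the issue does not specify one.
    variant_col = f"{column}_variant"
    return [
        # Step 1: add an empty VARIANT column alongside the STRING column.
        f"ALTER TABLE {table} ADD COLUMNS ({variant_col} VARIANT)",
        # Step 2: parse the existing JSON text once and store the result.
        f"UPDATE {table} SET {variant_col} = try_parse_json({column})",
    ]


# Assumed table name, for illustration only.
statements = variant_migration_sql("main.pixels_solacc.object_catalog")
for stmt in statements:
    print(stmt)  # on a cluster: spark.sql(stmt)
```

Keeping the STRING column during the transition lets existing queries keep working until the README and prerequisites are updated.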

Describe alternatives you've considered Could keep it as a STRING.

Additional context The `meta` field is very complex JSON and may contain custom fields, which in turn may contain PII or PHI.

dmoore247 commented 2 days ago

VARIANT should be GA in Q4 (Nov-Jan), which could improve tag query performance by 8-9x.
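
One plausible reason for a speedup of this kind is that a STRING column has to be re-parsed as JSON on every tag lookup, while a pre-parsed representation only has to be navigated. The pure-Python sketch below is an analogy for that difference, not a model of how Delta Lake stores VARIANT, and it does not attempt to reproduce the 8-9x figure. The sample rows are made up; they use the DICOM JSON layout (`vr`, `Value`) with tag `00080060` (Modality).

```python
import json

# Hypothetical sample rows: DICOM headers serialized as JSON strings.
rows = [
    json.dumps({"00080060": {"vr": "CS", "Value": ["CT"]}}),
    json.dumps({"00080060": {"vr": "CS", "Value": ["MR"]}}),
]


def modality_from_string(meta: str) -> str:
    # STRING-style access: the whole document is parsed on every lookup.
    return json.loads(meta)["00080060"]["Value"][0]


def modality_from_parsed(meta: dict) -> str:
    # Parsed-style access: no parsing at query time, only navigation.
    return meta["00080060"]["Value"][0]


# Parse once up front, analogous to converting the column at write time.
parsed_rows = [json.loads(meta) for meta in rows]

from_strings = [modality_from_string(m) for m in rows]
from_parsed = [modality_from_parsed(m) for m in parsed_rows]
print(from_strings, from_parsed)
```

Both access paths return the same values; the difference is only in how much work each lookup repeats.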