Format dicom - Githubissues

tcpan commented 6 months ago

Updated read and write. Better standard compliance for write. Reading now allows random data access.

tompollard commented 6 months ago

Thanks @tcpan. I think we should go ahead and merge, but we will have to do some restructuring afterwards because our current framework expects each module in the formats folder to represent a separate format.

@bemoody @briangow my suggestion would be either:

allowing formats to include utilities in a subfolder within formats. e.g.

waveform_benchmark/formats/dicom.py -> leave in current location

waveform_benchmark/formats/dcm_rand_access.py -> waveform_benchmark/formats/dicom/rand_access.py

waveform_benchmark/formats/dcm_waveform_reader.py -> waveform_benchmark/formats/dicom/waveform_reader.py

waveform_benchmark/formats/dcm_waveform_writer.py -> waveform_benchmark/formats/dicom/waveform_writer.py
Store rand_access.py, waveform_reader.py, waveform_writer.py in a separate repository, which is then added as a dependency here. This might make sense if we consider the modules to be implementing the format, rather than just applying it.

tcpan commented 6 months ago

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that?

Get Outlook for iOShttps://aka.ms/o0ukef

From: Brian Gow @.> Sent: Saturday, May 4, 2024 6:40:15 PM To: chorus-ai/chorus_waveform @.> Cc: Tony Pan @.>; Mention @.> Subject: Re: [chorus-ai/chorus_waveform] Format dicom (PR #52)

@briangow commented on this pull request.

@tcpanhttps://github.com/tcpan , thanks so much for this - preparing the DICOM format for benchmarking clearly took a lot of work!

I've left a couple of comments in the code. One other thought is that it might be cleaner to move the dcm_ files under a separate folder (perhaps formats/dcmutils/dcm) such that the only files at the formats/ level are the ones that get called for benchmarking. This is completely up to you though, no worries if you prefer to leave them where they are.

In waveform_benchmark/formats/dicom.pyhttps://github.com/chorus-ai/chorus_waveform/pull/52#discussion_r1590173414:

dicom.save_as("/mnt/c/Users/tcp19/Downloads/Compressed/test_waveform.dcm", write_like_original=False)

def write_waveforms(self, path, waveforms):

fs = FileSet()

one series per modality

as many multiplexed groups as allowed by modality

one instance per chunk

one dicomdir per study? or series?

studyInstanceUID = uid.generate_uid()

seriesInstanceUID = uid.generate_uid()

prefix, ext = os.path.splitext(path)

import json

======== organize the waveform chunks (tested)

print("INPUT pleth", waveforms['Pleth'])

Perhaps this line should get commented out.

In waveform_benchmark/formats/dicom.pyhttps://github.com/chorus-ai/chorus_waveform/pull/52#discussion_r1590173600:

+# name: modality name, maximum sequences, grouping, SOPUID, max samples, sampling frequency, source, datatype, +# TwelveLeadECGWaveform: ECG, 1-5, {1: I,II,III; 2: aVR, aVL, aVF; 3: V1, V2, V3; 4: V4, V5, V6; 5: II}, 16384, 200-1000, DCID 3001 “ECG Lead”, SS +# GeneralECGWaveform: ECG, 1-4, 1-24 per serquence, ?, 200-1000, DCID 3001 “ECG Lead”, SS +# General32BitECGWaveform: ECG, 1-4, 1-24 per serquence, ?, by confirmance statement, DCID 3001 “ECG Lead”, SL +# AmbulatoryECGWaveform: ECG, 1, 1-12, maxsize of wvaeform data attribute, 50-1000, DCID 3001 “ECG Lead”, SB/SS +# HemodynamicWaveform: HD, 1-4, 1-8, maxsize of wvaeform data attribute, <400, , SS +# CardiacElectrophysiologyWaveform: EPS, 1-4, , <=20000, DCID 3011 “Electrophysiology Anatomic Location” , SS +# ArterialPulseWaveform: HD, 1, 1, ? , <600, DCID 3004 “Arterial Pulse Waveform” , SB/SS +# RespiratoryWaveform: RESP, 1, 1, ? , <100, DCID 3005 “Respiration Waveform”, SB/SS +# ScalpEEGWaveform: EEG, 1 (interruption as separate instances), 1-64, , unconstrained, DCID 3030 “EEG Lead”, SS/SL +# ElectromyogramWaveform: EMG, unconstrained, 1-64, , unconstrained, DCID 3031 “Lead Location Near or in Muscle” or DCID 3032 “Lead Location Near Peripheral Nerve”, SS/SL +# SleepEEGWaveform: EEG, unconstrained, 1-64, , unconstrained, DCID 3030 “EEG Lead” , SS/SL +# MultichannelRespiratoryWaveform: RESP, >1, , unconstrained, DCID 3005 “Respiration Waveform” , SS/SL +# + +CHANNEL_TO_DICOM_IOD = {

Do we need to add the other channel types from the proposed suite of waveforms for benchmarking? See Waveform suite characterization here: #11 (comment)https://github.com/chorus-ai/chorus_waveform/discussions/11#discussioncomment-9225909 .

— Reply to this email directly, view it on GitHubhttps://github.com/chorus-ai/chorus_waveform/pull/52#pullrequestreview-2039556330, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAHFAKKH54S6CYQJY6LHO4LZAVPU7AVCNFSM6AAAAABHGQEHJCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMZZGU2TMMZTGA. You are receiving this because you were mentioned.Message ID: @.***>

briangow commented 6 months ago

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that?

@tcpan , it's up to you. If you feel that these files are useful beyond this repository perhaps they deserve a dedicated place as @tompollard suggested in this 2nd thought on the topic. It would probably make sense to create a general utils module in this repository at some point. If you prefer that option feel free to set it up or let me know and I'd be happy to do so. As far as I'm concerned, having them in a subfolder under formats/ is also fine.

tcpan commented 6 months ago

I have checked in some updates, with some fixes and some new tests for different chunking strategy (likely not to make a huge difference). Note that to get the channel metadata, each file is opened and scanned. A more efficient way to do this would be to add the channel metadata as a private tag on the DICOMDIR file. This would still be standards compliant, but I have not implemented this.

briangow commented 6 months ago

Thanks @tcpan !

tompollard commented 6 months ago

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that?

@tcpan , it's up to you. If you feel that these files are useful beyond this repository perhaps they deserve a dedicated place as @tompollard suggested in this 2nd thought on the topic. It would probably make sense to create a general utils module in this repository at some point. If you prefer that option feel free to set it up or let me know and I'd be happy to do so. As far as I'm concerned, having them in a subfolder under formats/ is also fine.

I think the cleanest option is to store rand_access.py, waveform_reader.py, waveform_writer.py in a separate "DICOM-waveform" repository, because (as far as I understand it) the code implements the functionality for writing waveforms to DICOM. If we choose DICOM as the format, we will need to have a package that enables these functions (and actually, I assume there will be demand by the community for this functionality regardless). I'd be happy to help set up a pip installable version of this repo if you like this option.

If we want to go for the quick and easy option, my preference would be to move them to a submodule of the formats folder. I don't particularly like the idea of a generic utils submodule, because it has potential to become a dumping ground for random code.

chorus-ai / chorus_waveform

Format dicom #52

dicom.save_as("/mnt/c/Users/tcp19/Downloads/Compressed/test_waveform.dcm", write_like_original=False)

one series per modality

as many multiplexed groups as allowed by modality

one instance per chunk

one dicomdir per study? or series?

import json

======== organize the waveform chunks (tested)