Bulk registration of PACSFiles via job

jennydaman commented 8 months ago

DICOM instances are registered to CUBE one by one using HTTP POST requests. Typically, we have multiple instances per series. E.g. an MR series can have 40-400 instances, meaning we might need to make 400 HTTP POST requests to CUBE in quick succession. The WSGI server's performance can become a bottleneck when multiple series are being registered simultaneously.
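To make the request volume concrete, here is a rough sketch that counts the requests needed for one study under per-instance registration versus a hypothetical batched registration. The series sizes, the `requests_needed` helper, and the idea of a batch size are illustrative assumptions, not measurements or an existing CUBE API.

```python
import math


def requests_needed(instances_per_series, batch_size=1):
    """Count the HTTP requests needed to register every instance.

    batch_size=1 models the current one-POST-per-instance behaviour;
    a larger batch_size models a hypothetical bulk registration.
    """
    return sum(math.ceil(n / batch_size) for n in instances_per_series)


# Hypothetical study: five MR series within the 40-400 instance range.
study = [40, 120, 200, 320, 400]

one_by_one = requests_needed(study)          # one request per instance
per_series = requests_needed(study, 10_000)  # effectively one request per series
print(one_by_one, per_series)
```

Under these assumptions, pulling a single study already means over a thousand requests, which is the load the WSGI server has to absorb.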

I ran a benchmark 3 times. In each run, I registered 26GB of DICOM files to CUBE with a maximum of 16 worker threads. The CPU limit of CUBE was set to 4, 10, and 12 for the 3 runs respectively.

(Figure: CPU usage of CUBE during the 3 benchmark runs.)

Increasing the CPU limit decreases the amount of time it takes for the benchmark to finish. However, in all 3 runs, CUBE was still CPU-throttled because it hit its CPU limit.

Proposal: CUBE should be able to scan its own storage for PACSFiles, read the DICOM tags, and register them itself. It should register PACSFiles when it is notified via RabbitMQ that a new directory of files is present. (The DICOM file receiver component, oxidicom, would be responsible for sending this event to RabbitMQ.)
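A minimal sketch of what the proposed handler could look like, assuming the RabbitMQ message body is JSON with a `path` key naming a directory relative to CUBE's storage root. The message format, the function name `handle_new_directory`, and the `read_tags` / `bulk_register` callables are all hypothetical; this is not how CUBE or oxidicom is actually implemented.

```python
import json
from pathlib import Path
from typing import Callable


def handle_new_directory(
    body: bytes,
    storage_root: Path,
    read_tags: Callable[[Path], dict],
    bulk_register: Callable[[list], None],
) -> int:
    """Register every DICOM file in the announced directory with one bulk call.

    Returns the number of files registered.
    """
    # Assumed message format: {"path": "<directory relative to storage root>"}
    message = json.loads(body)
    root = storage_root.resolve()
    directory = (root / message["path"]).resolve()

    # Refuse paths that escape the storage root (e.g. "../../etc").
    if not directory.is_relative_to(root):
        raise ValueError(f"path escapes storage root: {message['path']}")

    records = []
    for file in sorted(directory.rglob("*.dcm")):
        tags = read_tags(file)  # e.g. PatientID, SeriesInstanceUID, ...
        records.append({"fname": file.relative_to(root).as_posix(), **tags})

    # One call for the whole directory instead of one HTTP POST per instance.
    if records:
        bulk_register(records)
    return len(records)
```

In a real deployment, `read_tags` might wrap `pydicom.dcmread(path, stop_before_pixels=True)` and `bulk_register` might wrap a Django `bulk_create` on the PACSFile model, run inside a background job so the WSGI server is not involved at all. Both are left as injected callables here so the sketch stays independent of those libraries.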

jennydaman commented 8 months ago

This issue affects UX when a user attempts to pull multiple series at once, e.g. if a user wants to pull a whole study. The slowdown will be short-lived, so this issue is not super important. It takes 1-3 minutes for a series of 200 instances to be registered.

jennydaman commented 5 months ago

Will be solved by https://github.com/FNNDSC/oxidicom/pull/1

FNNDSC / ChRIS_ultron_backEnd

Bulk registration of PACSFiles via job #546