cornerstonejs / cornerstone

JavaScript library to display interactive medical images including but not limited to DICOM
https://docs.cornerstonejs.org/
MIT License

Loading JPG files sequence with pre-cached JSON meta data #35

Closed gyopiazza closed 8 years ago

gyopiazza commented 8 years ago

Hello! Loading DICOM sequences can sometimes become very slow due to the large size of the DCM images and the processing time it takes to extract the metadata from them.

So I was thinking about a backend tool that takes DCM sequences, extracts the metadata and converts the images to JPG, so that Cornerstone doesn't have to do it on the fly (dramatically improving performance with large sequences).

Is it possible to pass in a sequence of JPGs with a JSON file (or similar) containing the pre-cached metadata?

Thank you and keep up the good work.

chafey commented 8 years ago

Yes, you would use the cornerstoneWebImageLoader to display the JPEGs.

gyopiazza commented 8 years ago

@chafey thank you, and what about the metadata? Can it be extracted with dicomParser? Is there an example of how to load the extracted metadata along with the JPGs?

chafey commented 8 years ago

Yes, you can use dicomParser for this. There is no example of what you are requesting, though.
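
As a sketch of that direction (the `dataSet.string()` accessor is dicomParser's real API, but the tag selection, function name, and output shape here are illustrative), a server-side pass could reduce each slice to just the tags the viewer needs:

```javascript
// Sketch: reduce a parsed dicomParser dataSet to a plain JSON record.
// Tag keys use dicomParser's xGGGGEEEE convention.
function extractMetadata(dataSet) {
  const wanted = [
    'x00280030', // PixelSpacing
    'x00281050', // WindowCenter
    'x00281051', // WindowWidth
  ];
  const out = {};
  for (const tag of wanted) {
    const value = dataSet.string(tag);
    if (value !== undefined) out[tag] = value;
  }
  return out;
}

// A sequence then becomes an array of these records, one per slice, e.g.:
// const metadata = files.map(f => extractMetadata(dicomParser.parseDicom(readFile(f))));
```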

RowanReid commented 8 years ago

@gyopiazza If you make any progress with this please let me know, I'd be happy to contribute to any work you start as this would greatly help my implementation of Cornerstone as well.

fedorov commented 8 years ago

Do you plan to give the user some indication that the images were lossy compressed when JPG is displayed in place of the original data?

gyopiazza commented 8 years ago

@chafey it's good to know that it can be done :)

The dicomParser explicitDataSetToJson example shows how to load the DataSet for a single DCM file. In a sequence of images, should I place each slice's data in a JSON array before feeding it to Cornerstone?

For example aaa coming from slice 1 and bbb coming from slice 2 of a DCM sequence:

[ {"x00080005": "aaa"}, {"x00080005": "bbb"} ... ]

Using cornerstoneWebImageLoader I'm able to load an array of images via cornerstone.loadImage, but what is the method to import the JSON DataSet previously extracted with dicomParser?

Thanks for the help!

@RowanReid I'll try to publish the code if I figure out how to do these operations.

@fedorov You can detect the file type by checking the file extension when users drop the files, by reading the MIME type on the fly with JavaScript, or server-side if your application has an upload step.
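
A sketch of that detection for dropped files (the helper name is illustrative; note that many DICOM files ship without an extension, so treat this as a heuristic):

```javascript
// Sketch: guess whether a dropped File is DICOM rather than a plain image.
// Works on anything with `name` and `type` fields (e.g. a browser File).
function looksLikeDicom(file) {
  const name = file.name.toLowerCase();
  return name.endsWith('.dcm') || file.type === 'application/dicom';
}
```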

fedorov commented 8 years ago

> @fedorov You can detect the file type by checking the file extension when users drop the files, read the MimeType on-fly with Javascript or also server-side if there is an upload in your application.

@gyopiazza I was asking from the user perspective, i.e., does it make sense to add a label (perhaps as an option) in the overlay to warn the user about lossy compression? This seems to be generic functionality that will be needed by all clinically oriented viewers.

gyopiazza commented 8 years ago

@fedorov Probably that depends a lot on the specific use of this library, as each application might implement the information in a different way.

gyopiazza commented 8 years ago

@chafey Do you have any news on this? Or could you point me to the relevant code/docs parts?

Thank you

chafey commented 8 years ago

If you switch to using JPEGs, several tools will not work (WW/WC, probe, ROI) because they require access to the raw pixel data (JPEG is just 8 bits). There are a few approaches to this:

1. Disable those tools.
2. 100% server-side rendering. The tools would need to be ported to the server side and implemented there; WW/WC changes require requesting an updated image.
3. Hybrid: send over JPEG initially and then "upgrade" to full pixel data only when needed (e.g. when the user starts adjusting WW/WC, etc).

Which of these 3 are you interested in? I think #3 might be interesting to a large number of people but requires server side code to work. We could implement this in the OHIF viewer (which uses cornerstone and has a server side we can extend) or we could even try just using the standard dicomWeb APIs directly.

Thoughts?

gyopiazza commented 8 years ago

@chafey thanks again for your input.

Option #3 seems the most interesting, however it will still suffer from performance/UX issues. For example: when a user wants to modify WW/WC, the actual DCM (the current slice or all slices) will have to be downloaded and parsed before any interaction can happen. While better than options #1 and #2, it's still not optimal.

I'm not an expert on the DICOM format, so please forgive me: is it really impossible to apply WW/WC directly to the JPGs, even if it means losing a certain degree of precision?

I assumed that WW/WC were "simply" brightness/sharpness settings, not something that relied on the raw pixel data.

After many extensive tests with large DCM sequences (about 90 MB) I realised the performance (between downloading and parsing) is just not acceptable. So the logical conclusion was to "pre-parse" the DCM images into JPGs + a JSON DataSet, assuming that the DataSet would give enough information to apply WW/WC, measuring, etc.

chafey commented 8 years ago

Your assumption is incorrect - WW/WC requires the underlying raw pixel data and so do some of the other tools (ROI, pixel probe, etc). Most medical images have more than 8 bits of data per pixel so converting to a lower bit depth (8 bit JPEG) requires discarding information that may be clinically important.
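
To make the dependency concrete: WW/WC is the DICOM linear VOI transform, a mapping from the full stored bit depth down to the display range (DICOM PS3.3 C.11.2.1.2). A sketch:

```javascript
// DICOM linear VOI (window center/width) transform: maps a stored pixel
// value to a display value. Once pixels are baked into an 8-bit JPEG,
// this mapping can no longer be re-applied to the original dynamic range.
function applyWindow(storedValue, center, width, yMin = 0, yMax = 255) {
  const c = center - 0.5;
  const w = width - 1;
  if (storedValue <= c - w / 2) return yMin;   // below the window: clamp to black
  if (storedValue > c + w / 2) return yMax;    // above the window: clamp to white
  return Math.round(((storedValue - c) / w + 0.5) * (yMax - yMin) + yMin);
}

// A 12-bit CT value of 1024 with center 1024 / width 400 lands mid-gray:
// applyWindow(1024, 1024, 400) → 128
```

Changing center or width re-bins the full-depth values, which is exactly why the raw data must still be around.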

Medical images can be quite large and there are a variety of ways to approach the performance issue - how you solve yours really depends on your specific requirements.

RowanReid commented 8 years ago

I would imagine option 1 would be a good first step. This would allow users to benefit from a quick and responsive experience when browsing through radiology (a 'lite' version of the viewer, if you will), and switch to the slower 'full' experience when those tools are required.

This makes sense at least in the context of how I have implemented Cornerstone.

gyopiazza commented 8 years ago

@chafey

Is the pixel data that would be discarded by using 8-bit JPGs really always critical? As a note, in my app I don't need 100% clinical high fidelity.

Would the PNG format (24-bit lossless) address the bit depth issue?

About the embedded DataSet, I might be wrong, but isn't it redundant for a sequence of DCM files, since the same data is repeated for each slice? For my app, where performance is really important, I'd pre-parse it server-side and load it together with the JPGs/PNGs instead of parsing it on the fly in the browser. Of course, only if WW/WC (and maybe the measuring tool) can be used without truly dramatic differences from the original DCM.

@RowanReid that is something I considered, but in my case I always need the WW/WC tools.

chafey commented 8 years ago

In my experience, users will tolerate 1-2 bits of loss but any more than that and you start hiding important details. There are many variables though - modalities have different bit depths, procedures have different sensitivities to low frequency data, etc. You must of course validate your solution with users any time you use lossy compression (however you choose to do it)

PNG does have a 16-bit gray encoding, but none of the browsers are able to return the 16-bit data to JavaScript (they either don't decode it or they return only the upper 8 bits). It is possible to use PNG by packing 16-bit pixels into the 8-bit RGBA channels and unpacking client side, but I don't think you get any better compression results than if you just gzip the 16-bit data to begin with (PNG uses GZIP for compression, so you are adding pack/unpack overhead to use it).
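
A sketch of that pack/unpack idea (names are illustrative; this is not a Cornerstone API):

```javascript
// Pack each 16-bit pixel into two 8-bit values (e.g. the R and G channels
// of an RGBA PNG), and reassemble them client side after decode.
function pack16(pixels) {
  const bytes = new Uint8Array(pixels.length * 2);
  pixels.forEach((p, i) => {
    bytes[i * 2] = p >> 8;        // high byte
    bytes[i * 2 + 1] = p & 0xff;  // low byte
  });
  return bytes;
}

function unpack16(bytes) {
  const pixels = new Uint16Array(bytes.length / 2);
  for (let i = 0; i < pixels.length; i++) {
    pixels[i] = (bytes[i * 2] << 8) | bytes[i * 2 + 1];
  }
  return pixels;
}
```

The round trip is lossless, but as noted above the PNG deflate step then compresses what looks like noise across channels, so gzipping the raw 16-bit data is usually at least as good.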

I recommend doing more performance analysis to make sure you have the real issues identified and measured. I have never seen the DICOM parsing be a bottleneck (it is quite fast); the time to decompress pixel data is much greater (especially when using JPEG2000). Bandwidth-constrained scenarios often result in performance issues with large images (CR, MG, DX) and multiframe, just due to the size of the data you are moving around.

There is a lot of redundancy in the metadata but there are several instance specific unique values that the tools depend on (e.g. image position, row/column cosines, etc). Removing redundancy will reduce the size of the metadata significantly (often 20x smaller or more)
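
One way to strip that redundancy, as a sketch with illustrative names: factor out every tag whose value is identical across all slices, keeping only per-instance values (image position, etc.) in the per-slice list:

```javascript
// Split a sequence's metadata into one shared record plus small
// per-slice records containing only the instance-specific tags.
function deduplicate(slices) {
  const shared = {};
  const first = slices[0];
  for (const tag of Object.keys(first)) {
    if (slices.every(s => s[tag] === first[tag])) shared[tag] = first[tag];
  }
  const perSlice = slices.map(s => {
    const unique = {};
    for (const tag of Object.keys(s)) {
      if (!(tag in shared)) unique[tag] = s[tag];
    }
    return unique;
  });
  return { shared, perSlice };
}
```

Reconstructing a slice's full metadata is then just `{ ...shared, ...perSlice[i] }`.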

chafey commented 8 years ago

@gyopiazza I just realized that I am talking about issues related to medical imaging in general and your needs may be a subset of that. Can you provide a quick explanation of what you are trying to do? For example, if your only concern is displaying ultrasound images and don't care about other modalities, the original solution you suggested would be appropriate.

gyopiazza commented 8 years ago

@chafey first of all thanks for taking the time to analyse the issue and come up with solution proposals, really appreciated! :)

My project consists of 10–20 sequences of DCM images, about 30–80 MB and 20–200 slices per sequence. While I don't need 100% clinical accuracy, the tool will be used by radiologists to display brain, chest and other radiographs. It should be at least realistic in the WW/WC and measuring and – although it doesn't have to be ultra precise – it needs to be very fast (time is a factor in my app).

I already have an online DCM viewer built with another open source library that works fine, except for the performance, which could be dramatically improved by using JPGs and a pre-parsed DataSet.

I was considering the switch to Cornerstone due to its modularity and dicomParser, which, in my first idea, could have been used to pre-parse the DataSet along with the JPGs for a big performance boost.

DCM file size
The first problem is the size of the DCM files, which inevitably takes too long to load in my scenario and could be reduced to 1/5th (a sequence of 80 MB would become 16 MB). Assuming I have at least 10 sequences of 80 MB, that's already 800 MB to transfer, which could easily be reduced to 160 MB by using JPGs.

Browser caching
Another problem is leveraging browser caching. The server is set up to enable caching and it works up to roughly 1 GB, but sometimes files are downloaded again when they exceed the limit.

DataSet processing
Extracting the DataSet is what takes the least time (between 2 and 10 seconds), but it still adds to the total due to the large number of DCM files.

One way to address these issues could be to keep the DCM viewer instances "in memory" instead of relaunching them, but that would quickly saturate the available memory, or at least make the experience sluggish. Plus, it won't fix the "first load" wait.

Thank you for the feedback!

gyopiazza commented 8 years ago

I just felt the need to emphasize that the library/tool used to parse DCM images is not the bottleneck; the data transfer and rendering of big files is (as you also mentioned).

chafey commented 8 years ago

DCM file size - Images should always be compressed in some manner. gzip will give you 2-3:1 lossless compression depending on the image. If that isn't enough, you will have to go to lossy compression or server-side rendering. JPEG 12-bit lossy is worth looking at.

Preprocessing the metadata for a study into a single document is a very wise strategy.

gyopiazza commented 8 years ago

@chafey I omitted to say that GZIP compression is already enabled on the server. Probably the best approach is to set up a test page using JPG/PNG and DCM to compare the WW/WC filters in an actual environment.

About preprocessing the metadata, could you please tell me:

– In the dicomParser example I can see how to extract the DataSet for one DCM file, but how can I combine the output of multiple DCM files in a sequence?

– When loading the images into Cornerstone, how can I pass the extracted DataSet? I can't find a relevant example.

As always, thank you for the help!

chafey commented 8 years ago

Why not put the output of each instance in an array?

Cornerstone doesn't directly depend on dicomParser or the dataSets it produces; it depends on the Image objects produced by image loaders and metadata providers. CornerstoneWADOImageLoader uses dicomParser to produce image objects for cornerstone. Your job will be to create a custom image loader which takes your metadata as input, produces imageIds, and then returns Image objects based on those imageIds and your metadata. You will also need to create your own metadata providers.
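
A minimal sketch of the metadata-provider half, assuming a `metadataByImageId` map built from your pre-parsed JSON (the map, imageIds, and module shape are illustrative; `cornerstone.metaData.addProvider` is the real registration hook):

```javascript
// Pre-parsed metadata keyed by imageId; the keys inside each entry are
// the metadata "types" that tools will ask for.
const metadataByImageId = {
  'example://slice-1': {
    imagePlaneModule: { rowPixelSpacing: 0.8, columnPixelSpacing: 0.8 },
  },
};

// A provider is just a function (type, imageId) => value-or-undefined.
function metaDataProvider(type, imageId) {
  const entry = metadataByImageId[imageId];
  return entry ? entry[type] : undefined;
}

// In the browser you would register it with:
// cornerstone.metaData.addProvider(metaDataProvider);
```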

gyopiazza commented 8 years ago

Is the cornerstoneWebImageLoader example loading the metadata, or does it just display the JPG?

Would it be an option to load the images without the metadata, or something vital might get lost? I probably don't need the embedded WW/WC values nor the patient data.

chafey commented 8 years ago

cornerstoneWebImageLoader does not load metadata; it just displays the JPEG and uses hard-coded values for pixel spacing (which are required by some tools, like measurement). There might be a way to pass the pixel spacing in the header of the image you return so you can avoid the metadata altogether.

chafey commented 8 years ago

Where are we at with this issue? Is there anything I can do to help?

gyopiazza commented 8 years ago

Hey @chafey thanks for the interest!

I managed to do almost all I needed except the pixel spacing. Would it be possible for you to add a method in Cornerstone to set the pixel spacing via JS instead of embedding the information in the image headers?

Something like cornerstone.setPixelSpacing(20); or similar.

Even better, the metadata could be set manually, like cornerstone.setMeta('pixel-spacing', 20);. This way we could use it to set any other metadata information.

About the pixel depth: I ran into what you described with JPG/PNG, because for some radiographs the quality was not acceptable. I'm not sure if the lack of metadata (I don't extract it from the DCM) could affect the result.

chafey commented 8 years ago

Actually, what we need to do is switch cornerstoneTools over to using a metadata provider to get the pixel spacing attributes. @swederik can you add this to your list of todo items? I will deprecate pixel spacing in cornerstone core and replace it with aspect ratio to deal with non-square pixels. In the meantime, you should be fine modifying the pixel spacing attributes on the image object itself (this is just a temporary workaround; I consider the image object immutable in general).
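
A sketch of that temporary workaround (`rowPixelSpacing`/`columnPixelSpacing` are the Image object's spacing fields per the wiki; the helper name and JSON shape are illustrative):

```javascript
// Patch the spacing fields on a loaded Image object using values from
// your pre-parsed JSON, before handing it to displayImage.
function applyPixelSpacing(image, spacingFromJson) {
  image.rowPixelSpacing = spacingFromJson.row;
  image.columnPixelSpacing = spacingFromJson.column;
  return image;
}

// e.g. in the browser:
// cornerstone.loadAndCacheImage(imageId).then(image => {
//   applyPixelSpacing(image, { row: 0.8, column: 0.8 });
//   cornerstone.displayImage(element, image);
// });
```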

gyopiazza commented 8 years ago

How do I set the pixel spacing attribute on the image object? I believe I have to use the image object passed by loadAndCacheImage(), but which attribute is it?

chafey commented 8 years ago

The Image object is documented in the wiki:

https://github.com/chafey/cornerstone/wiki/image

gyopiazza commented 8 years ago

I checked some DCM metadata and I can see various pixel-related tags: ImagerPixelSpacing, SamplesPerPixel, PixelSpacing, PixelRepresentation, SmallestImagePixelValue, LargestImagePixelValue.

I assume that SmallestImagePixelValue and LargestImagePixelValue relate to cornerstone's minPixelValue and maxPixelValue. What about columnPixelSpacing and rowPixelSpacing?

chafey commented 8 years ago

Here is how cornerstoneWADOImageLoader does it:

https://github.com/chafey/cornerstoneWADOImageLoader/blob/master/src/getPixelSpacing.js
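
The gist of that logic, simplified (DICOM PixelSpacing (0028,0030) is a backslash-separated pair, row spacing first, then column spacing; the real loader also handles the ImagerPixelSpacing (0018,1164) fallback and uses dicomParser accessors rather than raw strings):

```javascript
// Parse a DICOM pixel spacing string like "0.8\0.6" into the fields
// cornerstone's Image object expects.
function parsePixelSpacing(value) {
  if (!value) {
    return { rowPixelSpacing: undefined, columnPixelSpacing: undefined };
  }
  const [row, column] = value.split('\\').map(v => parseFloat(v));
  return { rowPixelSpacing: row, columnPixelSpacing: column };
}
```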

jimweldy commented 8 years ago

a couple of quick comments on this thread, although it's a bit stale, since safety may be a concern:

  1. I have found lossy JPGs to be sufficient in several scenarios; you really need to do quantifiable studies to make the decision, though.
  2. I have used ImageMagick's "convert" tool very successfully for this purpose (I haven't done this for several years, so better tools may exist now).
  3. Please indicate on the image that lossy compression was used. E.g., if you use the "convert" tool, you can use the "draw" option to burn text into the pixels. It is important to burn this into the pixels, as any overlay mechanism may be ignored by who-knows-what software may be used to view the images in the future. Pay attention to the ratio of font size to image height so that the text is readable :-)
  4. Pay attention to the PHOTOMETRIC INTERPRETATION tag in DICOM, as this will help you decide which files to invert during conversion.
  5. I haven't been using JPGs in my cornerstone project, but if I did, I would look for a way to replicate the WW/WL functionality for the lower bit-depth images.
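
Point 4 can be sketched as a one-line check (helper name illustrative): MONOCHROME1 means the minimum pixel value maps to white, so those files need inversion when converted to JPG; MONOCHROME2 (minimum maps to black) does not:

```javascript
// Decide whether a slice needs inversion during JPG conversion based on
// the DICOM Photometric Interpretation (0028,0004) value.
function needsInversion(photometricInterpretation) {
  return photometricInterpretation === 'MONOCHROME1';
}
```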

Thanks for putting cornerstone together!

chafey commented 8 years ago

Closing this issue