cornerstonejs / cornerstone3D

Cornerstone is a set of JavaScript libraries that can be used to build web-based medical imaging applications. It provides a framework to build radiology applications such as the OHIF Viewer.
https://cornerstonejs.org
MIT License

Is Volume Viewport CPU memory redundant? #492

Open Ouwen opened 1 year ago

Ouwen commented 1 year ago

This is related to https://github.com/cornerstonejs/cornerstone3D-beta/issues/452

Looking at the code, it appears that a 3D texture is created by the following chain of vtk.js calls:

https://github.com/Kitware/vtk-js/blob/00a46c083f4a637eb9433ea06bfc3b63ba262189/Sources/Rendering/OpenGL/Texture/index.js#L1408-L1415

https://github.com/Kitware/vtk-js/blob/00a46c083f4a637eb9433ea06bfc3b63ba262189/Sources/Rendering/OpenGL/Texture/index.js#L1301-L1338

In cornerstone3D, the streaming viewport will use texSubImage3D to send frame offsets https://github.com/cornerstonejs/cornerstone3D-beta/blob/47d0671b38e22c6509a1f02cce0489b6512e33fd/packages/core/src/RenderingEngine/vtkClasses/vtkStreamingOpenGLTexture.js#L160-L164

It doesn't seem like WebGL requires one contiguous CPU-side memory array in order to display the image. Instead, would it be possible to keep individual slices, the same as the stack viewport does, and feed in calculated offsets for the 3D case?

This would keep CPU memory relatively light and avoid duplicating data that is already kept in GPU memory.
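A minimal sketch of the idea (hypothetical helper names, assuming row-major `[x, y, z]` slice storage): each slice stays in its own typed array, and only simple offset math is needed to place it into the 3D texture via `gl.texSubImage3D`, so no contiguous CPU-side volume buffer is ever built.

```javascript
// Sketch (hypothetical helper): compute where a single slice lands in the
// 3D texture instead of building one big contiguous CPU buffer.
// Assumes row-major [x, y, z] voxel ordering.
function sliceUploadParams(dims, bytesPerVoxel, sliceIndex) {
  const [xSize, ySize, zSize] = dims;
  if (sliceIndex < 0 || sliceIndex >= zSize) {
    throw new RangeError(`slice ${sliceIndex} out of range [0, ${zSize})`);
  }
  return {
    // These would become arguments to gl.texSubImage3D(target, level,
    // xoffset, yoffset, zoffset, width, height, depth, format, type, data):
    xoffset: 0,
    yoffset: 0,
    zoffset: sliceIndex,
    width: xSize,
    height: ySize,
    depth: 1,
    // Equivalent byte offset into a hypothetical contiguous volume buffer,
    // useful for sanity-checking against the single-array approach.
    byteOffset: sliceIndex * xSize * ySize * bytesPerVoxel,
  };
}
```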

Ouwen commented 1 year ago

@sedghi @JamesAPetts, would love to get your thoughts on this if you have the time.

Ouwen commented 1 year ago
Here is a quick POC: in the `yarn run example volumeAPI` example, we can run the following in the console and still keep most of the volume functionality:

```js
const v = cornerstone.cache.getVolume('cornerstoneStreamingImageVolume:CT_VOLUME_ID');
v.imageData.getPointData().getScalars().delete();
delete v.scalarData;
```
swederik commented 1 year ago

That's a very interesting idea. So you're saying that we could just keep the individual slices separate and only build the volume for WebGL when it's being used on the GPU.

I think it's not exactly redundant, because with the volume loader we stream images directly into the volume buffer, so the original slice should no longer be an individual entry in memory (unless we split the volume up when we hit the cache limit). Unless something has changed, we shouldn't have two copies of the same pixel data, one for the image and one for the volume.

Not building a single volume on the CPU side would be interesting because it would make it easier to manage the cache. I guess the downside is that if you wanted to do any processing on the whole 3D volume on the CPU, you would need to build the volume buffer first. It would be a pain for things like segmentation tools.
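One way to keep that whole-volume CPU path available without holding the volume permanently would be to materialize the contiguous buffer lazily from the per-slice arrays, only when a whole-volume operation (e.g. a segmentation tool) actually needs it, then release it. A sketch with hypothetical names, not the actual cornerstone3D API:

```javascript
// Sketch (hypothetical): slices live as separate typed arrays; a contiguous
// volume buffer is only assembled on demand for whole-volume CPU work.
function assembleVolume(slices) {
  const sliceLength = slices[0].length;
  const volume = new Float32Array(sliceLength * slices.length);
  slices.forEach((slice, k) => volume.set(slice, k * sliceLength));
  return volume; // caller drops the reference when done, freeing the memory
}
```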


JamesAPetts commented 1 year ago

Haven't worked on CS3D in a while (hope to come back sometime!), but yeah, there is no reason to keep the whole volume on the CPU.

I think Erik and I actually had this discussion about a year ago. Basically 3D came first, and the stack viewport after. Hindsight is 20/20.

I think we should eventually do as you say, and have slices on the CPU and 3D volumes in VRAM.

Not sure of the timeline for this, but I agree on the long-term goal.

Ouwen commented 1 year ago

Thanks for the quick responses guys!

Ouwen commented 1 year ago

One idea for volume post processing:

https://js.tensorflow.org/api/latest/#tensor

Seems like tfjs allows us to bind a texture on the GPU and avoid CPU/GPU transfers.

sedghi commented 1 year ago

@Ouwen interesting idea. The only problem we see is the tools. Rendering should probably work, but segmentation tools, or any other tools that rely on the coordinates of each voxel, would need to be proxied somehow since each slice is stored separately. What was the original problem that led you to this idea?

Ouwen commented 1 year ago

@sedghi, the main problem was memory usage on mobile systems, which have tight limits on CPU memory. It seems like WebGL has around 1GB of memory available for textures, even on mobile.

sedghi commented 1 year ago

That is too low :( I would be happy to discuss it more, but we need to keep tools in mind when developing this.

Ouwen commented 1 year ago

Yeah definitely interested!

> that is too low :(

1GB of memory on iOS, and likely more on desktop of course (limited by the device GPU). CPU memory seems to be limited to around 300MB, so 1GB feels like a big upgrade, ha.

Initial thoughts are to use tfjs to bind the texture. Then any tools/functions that require imaging data can also be GPU-accelerated, plus there is a nice interface to pull data from GPU to CPU if needed.

For example, I think prescaling should likely not happen in a worker. If the data is on the GPU, the scale + add can be broadcast in parallel.

JamesAPetts commented 1 year ago

The reason we do scaling in the worker is for cases like PET: each frame needs to be scaled differently to make one cohesive volume. As such, we found it easier to scale the whole thing in one go; otherwise you'd have to put a large, complicated array plus logic in the shader, when you only really need to scale it once.
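For reference, the PET case described above amounts to applying a per-frame rescale slope and intercept before the frames can share one volume. A minimal CPU-side sketch with hypothetical names (not the actual worker code):

```javascript
// Sketch (hypothetical): each PET frame carries its own rescale slope and
// intercept, so frames must be scaled individually before they can form
// one cohesive volume. frames[k] is a typed array of raw stored values.
function scaleFrames(frames, slopes, intercepts) {
  return frames.map((frame, k) =>
    Float32Array.from(frame, (v) => v * slopes[k] + intercepts[k])
  );
}
```

Doing the same thing on the GPU would mean shipping the per-frame slope/intercept arrays into the shader (or a tfjs broadcast), which is the "large complicated array + logic" trade-off mentioned above.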

With that in mind, if you have a good solution for that, all ears.

sedghi commented 11 months ago

Seems like these two calls are taking too long in `create3DFromRaw`:

```js
const pixData = updateArrayDataType(dataType, dataArray, is3DArray);
const scaledData = scaleTextureToHighestPowerOfTwo(pixData);
```

and maybe we should not do them for the empty volume.
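The power-of-two rescale is a no-op when every dimension is already a power of two, so a cheap early-out check (and skipping the conversion entirely for an all-zero placeholder volume) could avoid the cost. A sketch with hypothetical helper names:

```javascript
// Sketch (hypothetical): cheap guards that could let create3DFromRaw skip
// the expensive data-type conversion and power-of-two rescale when they
// would be no-ops.
const isPowerOfTwo = (n) => n > 0 && (n & (n - 1)) === 0;

// True only if some texture dimension would actually need rescaling.
function needsPowerOfTwoScaling(dims) {
  return !dims.every(isPowerOfTwo);
}
```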

sedghi commented 9 months ago

This PR kind of addresses this