CARTAvis / carta

To CARTA users, this repo holds the CARTA release packages. Please use this repo to log bugs and feature requests. These will be triaged by the development team and prioritised as necessary in the development cycles.

z-profile very slow in CARTA 3 and 4 on k8s and ceph #187

Open kcasteels opened 1 year ago

kcasteels commented 1 year ago

Description of issue

When viewing large spectral cube files (~6 GB), the z-profile renders extremely slowly, sometimes taking several minutes for a single spectrum to draw. CARTA is running in a pod on a Kubernetes cluster backed by Ceph storage. When the same file is opened locally on a laptop (Mac), the z-profile renders almost instantaneously.

To Reproduce

Open a large spectral cube FITS file, a few GB or larger in size, then go to View -> Layout -> Existing Layouts -> cube view.

The image initially comes up solid blue, but moving the animator frame position to the middle and clicking play makes it appear. When playback is stopped, we can move the cursor over the image and see the x, y, and z profiles. The x and y profiles appear after about a second, but the z-profile can take many seconds to start rendering, and then it renders very slowly.

Expected behaviour

We expect the GUI to be fast and responsive and for the z-profile to render very quickly.

Platform info:

Additional context

We suspected the Ceph file system of being too slow, but its read/write speeds were only about 3x slower than an SSD on a laptop, while the z-profile render is hundreds of times slower.

We have tried providing CARTA with different amounts of resources, from 2 to 8 cores and 16-64 GB of RAM, with no change in performance.

We tried explicitly setting the OpenMP thread count to match the requested cores via the startup parameter --omp_threads=, but that didn't change the slow rendering at all.

The session is run with the --no-browser flag set.

veggiesaurus commented 1 year ago

How are you measuring the read/write speeds? Is the Ceph system hard-drive based? The z-profile involves a large number of very small reads (one per channel). This is the sort of thing that SSDs are very good at and distributed systems are very bad at.
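The cost of those per-channel reads can be sketched with a small, hypothetical example (the dimensions and file layout here are illustrative, not CARTA's actual I/O path). In a FITS-style cube stored channel-major, the samples of one pixel's spectrum are separated by a whole channel image's worth of bytes, so on disk each sample is a tiny read after a large seek:

```python
import numpy as np

# Hypothetical cube dimensions, far smaller than a real ~6 GB file.
nz, ny, nx = 100, 512, 512  # channels, height, width

# FITS stores a cube channel-major: axis order (z, y, x).
cube = np.zeros((nz, ny, nx), dtype=np.float32)

# A z-profile at pixel (y, x) takes one element from each channel plane.
profile = cube[:, 200, 300]

# The byte distance between consecutive profile samples is a full
# channel image: ny * nx * itemsize bytes. On slow storage that means
# nz separate small reads, each preceded by a large seek.
stride_bytes = cube.strides[0]
print(stride_bytes)  # 512 * 512 * 4 = 1048576 bytes between samples
```

With ~1 MiB between samples, sequential throughput is irrelevant; per-read latency dominates, which is exactly where spinning-disk distributed storage suffers.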

One solution is to use the fits2idia package (https://github.com/CARTAvis/fits2idia) to convert the files from FITS to HDF5 files with a particular schema. This schema includes a copy of the dataset in a rotated frame, making a z-profile read a single sequential read and thus very efficient on distributed systems. You can read more about the format here.
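The key idea behind the rotated copy can be sketched in a few lines of NumPy (a conceptual illustration only; the actual HDF5 schema, dataset names, and on-disk details differ):

```python
import numpy as np

# Hypothetical small cube in the usual FITS (z, y, x) channel-major order.
nz, ny, nx = 50, 64, 64
cube = np.arange(nz * ny * nx, dtype=np.float32).reshape(nz, ny, nx)

# The converted file additionally stores a swizzled copy in (x, y, z)
# order, so all channels of one pixel sit next to each other.
swizzled = np.ascontiguousarray(cube.transpose(2, 1, 0))

# Same spectrum, two layouts:
y, x = 10, 20
assert np.array_equal(cube[:, y, x], swizzled[x, y, :])

# In the swizzled copy the profile is one contiguous run of bytes:
# a single sequential read instead of nz scattered small reads.
print(swizzled[x, y, :].strides)  # (4,): consecutive float32 samples
```

The trade-off is the storage cost of keeping a second copy of the data, in exchange for turning the worst-case access pattern into the best case.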

kcasteels commented 1 year ago

We measured the read/write speeds using the dd command. These measurements are for sequential reads, so seek time isn't being taken into account. Ceph uses spinning disks.
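For reference, the difference between what a dd benchmark measures and a z-profile-like access pattern can be sketched on a scratch file (paths and sizes here are made up for illustration):

```shell
# Create a 64 MiB scratch file standing in for a cube.
dd if=/dev/zero of=/tmp/scratch.bin bs=1M count=64 2>/dev/null

# Sequential throughput: one big streaming read.
# This is what a typical dd benchmark reports.
dd if=/tmp/scratch.bin of=/dev/null bs=1M 2>/dev/null

# A z-profile-like pattern: many tiny reads at widely spaced offsets
# (one 4 KiB read per 1 MiB "channel"), which stresses seek latency
# rather than streaming bandwidth.
for i in $(seq 0 63); do
  dd if=/tmp/scratch.bin of=/dev/null bs=4K count=1 skip=$((i * 256)) 2>/dev/null
done
```

Timing the two loops on the Ceph mount versus a local SSD would show whether seek latency, rather than throughput, accounts for the gap.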

We will test using the HDF5 format instead of FITS; by the sound of it, that will solve the z-profile issue. We would still like to support FITS files, though.

Is there a way to load an entire image into RAM before processing?

Alternatively we could create a RAM disk and copy the image over to that from ceph, but this is more cumbersome than having CARTA handle it directly.
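On most Linux systems, /dev/shm is already a tmpfs (RAM-backed) mount, so the RAM-disk workaround can be sketched without mounting anything (the file names and paths below are placeholders; a dedicated tmpfs of a chosen size would be created with mount -t tmpfs instead):

```shell
# Stand-in for the real cube on Ceph-backed storage (placeholder path).
dd if=/dev/zero of=/tmp/cube.fits bs=1M count=16 2>/dev/null

# Copy into /dev/shm, which is tmpfs (RAM-backed) on most Linux systems.
# Ensure there is enough free RAM before copying a multi-GB file.
cp /tmp/cube.fits /dev/shm/cube.fits

# CARTA would then be pointed at the RAM-resident copy, e.g.
#   carta --no-browser /dev/shm/cube.fits
# Remove the copy afterwards to release the memory:
rm /dev/shm/cube.fits /tmp/cube.fits
```

The copy itself is a sequential read, which Ceph handles reasonably well; all subsequent small z-profile reads then hit RAM instead of the network.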

veggiesaurus commented 1 year ago

Is there a way to load an entire image into RAM before processing?

Not yet, but there is justification for adding this for small files (or as a system-wide configuration setting), for things like PV images as well.

Alternatively we could create a RAM disk and copy the image over to that from ceph, but this is more cumbersome than having CARTA handle it directly.

I think for now that's a better option, or use HDF5. We can try to scope this out as an R&D item in the meantime.