MarsRaw / mars-raw-utils

Utilities for working with publicly available raw MSL & Mars2020 images
MIT License

M20 NavCam calibration is too slow #32

Closed. kmgill closed this issue 1 year ago.

kmgill commented 1 year ago

Very slow. perf shows excessive time in strncmp, which may be related...

kmgill commented 1 year ago

The issue appears to be in resizing the flat and mask to match the image's scale factor:

    2023-04-16 12:44:40.608 src/m20/ecam.rs:138 Resize Start
    2023-04-16 12:44:50.002 src/m20/ecam.rs:143 Resize finish

kmgill commented 1 year ago

The resize operation, image::imageops::resize, is by itself sufficiently performant. The problem is that sciimg uses it in a rather inefficient manner: since each color band is stored in a separate buffer, each band is resized individually, one after the other. This is then done for the flat and again for the mask. I can either generate flats/masks for each scale factor and load those without resizing, or see if parallelizing the resize with rayon will help. Probably not, though, because when calibrating, images are already being farmed out to threads on each available CPU core.
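For illustration, a minimal sketch of the rayon idea, assuming the bands live in a slice of single-channel buffers and using the image crate's resize; the function name and band layout here are hypothetical stand-ins, not sciimg's actual internals:

    // Hypothetical sketch: resize each color band in parallel with rayon.
    // `bands` stands in for sciimg's per-band buffers.
    use image::imageops::{self, FilterType};
    use image::GrayImage;
    use rayon::prelude::*;

    fn resize_bands(bands: &[GrayImage], width: u32, height: u32) -> Vec<GrayImage> {
        bands
            .par_iter() // the serial version would use .iter()
            .map(|band| imageops::resize(band, width, height, FilterType::Lanczos3))
            .collect()
    }

Since calibration already farms images out to one thread per core, nesting more parallelism inside each task may not buy much, which is consistent with the small boost reported below.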

kmgill commented 1 year ago

Generating flats/masks for each scale factor introduces another problem: the built .deb package will become too large when using cargo deb. Do I then split the calibration files out into separate packages, or add a new subcommand to MRU that downloads/updates the calibration data on demand? Both increase complexity, but the second would potentially let me improve the calibration data without requiring a new software release...

sschmaus commented 1 year ago

I have had similar considerations for my Python calibration implementation. Should I precompute the downsized flats, or is downsizing for each image fast enough? Ten seconds seems like a long time for a simple downsize operation. Currently my whole flatfielding and decompanding routine for a 1x downsampled image takes just 1 second (single thread), and that includes resizing the flat for every image again.

How is your downsampling implemented? Is it a simple binning operation or some kind of bilinear interpolation?

kmgill commented 1 year ago

Yeah, each downsizing is pretty quick individually, even using Lanczos3, but because each color band is stored as a separate image, the downsampling has to be done three times per image. Super inefficient.

kmgill commented 1 year ago

Adding parallelization to sciimg::image::Image::resize() gave a small boost when processing a single file, but not enough that it's worth the increased complexity.

sschmaus commented 1 year ago

If it were Python, I'd take a look at implementing binning (i.e. 2x2 or 4x4 pixel averaging) as numpy array operations; those tend to be extremely fast.

Personally, I'm doing something completely different. Since I'm not restricted by a package size limit, I'm loading the PDS calibration file directly as a uint16 array in its raw Bayered format. Depending on the downsampling stage of the image, I apply the same kind of downsampling to the flat: at scale 0 I debayer with VNG (no Malvar implementation yet), and at scales > 0 I do superpixel debayering, like it's done for the images on the rover.

Here is my debayering (and downsampling) method; the image is loaded into the object as self.img:

    # Debayer the raw Bayered image, optionally with superpixel downsampling.
    # Assumes: import numpy as np; import cv2 as cv; RGGB Bayer pattern.
    def debayer(self, method='VNG', factor=1):
        if method == 'VNG':
            # Debayer with VNG, which produces fewer artifacts. Roll the array
            # down 2px because VNG debayering is bugged:
            # https://github.com/opencv/opencv/issues/5089
            debayered = cv.cvtColor(np.roll(self.img, 2, axis=0), cv.COLOR_BayerRGGB2BGR)
        elif method == 'Bilinear':
            debayered = cv.cvtColor(self.img, cv.COLOR_BayerRGGB2BGR)
        elif method == 'Downsample':
            if factor == 1:
                # New array at half the size of the original
                debayered = np.zeros([self.img.shape[0] // 2, self.img.shape[1] // 2, 3], self.img.dtype)
                debayered[:, :, 2] = self.img[0::2, 0::2]
                debayered[:, :, 1] = np.mean(np.array([self.img[1::2, 0::2], self.img[0::2, 1::2]]), axis=0)
                debayered[:, :, 0] = self.img[1::2, 1::2]
            elif factor == 2:
                # New array at a quarter of the size of the original
                debayered = np.zeros([self.img.shape[0] // 4, self.img.shape[1] // 4, 3], self.img.dtype)
                debayered[:, :, 2] = np.mean(np.array([self.img[0::4, 0::4], self.img[0::4, 2::4], self.img[2::4, 0::4], self.img[2::4, 2::4]]), axis=0)
                # Average all eight green sites of the 4x4 RGGB superpixel
                debayered[:, :, 1] = np.mean(np.array([self.img[1::4, 0::4], self.img[0::4, 1::4], self.img[0::4, 3::4], self.img[1::4, 2::4], self.img[2::4, 1::4], self.img[3::4, 0::4], self.img[2::4, 3::4], self.img[3::4, 2::4]]), axis=0)
                debayered[:, :, 0] = np.mean(np.array([self.img[1::4, 1::4], self.img[1::4, 3::4], self.img[3::4, 1::4], self.img[3::4, 3::4]]), axis=0)

        self.img = debayered

The downsampling debayering runs in a fraction of a second, and I imagine a binning operation implemented in the same manner would also be very quick.
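Since mars-raw-utils is Rust rather than numpy, a rough equivalent of the 2x2 binning idea on a single band might look like the sketch below; the function name and row-major buffer layout are assumptions, not sciimg code:

    // Hypothetical sketch: 2x2 binning (pixel averaging) of a single band
    // stored as a row-major u16 buffer.
    fn bin_2x2(data: &[u16], width: usize, height: usize) -> Vec<u16> {
        let (out_w, out_h) = (width / 2, height / 2);
        let mut out = Vec::with_capacity(out_w * out_h);
        for y in 0..out_h {
            for x in 0..out_w {
                let (sx, sy) = (x * 2, y * 2);
                // Average the four source pixels covered by this output pixel
                let sum = data[sy * width + sx] as u32
                    + data[sy * width + sx + 1] as u32
                    + data[(sy + 1) * width + sx] as u32
                    + data[(sy + 1) * width + sx + 1] as u32;
                out.push((sum / 4) as u16);
            }
        }
        out
    }

Each output pixel is just the mean of one 2x2 block, so the whole pass is a single cache-friendly sweep over the buffer.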

kmgill commented 1 year ago

OK, the last commit split the flats out into separate files per scale level. Calibration is now a lot faster. A further optimization would be to cache the flats in memory for use by all threads (load once), rather than having each calibration process load the file individually.

kmgill commented 1 year ago

Added memcache to cache in memory those calibration files that are loaded repeatedly by each thread. It is nominally thread-safe, but not yet super robust. It supports sciimg::image::Image, sciimg::imagebuffer::ImageBuffer, and String (as loaded from ASCII text files). For batch operations, the performance boost from the reduction in repetitive disk reads is immediate. The cache can be abused by loading too much into it, but, um, don't do that.
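For readers following along, the load-once pattern boils down to something like this minimal sketch; the Cache type and get_or_load name are illustrative stand-ins, not the actual memcache API:

    // Minimal sketch of a thread-safe load-once cache; illustrative only,
    // not the actual mars-raw-utils memcache implementation.
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};

    pub struct Cache<T> {
        entries: Mutex<HashMap<String, Arc<T>>>,
    }

    impl<T> Cache<T> {
        pub fn new() -> Self {
            Cache { entries: Mutex::new(HashMap::new()) }
        }

        // Return the cached value for `key`, loading and storing it on first use.
        // All threads then share the same Arc rather than re-reading from disk.
        pub fn get_or_load(&self, key: &str, load: impl FnOnce() -> T) -> Arc<T> {
            let mut entries = self.entries.lock().unwrap();
            entries
                .entry(key.to_string())
                .or_insert_with(|| Arc::new(load()))
                .clone()
        }
    }

Holding the lock across the load serializes first-time loads, a simple-but-safe tradeoff: no file is ever loaded twice, at the cost of blocking other lookups while a load is in progress.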