Used a numerically stable implementation to compute the softmax of a NumPy array along a given axis, which reduces the risk of overflow and underflow in the exponentials when the inputs have large magnitude.
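A minimal sketch of one common way to get this stability, assuming the usual trick of subtracting the per-axis maximum before exponentiating. The function name stable_softmax and its axis parameter are illustrative and not taken from the original code.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the per-axis maximum (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Subtracting the maximum leaves the result unchanged mathematically,
    # but keeps every exponent <= 0 so np.exp cannot overflow.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    # keepdims=True lets the normaliser broadcast back over `axis`.
    return exp / np.sum(exp, axis=axis, keepdims=True)
```

With this shift, an input such as np.array([1000.0, 1000.0]) yields two equal probabilities, whereas exponentiating the raw values would overflow.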
Used NumPy's unique() and cumsum() functions to build the lab_to_ind lookup table without a Python loop over the labels, which is more efficient, especially for large arrays.
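One way unique() and cumsum() can combine into such a table is sketched below. It assumes the labels are non-negative integers and that lab_to_ind maps each label value to its rank among the unique labels. The helper name build_lab_to_ind is hypothetical, and the original code may build the table differently.

```python
import numpy as np

def build_lab_to_ind(labels):
    """Map each label value to its index among the sorted unique labels.

    Assumes non-negative integer labels and a non-empty input (hypothetical helper).
    """
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)  # sorted unique label values
    # Mark which label values occur: 1 at every present label, 0 elsewhere.
    present = np.zeros(unique_labels.max() + 1, dtype=np.int64)
    present[unique_labels] = 1
    # The running count of present labels, minus one, is the rank of each label.
    # Entries for absent label values are not meaningful and should not be read.
    lab_to_ind = np.cumsum(present) - 1
    return unique_labels, lab_to_ind
```

Indexing the table with the label array itself, as in lab_to_ind[labels], then converts every label to its compact index in a single vectorized step.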
Used NumPy's in1d() function to build a boolean mask of the unique labels, then used broadcasting to assign the corresponding RGB values to the output array, again avoiding a per-label Python loop, which matters most for large arrays.
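The masking and assignment step might look roughly like the sketch below. It swaps in np.isin, the shape-preserving counterpart of in1d(), so the mask can index a 2-D label map directly. The function colorize and its parameters (label_map, labels_to_color, colors, background) are assumptions made for illustration, not names from the original code.

```python
import numpy as np

def colorize(label_map, labels_to_color, colors, background=(0, 0, 0)):
    """Paint selected labels of an (H, W) label map with RGB colours.

    Hypothetical helper: `colors[i]` is the RGB triple for `labels_to_color[i]`.
    """
    label_map = np.asarray(label_map)
    labels_to_color = np.asarray(labels_to_color)
    colors = np.asarray(colors, dtype=np.uint8)  # shape (K, 3)

    out = np.empty(label_map.shape + (3,), dtype=np.uint8)
    # Broadcasting: the (3,) background colour fills the whole (H, W, 3) array.
    out[...] = np.asarray(background, dtype=np.uint8)

    # Boolean mask of the pixels whose label is one of the selected labels.
    mask = np.isin(label_map, labels_to_color)

    # Locate each masked label inside labels_to_color without a Python loop.
    order = np.argsort(labels_to_color)
    pos = np.searchsorted(labels_to_color[order], label_map[mask])
    out[mask] = colors[order[pos]]  # (N, 3) colours assigned to N masked pixels
    return out
```

Pixels whose label is not listed keep the background colour, so the mask alone decides which parts of the output are overwritten.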