ClimateGlobalChange / tempestextremes

Extreme weather detection and characterization

Speeding up min/max search in noisy data #27

Closed zarzycki closed 3 years ago

zarzycki commented 3 years ago

For noisy data with small differences in the "base" field, the nearest-neighbor local lookup finds many minima/maxima. One way to speed this up without substantially modifying the code is to apply an epsilon value that breaks out of the local search if the min/max is "within the noise." The epsilon could be passed in via the command line (defaulting to 0 to preserve current functionality).

Example modification below. I hardcoded a sample eps just to verify the speed increase. Here, the main contour threshold is 200., so eps = 1.0 is 0.5% of that (no change to tracked features), but it speeds up the code from 1377 s to 11 s on Cheyenne.

template <typename real>
void FindAllLocalMaxima(
        const SimpleGrid & grid,
        const DataArray1D<real> & data,
        std::set<int> & setMaxima
) {
        int sFaces = grid.m_vecConnectivity.size();

        // Hardcoded sample epsilon for testing; 0 gives the current behavior
        real eps = 1.0;

        for (int f = 0; f < sFaces; f++) {

                bool fMaximum = true;

                real dValue = data[f];
                int sNeighbors = grid.m_vecConnectivity[f].size();
                for (int n = 0; n < sNeighbors; n++) {
                        // Reject f unless it exceeds every neighbor by at least eps
                        if (data[grid.m_vecConnectivity[f][n]] > dValue - eps ) {
                                fMaximum = false;
                                break;
                        }
                }

                if (fMaximum) {
                        setMaxima.insert(f);
                }
        }
}
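
For reference, a minimal self-contained sketch of the same idea with eps exposed as a parameter that defaults to 0. It does not use the TempestExtremes types: `Connectivity` and `FindLocalMaximaWithEps` are hypothetical names standing in for `SimpleGrid::m_vecConnectivity` and `FindAllLocalMaxima`, and how the value would be wired to a command-line option is left out.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the grid connectivity: neighbors[f] lists the
// indices of the faces adjacent to face f.
using Connectivity = std::vector<std::vector<int>>;

// Collect every face whose value exceeds all of its neighbors by at least eps.
// With eps = 0 this reduces to the plain test (no neighbor strictly greater).
template <typename Real>
std::set<int> FindLocalMaximaWithEps(
        const Connectivity & neighbors,
        const std::vector<Real> & data,
        Real eps = Real(0)
) {
        std::set<int> maxima;
        for (std::size_t f = 0; f < neighbors.size(); f++) {
                bool isMaximum = true;
                for (int n : neighbors[f]) {
                        // A neighbor within eps of data[f] disqualifies f
                        if (data[n] > data[f] - eps) {
                                isMaximum = false;
                                break;
                        }
                }
                if (isMaximum) {
                        maxima.insert(static_cast<int>(f));
                }
        }
        return maxima;
}
```

On a chain of five faces with values {0.0, 0.3, 0.1, 200.0, 0.2}, the default eps = 0 reports both the noise bump at index 1 and the real peak at index 3, while eps = 1.0 keeps only index 3.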
zarzycki commented 3 years ago

Confirming that commit bfd118fe657c2e70f100009db7069807ed4b7001 addresses this problem.

Using --searchbythreshold >0 excludes the base field from being considered as extrema. This speeds up the algorithm by ~100x, because it no longer has to check each of the "0" maxima against the closed contour criteria and then exclude it.
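
To illustrate why a threshold helps, here is a rough sketch of filtering candidates before the neighbor comparison. This is not the code from the commit above; `Connectivity` and `FindLocalMaximaAboveThreshold` are hypothetical names, and the sketch only assumes that a ">threshold" test is applied to each candidate value. In a flat field of zeros no neighbor is strictly greater, so every point counts as a maximum unless it is filtered out first.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the grid connectivity: neighbors[f] lists the
// indices of the faces adjacent to face f.
using Connectivity = std::vector<std::vector<int>>;

// Collect local maxima, considering only faces whose value is > threshold.
template <typename Real>
std::set<int> FindLocalMaximaAboveThreshold(
        const Connectivity & neighbors,
        const std::vector<Real> & data,
        Real threshold
) {
        std::set<int> maxima;
        for (std::size_t f = 0; f < neighbors.size(); f++) {
                // Skip candidates that fail the threshold before doing any
                // further work on them
                if (!(data[f] > threshold)) {
                        continue;
                }
                bool isMaximum = true;
                for (int n : neighbors[f]) {
                        if (data[n] > data[f]) {
                                isMaximum = false;
                                break;
                        }
                }
                if (isMaximum) {
                        maxima.insert(static_cast<int>(f));
                }
        }
        return maxima;
}
```

On a chain of six faces with values {0, 0, 0, 5, 0, 0}, a threshold of -1 (effectively no filter) returns indices 0, 1, 3 and 5, so three flat "0" points would go on to the more expensive checks. A threshold of 0 returns only index 3.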