ProcessMiner / nlcor

An implementation of an efficient heuristic to compute the nonlinear correlation between numeric vectors. The heuristic works by adaptively identifying multiple local regions of linear correlation to estimate the overall nonlinear correlation. The nonlinear correlation estimate has various applications in data exploration and in variable selection for nonlinear models.
GNU General Public License v2.0

nonlinear correlation calculation #15

Open zhuyingqin opened 3 years ago

zhuyingqin commented 3 years ago

Netcor computes the correlation (cor) of each data segment, then adds up the absolute values and averages them. Is this understanding correct? The maximum value of cor.estimate returned by Netcor across different subdivisions is then taken. Why is that a nonlinear correlation? What is the principle behind your nonlinear correlation calculation?

cran2367 commented 3 years ago

@zhuyingqin The underlying principle is to segment the data, compute the linear correlation within each segment, and combine these segment-wise correlations into an estimate of the overall nonlinear correlation. Version 2.0, pushed recently, performs the segmentation using dynamic programming, which is more computationally efficient and theoretically more robust.

zhuyingqin commented 3 years ago

What is the principle behind the nonlinear correlation calculation?

zhuyingqin commented 3 years ago

How do you use piecewise linear correlations to estimate the overall nonlinear correlation?

cran2367 commented 3 years ago

We will be publishing the nlcor paper in a month describing the approach in detail.

zhuyingqin commented 3 years ago

Please notify me after the paper is published.

brshallo commented 3 years ago

For computing a "total adjusted" correlation, Fisher's transformation may be useful.
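One common way to apply Fisher's transformation here is to map each segment's correlation to z = atanh(r), average in z-space with inverse-variance weights n - 3, and map back with tanh. This is a minimal sketch under that assumption. It is not nlcor's or piececor's code. For a nonlinear estimate you would pass the absolute segment correlations, as in the aggregation described earlier in the thread. The function name is hypothetical.

```python
import math


def combine_correlations(rs, ns):
    """Hypothetical helper: pool segment-wise correlations via Fisher's z.

    rs: per-segment correlations, each strictly between -1 and 1
    ns: per-segment sample sizes, each greater than 3
    """
    # Fisher's z = atanh(r) is approximately normal with variance 1 / (n - 3).
    zs = [math.atanh(r) for r in rs]
    # Inverse-variance weights give larger segments more influence.
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # Back-transform the pooled z to the correlation scale.
    return math.tanh(z_bar)


# Two equally sized segments with |r| = 0.9 and |r| = 0.1.
print(combine_correlations([0.9, 0.1], [50, 50]))  # about 0.656, not 0.5
```

Averaging in z-space gives strong segment correlations more weight than a plain arithmetic mean of r, because atanh stretches values near 1.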

To combine the p-values, you may want to use Fisher's method, Stouffer's method, or a related approach; see https://en.wikipedia.org/wiki/Fisher%27s_method for background. Your current approach overly penalizes significance as more splits are taken. For example, three p-values of 0.05 should yield a "total adjusted p-value" below 0.05, but your current approach, 1 - prod(1 - p), does the opposite: it gives about 0.143. (Any adjustment penalty you add should be based on the number of segments or tests, k, rather than on the p-values themselves.)
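The contrast between these rules can be checked numerically. The sketch below implements Fisher's and Stouffer's methods with only the Python standard library and compares them with the 1 - prod(1 - p) rule. It assumes independent, one-sided p-values, which both methods require. The function names are hypothetical, and this is not code from nlcor or piececor.

```python
import math
from statistics import NormalDist


def fisher_combined_p(ps):
    # Fisher's method: X = -2 * sum(ln p_i) is chi-squared with 2k df under H0.
    k = len(ps)
    half = -sum(math.log(p) for p in ps)  # this is X / 2
    # The chi-squared survival function for even df (2k) has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


def stouffer_combined_p(ps):
    # Stouffer's method: Z = sum(Phi^-1(1 - p_i)) / sqrt(k) is N(0, 1) under H0.
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in ps) / math.sqrt(len(ps))
    return 1.0 - nd.cdf(z)


def one_minus_prod_p(ps):
    # The 1 - prod(1 - p) rule discussed above.
    return 1.0 - math.prod(1.0 - p for p in ps)


ps = [0.05, 0.05, 0.05]
print(fisher_combined_p(ps))    # about 0.0063
print(stouffer_combined_p(ps))  # about 0.0022
print(one_minus_prod_p(ps))     # about 0.1426
```

With a single p-value, both Fisher's and Stouffer's methods return it unchanged. Three consistent p-values of 0.05 strengthen the combined evidence, while 1 - prod(1 - p) weakens it.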

cran2367 commented 3 years ago

@brshallo thank you for the note! We will look into this.

brshallo commented 3 years ago

@cran2367 you are welcome. I stumbled onto your package while toying with brshallo/piececor that has a lot of overlap in thinking. The methods I mention above are what I used there.