CSchoel / nolds

Nonlinear measures for dynamical systems (based on one-dimensional time series)
MIT License

Why does the result of `nolds.hurst_rs` differ from other packages? #39

Closed: eromoe closed this 9 months ago

eromoe commented 11 months ago

Hi,

I checked three implementations, including nolds, and the results are quite different. My goal is to label mean-reverting and trending ranges, but I have no idea what to make of the output of nolds.hurst_rs (all values > 0.9).

testing code


import numpy as np

np.random.seed(42)
random_changes = 1. + np.random.randn(99999) / 1000.
series = np.cumprod(random_changes)  # create a random walk from random changes

# 1
from hurst import compute_Hc, random_walk
H, c, data = compute_Hc(series, kind='price', simplified=True)
print(H)

# 2
def genhurst(S,q=2):
    '''
    Estimate the generalized Hurst exponent H(q) from the scaling of the
    q-th order moments of the increments of S.
    Adapted from https://github.com/PTRRupprecht/GenHurst/blob/master/genhurst.py
    '''

    L=len(S)

    # if L < 100:
        # warnings.warn('Data series very short!')

    H = np.zeros((len(range(5,20)),1))
    k = 0

    for Tmax in range(5,20):

        x = np.arange(1,Tmax+1,1)
        mcord = np.zeros((Tmax,1))

        for tt in range(1,Tmax+1):
            dV = S[np.arange(tt,L,tt)] - S[np.arange(tt,L,tt)-tt] 
            VV = S[np.arange(tt,L+tt,tt)-tt]
            N = len(dV) + 1
            X = np.arange(1,N+1,dtype=np.float64)
            Y = VV
            mx = np.sum(X)/N
            SSxx = np.sum(X**2) - N*mx**2
            my = np.sum(Y)/N
            SSxy = np.sum( np.multiply(X,Y))  - N*mx*my
            cc1 = SSxy/SSxx
            cc2 = my - cc1*mx
            ddVd = dV - cc1
            VVVd = VV - np.multiply(cc1,np.arange(1,N+1,dtype=np.float64)) - cc2
            mcord[tt-1] = np.mean( np.abs(ddVd)**q )/np.mean( np.abs(VVVd)**q )

        mx = np.mean(np.log10(x))
        SSxx = np.sum( np.log10(x)**2) - Tmax*mx**2
        my = np.mean(np.log10(mcord))
        SSxy = np.sum( np.multiply(np.log10(x),np.transpose(np.log10(mcord)))) - Tmax*mx*my
        H[k] = SSxy/SSxx
        k = k + 1

    mH = np.mean(H)/q

    return mH

print(genhurst(series, 2))

# 3 
import nolds
print(nolds.hurst_rs(series))

[screenshot of the printed results from the three implementations]

I also tested with a rolling window of 200; the second one seems the most correct (it contains values below 0.5). A sketch of such a rolling-window comparison follows the wrapper functions below.

def hurst1(arr):
    return genhurst(arr)

def hurst2(arr):
    return compute_Hc(arr, kind='price', simplified=True)[0]

def hurst3(arr):
    return nolds.hurst_rs(arr, fit='poly')
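
For reference, a rolling-window comparison along these lines can be set up with pandas (sketch only; the exact code behind the screenshots below may differ):

import pandas as pd

# sketch only: apply the three estimators over a rolling window of 200
s = pd.Series(series[:5000])  # shorter slice to keep the example fast
rolling_h = pd.DataFrame({
    'genhurst': s.rolling(200).apply(hurst1, raw=True),
    'compute_Hc': s.rolling(200).apply(hurst2, raw=True),
    'nolds': s.rolling(200).apply(hurst3, raw=True),
})
print(rolling_h.describe())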

[screenshots of the rolling-window (200) results from the three implementations]

eromoe commented 11 months ago

Is there a simple way to evaluate the Hurst exponent, so that I can work out which package is best?

CSchoel commented 10 months ago

Hi @eromoe. :wave: Unfortunately, there is a lot of confusing terminology around the Hurst exponent. It also took me a while to remember and dig up all that information after reading your post, but here are the reasons why the two implementations you found are different from nolds.

genhurst

This is actually a different algorithm. It computes the generalized Hurst exponent according to an algorithm by Di Matteo et al. (2003). This algorithm is implemented as mfhurst_dm in the current (unpublished) version of nolds (which you can install directly from the repository with pip install git+https://github.com/CSchoel/nolds.git).

However, for several reasons, I don't recommend using it; use mfhurst_b instead. That function implements the mathematical foundation of mfhurst_dm, due to Barabási and Vicsek (1991). One of the reasons is that Di Matteo et al. apply a linear detrending step by default. For stock data this is sensible, but for other data it might not be desirable and is better left as a separate preprocessing step. In your example, this also explains why you get a value of 0.5 instead of the expected value of 1.0 for a random walk, which has a strong positive correlation between successive elements: the detrending essentially turns the random walk back into a series of independent random data points.
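
As an illustration (sketch only; the exact signatures and return formats of these functions in the repository version may differ, so check their docstrings):

import numpy as np
import nolds  # repository version, installed as described above

np.random.seed(42)
series = np.cumprod(1. + np.random.randn(99999) / 1000.)

# mfhurst_dm detrends by default, so for this random walk the result should be
# near 0.5; mfhurst_b applies no detrending
print(nolds.mfhurst_dm(series))
print(nolds.mfhurst_b(series))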

compute_Hc

This version does indeed seem to implement Hurst's original rescaled range approach. However, you have to be careful with the parameters here. Looking into the code, I found the parameter kind for the "kind" of data that you have and the helper function __get_simplified_RS(). The default for kind is random_walk, which again leads to the removal of a linear trend, effectively reverting the random walk back to random data points with neither a positive nor a negative correlation. Hence the 0.5 as output.

nolds

Nolds doesn't make any assumptions about the kind of data that you have. It just implements the rescaled range algorithm as described by H. E. Hurst, with the Anis-Lloyd-Peters correction factor.
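
A minimal sketch of a direct call on the random walk from your post (the corrected parameter below is assumed to toggle the Anis-Lloyd-Peters correction; confirm with help(nolds.hurst_rs)):

import numpy as np
import nolds

np.random.seed(42)
series = np.cumprod(1. + np.random.randn(99999) / 1000.)

# 'corrected' is assumed to switch the Anis-Lloyd-Peters correction on/off
print(nolds.hurst_rs(series, fit='poly', corrected=True))
print(nolds.hurst_rs(series, fit='poly', corrected=False))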

I compared it to several reference implementations and found good, and in some cases perfect, agreement between those implementations and my code.

Additionally, I reproduced an example published by Ian L. Kaplan for a series with a Hurst exponent of 0.72. You can find this in nolds in the unit test test_hurst_pracma in nolds.test_measures.

eromoe commented 10 months ago

Thanks for your detailed explanation. But doesn't 0.5 indicate that a series may be a random walk, while 0.5 to 1 means there is a trend? That's why I was confused by nolds.hurst_rs returning values near 1.

And for compute_Hc, it is the non-simplified RS that removes the linear trend: https://github.com/Mottl/hurst/blob/6ca8f5fb5a8dacfbec4c2df9485d267f3cef25a6/hurst/__init__.py#L62

CSchoel commented 9 months ago

You're welcome. :smile:

Expected value for Hurst exponent

For the Hurst exponent obtained with the rescaled range approach, 0.5 is the expected value for drawing independent samples from a random distribution. Hurst puts it like this in his Nature letter from 1957:

If, however, the quantities considered are entirely independent events, such, for example, as would arise from tossing a set of, say, twelve coins and recording the differences between the number of heads and number of tails at each throw, R is represented by the following equation:

R/sigma = 1.25 sqrt(N)

In that example, each set of twelve coin tosses is entirely independent of the other ones. In a random walk, however, you take the cumulative sum of random numbers, making the values in the random walk highly correlated. In general, the next value in the series will depend more on the previous value than on the random increment that is applied at that step. Hence, a Hurst exponent close to 1 is the expected result.
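
As a quick numerical sketch of this (individual runs will scatter around the expected values):

import numpy as np
import nolds

np.random.seed(0)
increments = np.random.randn(10000)  # independent samples -> H expected near 0.5
walk = np.cumsum(increments)         # cumulative sum / random walk -> H expected near 1

print(nolds.hurst_rs(increments, fit='poly'))
print(nolds.hurst_rs(walk, fit='poly'))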

compute_Hc

Both __get_RS and __get_simplified_RS call the helper function __to_inc, which does the detrending, if kind is "random_walk". You might get the pure Hurst exponent without any assumptions about the data type if you set kind to "change", but I'm not fully sure about that. There is a lot of preprocessing going on there.
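
An untested sketch of that suggestion (assuming compute_Hc accepts a series of changes when kind is "change"):

import numpy as np
from hurst import compute_Hc

np.random.seed(42)
series = np.cumprod(1. + np.random.randn(99999) / 1000.)

# pass the changes (not the accumulated series) so that no detrending is applied
H, c, data = compute_Hc(np.diff(series), kind='change', simplified=True)
print(H)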

eromoe commented 9 months ago

Oh, I understand. I thought the diff preprocessing was necessary for calculating the Hurst exponent; actually it only implies that the input data is accumulated. Thank you very much :)