SwaragThaikkandi / SMdRQA

For doing multidimensional recurrent quantification analysis(MdRQA) and sliding window version of it
https://swaragthaikkandi.github.io/SMdRQA/
GNU General Public License v3.0
1 stars 0 forks source link

Given Rossler Attractor example is misleading #208

Closed henriquepaes1 closed 3 weeks ago

henriquepaes1 commented 4 weeks ago

I tried to replicate the Rossler Attractor example. I wasn't understanding why my example generated 10 signal files and only one corresponding line in the output csv. I later found out this snippet in the windowed_RP function:

# To have file name as an identifier of the variable. Note that the
# file name should not contain "." other than the one before "npy"
a = file.strip().split('.')[0]

If you take a look at the signals generated for the problem, you find the following:

image

Due to the usage of float variables in the name generation, as stated in the snippet below:

 np.save(base_directory + '\\Rossler\\signals\\('+str(snr)+','+str(a)+','+str(u0)+','+str(v0)+','+str(w0)+','+str(rp_sizes[k2])+').npy',u)  

You can clearly infer that all the data is being saved in a single dict key in the windowed_RP execution, and the resulting csv has only one group, therefore, only one line, as the image below shows (notice the presence of a single group):

image

I'm trying to understand better the SMdRQA technique and its application. Can someone explain me what is the expected csv output for the Rossler Attractor problem when the files are named correctly?

welcome[bot] commented 4 weeks ago

Thanks for opening your first issue here! Be sure to follow the issue template!

SwaragThaikkandi commented 4 weeks ago

@henriquepaes1: This issue is currently awaiting triage.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details I am a bot created to help [SwaragThaikkandi](https://github.com/SwaragThaikkandi) manage community feedback and contributions. You can check out my [manifest file](https://github.com/SwaragThaikkandi/SMdRQA/blob/main/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.
SwaragThaikkandi commented 4 weeks ago

@henriquepaes1: There are no 'kind' label on this PR. You need a 'kind' label to generate the release note automatically.

Details I am a bot created to help [SwaragThaikkandi](https://github.com/SwaragThaikkandi) manage community feedback and contributions. You can check out my [manifest file](https://github.com/SwaragThaikkandi/SMdRQA/blob/main/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.
SwaragThaikkandi commented 4 weeks ago

@henriquepaes1: There are no area labels on this issue. Adding an appropriate label will greatly expedite the process for us. You can add as many area as you see fit. If you are unsure what to do you can ignore this!

Details I am a bot created to help [SwaragThaikkandi](https://github.com/SwaragThaikkandi) manage community feedback and contributions. You can check out my [manifest file](https://github.com/SwaragThaikkandi/SMdRQA/blob/main/.github/governance.yml) to understand my behavior and what I can do. If you want to use this for your project, you can check out the [BirthdayResearch/oss-governance-bot](https://github.com/BirthdayResearch/oss-governance-bot) repository.
SwaragThaikkandi commented 3 weeks ago

Hi @henriquepaes1 , I am extremely sorry for the confusion. Are you referring to the example in the documentation? I would try to explain it here. So, first let's see the functions used to simulate the Rossler attractor.

def rossler(t, X, a, b, c):
    """The Rössler equations."""
    x, y, z = X
    xp = -y - z
    yp = x + a*y
    zp = b + z*(x - c)
    return xp, yp, zp

def rossler_time_series(tmax,n, Xi, a, b, c):
    x, y, z = Xi
    X=[x]
    Y=[y]
    Z=[z]
    dt=0.0001
    for i in range(1, tmax):
      #print('Xi:', Xi)
      x, y, z = Xi
      xp, yp, zp=rossler(t, Xi, a, b, c)
      x_next=x+dt*xp
      y_next=y+dt*yp
      z_next=z+dt*zp
      X.append(x_next)
      Y.append(y_next)
      Z.append(z_next)
      Xi=(x+dt*xp,y+dt*yp,z+dt*zp)

    X=np.array(X)
    Y=np.array(Y)
    Z=np.array(Z)

    step=int(tmax/n)
    indices = np.arange(0,tmax, step)
    #print('IS NAN X:',np.sum(np.isnan(X[indices])))
    #print('IS NAN Y:',np.sum(np.isnan(Y[indices])))
    #print('IS NAN Z:',np.sum(np.isnan(Z[indices])))

    return X[indices], Y[indices], Z[indices]

This one gives the time series using the differential equation. As long as the delta t value is sufficiently small this function can approximate the system. Now, Rossler attractor has three parameters, a, b, and c. If we fix b to 0.5 and c to 5.7, above a = 2 the system will be chaotic and below a = 2 it will be periodic. So, we will set a set of values for that. And in the example, we are also artificially varying SNR by adding a Gaussian noise.

b = 0.2
c = 5.7
a_array = [0.1,0.15,0.2,0.25,0.3]
SNR = [0.125,0.25,0.5,0.75,1.0,1.25,1.5,1.75,2.0]

Now, we loop over different values of SNR, a and we repeat the process for each combination of SNR and a values for n times, with varying time series length, and initial points. So, the values of SNR and a can repeat. And for keeping everything same for combinations of SNR and a, we set the same random seed.

for snr in tqdm(SNR):                                                                             # Looping over SNR values
  for j in tqdm(range(len(a_array)):                                                              # Looping over values of parameter a
    a=a_array[j]
    random.seed(1)
    np.random.seed(seed=301)
    rp_sizes = random.sample(range(150, 251), 10)                                                 # selecting a length for time series
    for k in tqdm(range(5)):                                                                      # Repeats

          u0, v0, w0 = 0+(1*np.random.randn()), 0+(1*np.random.randn()), 10+(1*np.random.randn()) # Setting initial conditions

          for k2 in tqdm(range(len(rp_sizes))):                                                   # Looping over different RP sizes(time series length)
            tmax, n = int(1000000*(rp_sizes[k2]/250)), rp_sizes[k2]
            print('started model')
            Xi=(u0, v0, w0)
            t = np.linspace(0, tmax, n)
            x, y, z=rossler_time_series(tmax,n, Xi, a, b, c)

            x=add_noise(x, snr)                                                                  # Adding noise
            y=add_noise(y, snr)
            z=add_noise(z, snr)
            u[:,0]=x                                                                             # Defining the output matrix
            u[:,1]=y
            u[:,2]=z
            np.save('/user/swarag/Rossler/signals/('+str(snr)+','+str(a)+','+str(u0)+','+str(v0)+','+str(w0)+','+str(rp_sizes[k2])+')~.npy',u)  # Save the data to a numpy file

Then we extract features from these files

input_path = '/user/swarag/Rossler/signals'                                        # directory to which the signals are saved
RP_dir = '/user/swarag/Rossler/RP'                                                 # directory to which we want to save the RPs
RP_computer(input_path, RP_dir) 
Dict_RPs=windowed_RP(68, 'RP')                                                      # Specifying window size and folder in which RPs are saved
First_middle_last_sliding_windows_all_vars(Dict_RPs,'Rossler_data.csv')

And here, if you are talking about this csv file, this will have 4 rows for the same RP, because, the function estimates RQA features from whole RP, mean, median and mode of the sliding windows.

But, for any purposes, we need to code variables that denotes the behaviour of the system. By default all of these will be stored in a column named "group"

data = pd.read_csv('Rossler_data.csv')
FILE = np.array(data['group'])                                                      # In the output data, the field named 'group' will have file name which contains details
SNR =[]
A =[]
U0 =[]
V0 =[]
W0 =[]
SIZE =[]
for FI in FILE:
  info = ast.literal_eval(FI)
  SNR.append(info[0])
  A.append(info[1])
  U0.append(info[2])
  V0.append(info[3])
  W0.append(info[4])
  SIZE.append(info[5])

data['snr'] = SNR
data['a'] = A
data['u0'] = U0
data['v0'] = V0
data['w0'] = W0
data['length'] = SIZE
A = np.array(A)

SYNCH = 1*(A>0.2)                                                                   # Defining synchrony condition, parameter value belonging to the chaotic region
data['synch'] = SYNCH

So, it is possible to have repeating columns. Kindly reply if this is not answering your query. Because SNR, a and initiation points (u0, v0, w0) can repeat in this code, and this values can have n number of RP sizes. Further, when you extract, it will be one whole RP estimate + four windowed estimate. Depending on what feature you are interested in and which SNR, you will have to subset the data. The repeating nature is there to have similar set of condition for combination of some variable. Hope this answer would be satisfactory. Thank you so much!

henriquepaes1 commented 3 weeks ago

I appreciate your prompt reply. I'm still trying to understand the script output. For computation speed purposes, I fixed a and SNR values and generated the following signals directory:

image

Then, I called the functions to apply RQA analysis:

maker.RP_computer(input_path, RP_dir)                                                    
Dict_RPs=sd.windowed_RP(68, RP_dir, base_directory + "\\Rossler\\pickle")                                                  
metrics = ["recc_rate", "percent_det", "avg_diag", "max_diag", "percent_lam", "avg_vert", "vert_ent", "diag_ent", "vert_max"]
sd.First_middle_last_sliding_windows_all_vars(Dict_RPs, base_directory + "\\Rossler\\Rossler_data.csv", ["median"], metrics)

And the resulting csv was:

image

I'd like to understand the meaning of each line better. Is it the separated RQA for each one of the groups?

I don't know if I misinterpreted something with the technique. I was expecting only one resulting line as if we applied RQA considering all the signals in the directory as a single input and generated a combined RQA for all those signals.

SwaragThaikkandi commented 3 weeks ago

Yes, ignore the first column, which is label. "window" is basically the estimate you are looking at (median = median of sliding window estimates, mean = mean of sliding window estimates, mode = model of sliding window estimates, whole = from whole RP), "group" is the name of the RP file ( which is a numpy file ) and rest are RQA variables

henriquepaes1 commented 3 weeks ago

So it applies the sliding window individually for each numpy file in the specified directory?

SwaragThaikkandi commented 3 weeks ago

Yes, it applies the sliding window of specified length to each of the RP files saved

SwaragThaikkandi commented 3 weeks ago

Also, currently the step size between sliding windows is 1, will update that to take a custom step size

henriquepaes1 commented 3 weeks ago

I understand. Is there a way to generate a single RP for multiple signals? For example, in this Rossler Attractor example, would it be possible to generate a single RP for all the signals in the directory?

SwaragThaikkandi commented 3 weeks ago

RPs are created for a time series. It shows the recurring instances in the time series by one and the rest by 0. If your signals are time-locked, it is possible to run a multidimensional RQA. Because each time series represents an individual system and its RQA variable, it depends on your time series. But, if you ask about combined RP for multiple time series from different processes, that won't be possible as the RP won't capture the recurring characteristics adequately. And even with multidimensional RQA, computation might be tricky when the number of dimensions is above some values. However, you can still get some trends from the RQA variables estimated. If discussing an interactive task or something similar, consider checking about epistemic network analysis. Still, you will also be looking at a space with some features. Hope this helps

henriquepaes1 commented 3 weeks ago

Fine! Thank you so much for the clarification

SwaragThaikkandi commented 3 weeks ago

I hope you're happy with this clarification. Closing this issue