AbdullahO / tspdb

tspdb: Time Series Predict DB
Apache License 2.0
185 stars 58 forks source link

Is there a method within the tspDB, to discover the correlation between columns of a matrix? #29

Open MehmetKaplan opened 1 year ago

MehmetKaplan commented 1 year ago

I am trying to understand the atomic behavior of the library by playing with it within PostgreSQL.

I have the following data in my test_data table:

 t | v1 | v2 
---+----+----
 1 |  1 |  0
 2 |  2 |  2
 3 |  3 |  4
 4 |  3 |  6
 5 |  3 |  6
 6 |  4 |  6
 7 |  5 |  8
(7 rows)

Please note the correlation between columns are $v2(t)=2.v1(t-1)$.

For example let's take $t=6$.

$v2(6) = 2 . v1(5) = 2 . 3 = 6$

So I am willing the library to discover (or approximate) this correlation within the matrix.

To experiment I run the following SQL commands:

drop table if exists test_data;
create table test_data
(
    t int,
    v1 numeric,
    v2 numeric
);

insert into test_data (t, v1, v2) values
    (1, 1, 0),
    (2, 2, 2),
    (3, 3, 4),
    (4, 3, 6),
    (5, 3, 6),
    (6, 4, 6),
    (7, 5, 8);

select delete_pindex('test_data_index1');
select delete_pindex('test_data_index2');
select delete_pindex('test_data_index3');
select create_pindex('test_data', 't', '{"v1"}', 'test_data_index1');
select create_pindex('test_data', 't', '{"v2"}', 'test_data_index2');
select create_pindex('test_data', 't', '{"v1", "v2"}', 'test_data_index3');

Unfortunately, predictions seem just the average of the columns. When I change the values of either column, the prediction of the other column is never changed. (Possibly I am not able to supply the correct parameters to the predict function.)

The outputs are:

select 'Predicting:v1, Using:test_data_index1' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index1') as a;

            prediction_type            | prediction | lb | ub 
---------------------------------------+------------+----+----
 Predicting:v1, Using:test_data_index1 |          3 |  3 |  3
(1 row)

select 'Predicting:v1, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v1', 8, 'test_data_index3') as a;

            prediction_type            | prediction | lb | ub 
---------------------------------------+------------+----+----
 Predicting:v1, Using:test_data_index3 |          3 |  3 |  3
(1 row)

select 'Predicting:v2, Using:test_data_index2' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index2') as a;

            prediction_type            |          prediction           |              lb               |              ub               
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
 Predicting:v2, Using:test_data_index2 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)

select 'Predicting:v2, Using:test_data_index3' as Prediction_Type, a.* from predict('test_data', 'v2', 8, 'test_data_index3') as a;

            prediction_type            |          prediction           |              lb               |              ub               
---------------------------------------+-------------------------------+-------------------------------+-------------------------------
 Predicting:v2, Using:test_data_index3 | 4.571428571428571428571428571 | 4.571428571428571428571428571 | 4.571428571428571428571428571
(1 row)

Hence, long question in short, is there a method within the tspDB, to discover the correlation between columns of a matrix?