Open MislavSag opened 2 years ago
Dear Mislav,
actually, the implementation and the use of TdistributionH and HTS is ongoing, in the sense that the constructors are very basic. I suppose you need a distribution for each day to build the HTS (that is, a series of histograms). Anyway, to build a TdistributionH the code is
# construct TdistributionH-class object from DT ??
# CODE HERE
# (x is assumed to be an existing distributionH object, e.g. x <- data2hist(DT$ret))
My_new_Tdistr <- new("TdistributionH",
                     # fix the starting and ending time points here
                     period = list(start = min(DT$DT), end = max(DT$DT)),
                     x = x@x, p = x@p, m = x@m, s = x@s)
Now, I will show you how to construct an HTS for each day
library(highfrequency)
library(data.table)
library(tidyverse)
library(HistDAWass) # provides data2hist(), TdistributionH, HTS, MatH, WH_kmeans()
# data
DT <- highfrequency::sampleOneMinuteData
DT[, ret := STOCK / shift(STOCK) - 1]
DT <- DT[, .(DT, ret)]
DT <- DT %>% na.omit() %>% mutate(day = format(DT, format = "%Y-%m-%d"))
# row indices of DT for each day
tmp <- DT %>% group_by(day) %>% group_rows()
# create an empty list
list_of_t <- list()
for (i in seq_along(tmp)) {
  # create a TdistributionH for day i
  tmpx <- data2hist(DT$ret[tmp[[i]]] %>% na.omit())
  mint <- min(DT$DT[tmp[[i]]])
  maxt <- max(DT$DT[tmp[[i]]])
  My_new_Tdistr <- new("TdistributionH",
                       tstamp = i, # take care: only numeric values are admitted here
                       period = list(start = mint, end = maxt),
                       x = tmpx@x, p = tmpx@p, m = tmpx@m, s = tmpx@s)
  list_of_t[[i]] <- My_new_Tdistr
}
new_HTS <- new("HTS", epocs = length(tmp),
               ListOfTimedElements = list_of_t)
plot(new_HTS) # see it
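Since tstamp admits only numeric values, one option is to convert each day's date (or a POSIXct instant) to a number with base R, so that the time stamp keeps its calendar meaning instead of being a loop index. This is only a sketch with made-up example values; it uses no HistDAWass calls, and whether the package expects a particular numeric scale is not stated in this thread.

```r
# Hypothetical example values; base R only.
day <- as.Date("2001-01-03")
ts_day <- as.numeric(day)   # days since 1970-01-01

t0 <- as.POSIXct("2001-01-03 09:30:00", tz = "UTC")
ts_sec <- as.numeric(t0)    # seconds since 1970-01-01 00:00:00 UTC

# Both conversions can be reversed when the time stamp is needed as a date again
back_day <- as.Date(ts_day, origin = "1970-01-01")
back_t0 <- as.POSIXct(ts_sec, origin = "1970-01-01", tz = "UTC")
```

Either ts_day or ts_sec could then be passed as tstamp in place of i.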
Anyway, cluster methods work only with MatH instances. This means that, if you need to cluster them via k-means (for example), you have to use code like the following:
# create an empty list
list_of_t <- list()
for (i in seq_along(tmp)) {
  # create a distributionH (no time stamp is needed for MatH)
  tmpx <- data2hist(DT$ret[tmp[[i]]] %>% na.omit())
  list_of_t[[i]] <- tmpx
}
new_mat <- MatH(x = list_of_t, nrows = length(tmp),
                ncols = 1,
                rownames = unique(DT$day),
                varnames = "returns")
plot(new_mat, type = "DENS") # to see the data
res <- WH_kmeans(new_mat, k = 3) # to perform k-means
@Airpino,
Thanks a lot for the sample code.
You are right, my plan was to aggregate intraday data to daily data by constructing histograms.
My second plan is to use daily or hourly data for multiple stocks (say, S&P 500 stocks) and build a histogram from the cross-section of returns.
Actually, my first motivation for inspecting your package was this new paper: https://arxiv.org/pdf/2110.11848.pdf
I was trying to find a package in R/Python that implements something similar.
I want to play around with time series clustering methods to see whether it is possible to predict market regimes.
I understand how to construct the objects now.
What is the real difference between an HTS object and a MatH? As I understand it, the only difference is the time stamp in HTS.
I will use MatH in the end, since most functions require this object.
Do you have any recommendations for applying the models from the package to predicting market regimes? Is it a reasonable approach, in your opinion?
I will open a new issue if I have additional questions. Thanks.
Dear Mislav, as I told you, HTS is just a very basic prototype for which few analysis methods are implemented. This is because I have not yet worked on the analysis of HTS. The main difference between MatH and HTS is that MatH can contain several columns (it is a generalization of a classical data table in which each cell holds a 1-d histogram), whereas an HTS is a single time series of histograms equipped with time stamps. I am not an expert in financial data analysis, but your approach seems reasonable. There are very few methods implemented in the package for HTS, but there is room for extending classical forecasting techniques to histogram time series. If you have any proposal (autoregressive techniques, for example, can be implemented using the two.component.regression model for histogram data, and moving averages can be implemented too, ...), you can write to me and I will try to give you some hints.
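To illustrate the moving-average idea, here is a minimal base-R sketch that uses no HistDAWass functions. It rests on one fact about 1-d distributions: under the L2 Wasserstein metric, the mean of several distributions is the one whose quantile function is the average of their quantile functions. The simulated data, the quantile grid and the window width w are all assumptions made for the example.

```r
set.seed(1)
# Hypothetical data: 6 "days" of 100 simulated returns each
days <- lapply(1:6, function(d) rnorm(100, mean = 0, sd = 0.01 * d))

# Represent each day's distribution by its quantiles on a common grid
probs <- seq(0.01, 0.99, by = 0.01)
Q <- t(sapply(days, quantile, probs = probs, names = FALSE))  # 6 x 99 matrix

# Moving average of width w: average the quantile functions of the last w days
w <- 3
ma <- t(sapply(w:nrow(Q), function(i) {
  colMeans(Q[(i - w + 1):i, , drop = FALSE])
}))
dim(ma)  # one smoothed quantile function per window
```

Each row of ma is itself a valid (non-decreasing) quantile function, so it can be read as a smoothed distribution for the corresponding window.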
@Airpino,
I am playing around with the package. I have tried 3 different approaches so far:
In prediction, we are mostly interested in out-of-sample predictions. Is it possible to predict clusters (where k can be 1 and 2) for n periods with your package?
I am mostly interested in rolling predictions, because this is how it is usually done in real investing.
Should I use a rolling window or an expanding window of histograms (distributions)?
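For reference, the two windowing schemes differ only in which past periods enter each fit. The sketch below builds the index sets in base R; the helper names rolling_windows and expanding_windows are hypothetical and not part of any package, and both assume n is at least the (minimum) width.

```r
# Index sets for rolling and expanding windows over n periods
rolling_windows <- function(n, width) {
  # fixed-length window that slides forward one period at a time
  lapply(width:n, function(end) (end - width + 1):end)
}
expanding_windows <- function(n, min_width) {
  # window always starts at period 1 and grows by one period each step
  lapply(min_width:n, function(end) 1:end)
}

roll <- rolling_windows(n = 6, width = 3)
expa <- expanding_windows(n = 6, min_width = 3)
roll[[2]]  # 2 3 4
expa[[2]]  # 1 2 3 4
```

Each index set would select the days (rows of the MatH, or elements of the list of distributions) used for one clustering run, with the period right after the window serving as the out-of-sample observation. A rolling window forgets old regimes, while an expanding window keeps them; which behaviour is preferable depends on the data and is not settled in this thread.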
@Airpino, I have a follow-up question to this insightful discussion. I am also working on the paper @MislavSag mentioned above, and the in-sample results from the HistDAWass package look very convincing. I would like to dive deeper and predict out-of-sample data. Is there a prediction function for WH_kmeans? Many thanks.
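Whether the package offers such a function is not answered in this thread. One generic way to label out-of-sample data after any k-means-type clustering is to assign each new distribution to the cluster whose prototype is closest. The base-R sketch below does this with an approximate squared L2 Wasserstein distance computed on a quantile grid. It does not call HistDAWass; the helpers qrep, wass2 and assign_cluster are hypothetical, and the prototypes are simulated stand-ins for the centers of a fitted clustering.

```r
# Quantile representation of a sample on a common grid (hypothetical helper)
probs <- seq(0.005, 0.995, by = 0.01)
qrep <- function(x) quantile(x, probs = probs, names = FALSE)

# Approximate squared L2 Wasserstein distance between two 1-d distributions:
# mean squared difference of their quantile functions on the grid
wass2 <- function(qa, qb) mean((qa - qb)^2)

# Index of the prototype nearest to a new distribution
assign_cluster <- function(q_new, protos) {
  which.min(vapply(protos, wass2, numeric(1), qb = q_new))
}

set.seed(42)
# Hypothetical prototypes: a calm and a volatile regime
protos <- list(calm = qrep(rnorm(5000, 0, 0.005)),
               volatile = qrep(rnorm(5000, 0, 0.03)))
# Hypothetical new day of one-minute returns
new_day <- qrep(rnorm(390, 0, 0.028))
assign_cluster(new_day, protos)
```

In a rolling setup, the prototypes would be re-estimated on each window and only the following day would be assigned this way.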
I can't figure out from the CRAN package docs how to prepare data for the (cluster) analysis.
I have time series data with intraday frequency. I would like to identify 3 clusters.
If I understand it right, I need to construct TdistributionH objects from my time series vector. But I am not sure how to transform my POSIXct object into a time stamp, or how to add the time stamp to a distributionH object. Here is my sample data: