Extension of CreateNodes()

earowang / hts

Hierarchical and Grouped Time Series

http://pkg.earo.me/hts

Extension of CreateNodes() #34

panoptikum closed this 6 years ago

panoptikum commented 7 years ago

This modification lifts the restriction that all node names on the same level must have the same number of characters. The new argument sep derives the nodes of each level by splitting the bottom time series names at a separator.

Special cases still need consideration, e.g. when a time series name does not contain the separator.

Open question: can hts and the corresponding forecast function handle an HTS whose branches have different depths?
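The special cases above could be caught before any nodes are built. The following is only a sketch in base R: check_bottom_names is a hypothetical helper, not part of hts, and it assumes (conservatively, since the question is still open) that branches of different depth are rejected.

```r
# Hypothetical helper, not part of hts: validate bottom-level names
# before they are split into levels.
check_bottom_names <- function(bnames, sep = "_") {
  # Special case 1: a name that does not contain the separator at all.
  has_sep <- grepl(sep, bnames, fixed = TRUE)
  if (!all(has_sep)) {
    stop("names without separator: ",
         paste(bnames[!has_sep], collapse = ", "))
  }
  # Special case 2: names that split into a different number of parts,
  # i.e. branches of different depth (assumed unsupported here).
  depth <- lengths(strsplit(bnames, sep, fixed = TRUE))
  if (length(unique(depth)) > 1L) {
    stop("branches have different depths: ",
         paste(sort(unique(depth)), collapse = ", "))
  }
  # Number of levels below "Total".
  depth[1]
}
```

Calling check_bottom_names(c("Gas_ABC_OWN", "Gas_XYZ_OWN")) returns the number of levels, while a name such as "Gas" or a mix of two-part and three-part names raises an error.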

panoptikum commented 7 years ago

Why do you use rep(1, 4) instead of c(1, 1) for the argument characters? As I understand it, characters specifies the number of characters used for the node names on each level. Using characters as in your response also produces more than 3 levels, if I'm not mistaken.

If I follow my logic I get:

> CreateNodes(bnames2, characters = c(1, 1))
$nodes
$nodes[[1]]
[1] 2

$nodes[[2]]
A B 
3 2 

$labels
$labels$`Level 0`
[1] "Total"

$labels$`Level 1`
[1] "A" "B"

$labels$`Level 2`
[1] "AA" "AB" "AC" "BA" "BB"

$index
[1] 1 2 3 4 5
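To illustrate this reading of characters, here is a small base R sketch. It does not call CreateNodes(); the vector bottom is an assumption that simply reuses the Level 2 labels from the output above as bottom names.

```r
# Assumed bottom names, copied from the Level 2 labels above.
bottom <- c("AA", "AB", "AC", "BA", "BB")
characters <- c(1, 1)

# The label of a node on level k ends after cumsum(characters)[k] characters.
ends <- cumsum(characters)
labels <- lapply(ends, function(e) unique(substr(bottom, 1, e)))

# Children per level-1 node: count bottom names by their first character.
children <- table(substr(bottom, 1, ends[1]))
```

Under these assumptions, labels holds "A" "B" for level 1 and the five two-character names for level 2, and children gives 3 for A and 2 for B, in line with the output above.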

I absolutely agree with you that using base R functions would be much nicer, and I will try to adapt my proposal accordingly. It might take a while, however, due to time constraints.

panoptikum commented 7 years ago

Further note:

> CreateNodes(bnames1, sep="-")

results in

$nodes[[2]]
A B 
4 3

This is an incorrect count, because you use the same names (A and B) across levels, which I think is not possible with my proposal. It might be possible to allow this, but how likely, or even useful, are identical names across levels?

earowang commented 7 years ago

Sorry, there was a typo in the characters argument; it should be rep(1, 2). Could you please provide a minimal example of what the bottom names should look like in order to use the sep argument? Thanks.

panoptikum commented 7 years ago

Sure thing:

> bnames <- c("Gas_ABC_OWN", "Gas_ABC_OTHER", "Gas_XYZ_OWN", "Gas_XYZ_OTHER", "Electricity_ABC_OWN", "Electricity_ABC_OTHER", "Electricity_XYZ_OWN", "Electricity_XYZ_OTHER")
> CreateNodes(bnames, sep="_")
$nodes
$nodes[[1]]
[1] 2

$nodes[[2]]
Electricity         Gas 
          2           2 

$nodes[[3]]
ElectricityABC ElectricityXYZ         GasABC         GasXYZ 
             2              2              2              2 

$labels
$labels$`Level 0`
[1] "Total"

$labels$`Level 1`
[1] "Electricity" "Gas"        

$labels$`Level 2`
[1] "ElectricityABC" "ElectricityXYZ" "GasABC"         "GasXYZ"        

$labels$`Level 3`
[1] "ElectricityABCOTHER" "ElectricityABCOWN"   "ElectricityXYZOTHER" "ElectricityXYZOWN"  
[5] "GasABCOTHER"         "GasABCOWN"           "GasXYZOTHER"         "GasXYZOWN"          

$index
[1] 6 5 8 7 2 1 4 3

I'm sure it's not working for all cases yet, but it's a start.
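For reference, the node counts and index above can be reproduced with base R alone. This is only a sketch of one possible approach, not the code in the pull request; the variable names parts, paths, nodes and index are illustrative.

```r
bnames <- c("Gas_ABC_OWN", "Gas_ABC_OTHER", "Gas_XYZ_OWN", "Gas_XYZ_OTHER",
            "Electricity_ABC_OWN", "Electricity_ABC_OTHER",
            "Electricity_XYZ_OWN", "Electricity_XYZ_OTHER")

# One row per series, one column per level.
parts <- do.call(rbind, strsplit(bnames, "_", fixed = TRUE))

# Full path label of every series at each level, e.g. "GasABC" on level 2.
paths <- sapply(seq_len(ncol(parts)), function(k) {
  apply(parts[, seq_len(k), drop = FALSE], 1, paste, collapse = "")
})

# nodes[[1]]: number of level-1 nodes; nodes[[k]]: children per level-(k-1) node.
nodes <- vector("list", ncol(paths))
nodes[[1]] <- length(unique(paths[, 1]))
for (k in seq_len(ncol(paths))[-1]) {
  pairs <- unique(paths[, c(k - 1, k), drop = FALSE])  # distinct parent/child pairs
  nodes[[k]] <- table(pairs[, 1])
}

# Positions of the original series after sorting by full bottom label.
index <- order(paths[, ncol(paths)])
```

Because every label carries the names of its ancestors, a name that is reused on different levels would still yield distinct paths in this sketch. It assumes that all names split into the same number of parts.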

panoptikum commented 6 years ago

I'll work on this as an issue and might recreate the pull request.