braverock / quantstrat

walk.forward uses only the [testing.timespan] subset from: [R-Forge #6258] bug in demo(luxor.8.walk.forward) in quantstrat 0.9.1687 #34

Closed joshuaulrich closed 7 years ago

joshuaulrich commented 8 years ago

Submitted by: Alex B; Assigned to: Nobody

demo(luxor.8.walk.forward) from quantstrat 0.9.1687 fails with the following error: 'Error in if (!all(i <= 0)) stop('only zeros may be mixed with negative subscripts') : missing value where TRUE/FALSE needed'

Below is the code that reproduces the error (alternatively, simply run the quantstrat demos up to and including luxor.8.walk.forward):

#######################################################################
#!/usr/bin/Rscript --vanilla
#
# Jan Humme (@opentrades) - April 2013
#
# Tested and found to work correctly using blotter r1457
#
# After Jaekle & Tomasini: A new approach to system development and portfolio optimisation (ISBN 978-1-905641-79-6)
#
# Paragraph 3.7 walk forward analysis

require(quantstrat)

source(paste0(path.package('quantstrat'),'/demo/luxor.include.R'))
source(paste0(path.package('quantstrat'),'/demo/luxor.getSymbols.R'))

### foreach and doMC

require(foreach)
require(doMC)
registerDoMC(cores=8)

### blotter

initPortf(portfolio.st, symbols='GBPUSD', initDate=initDate, currency='USD')
initAcct(account.st, portfolios=portfolio.st, initDate=initDate, currency='USD', initEq=100000)

### quantstrat

initOrders(portfolio.st, initDate=initDate)

load.strategy(strategy.st)

enable.rule(strategy.st, 'chain', 'StopLoss')
#enable.rule(strategy.st, 'chain', 'StopTrailing')
enable.rule(strategy.st, 'chain', 'TakeProfit')

addPosLimit(
  portfolio=portfolio.st,
  symbol='GBPUSD',
  timestamp=initDate,
  maxpos=.orderqty)

### objective function

ess <- function(account.st, portfolio.st)
{
  require(robustbase, quietly=TRUE)
  require(PerformanceAnalytics, quietly=TRUE)

  portfolios.st <- ls(pos=.blotter, pattern=paste('portfolio', portfolio.st, '[0-9]*',sep='.'))
  pr <- PortfReturns(Account = account.st, Portfolios=portfolios.st)

  my.es <- ES(R=pr, clean='boudt')

  return(my.es)
}

my.obj.func <- function(x)
{
  # pick one of the following objective functions (uncomment)

  #return(max(x$tradeStats$Max.Drawdown) == x$tradeStats$Max.Drawdown)

  #return(max(x$tradeStats$Net.Trading.PL) == x$tradeStats$Net.Trading.PL)

  return(max(x$user.func$GBPUSD.DailyEndEq) == x$user.func$GBPUSD.DailyEndEq)
}

### walk.forward

r <- walk.forward(strategy.st,
                  paramset.label='WFA',
                  portfolio.st=portfolio.st,
                  account.st=account.st,
                  period='months',
                  k.training=3,
                  k.testing=1,
                  obj.func=my.obj.func,
                  obj.args=list(x=quote(result$apply.paramset)),
                  user.func=ess,
                  user.args=list('account.st'=account.st,
                                 'portfolio.st'=portfolio.st),
                  audit.prefix='wfa',
                  anchored=FALSE,
                  verbose=TRUE)

###############################################################################
# ERROR: Error in if (!all(i <= 0)) stop('only zeros may be mixed with negative subscripts') :
#        missing value where TRUE/FALSE needed

### analyse

pdf(paste('GBPUSD', .from, .to, 'pdf', sep='.'))
chart.Posn(portfolio.st)
dev.off()

ts <- tradeStats(portfolio.st)
save(ts, file=paste('GBPUSD', .from, .to, 'RData', sep='.'))

Followups:

Date: 2016-03-08 14:26 Sender: Kenny Liao Also, there seems to be a similar error in luxor.4. I posted this question in full on Stack Overflow: http://stackoverflow.com/questions/35843473/quantstrat-demo-luxor-4-error-attempt-to-select-less-than-one-element Could you have a look at this as well? Thanks!

Date: 2016-03-08 12:11 Sender: Kenny Liao Thanks Alex, your reply has been very helpful. I overwrote the files as you suggested and installed from the updated source, and it almost works perfectly. I didn't download the data you suggested in previous posts; could that be the reason? (I cannot access the data link you provided from China.) Could you have a look? Thank you. I tried twice: the first time, I ran luxor.8 directly; the second time, I ran luxor.1, luxor.2, luxor.5, and finally luxor.8. I received an error similar to the following:

error calling combine function:
<simpleError in fun(result.1, result.2, result.3, result.4, result.5, result.6,
     result.7, result.8, result.9, result.10, result.11, result.12,
     result.13, result.14, result.15): attempt to select less than one element>
Error in walk.forward(strategy.st, paramset.label = 'WFA', portfolio.st = portfolio.st,  :   obj.func() returned empty result
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In (function (x)  ... : discarding extra objective function result(s)
2: In ruleOrderProc(portfolio = portfolio, symbol = symbol,  ... :  ignoring order with quantity of zero

my sessionInfo() returns the following:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] robustbase_0.92-5             doMC_1.3.4                    iterators_1.0.8               quantstrat_0.9.1709           foreach_1.4.3                 blotter_0.9.1695              PerformanceAnalytics_1.4.3541
 [8] FinancialInstrument_1.2.0     quantmod_0.4-5                TTR_0.23-0                    xts_0.9-7                     zoo_1.7-12

loaded via a namespace (and not attached):
[1] lattice_0.20-33  codetools_0.2-14 grid_3.2.3       tools_3.2.3      DEoptimR_1.0-4   compiler_3.2.3 

Date: 2016-03-07 18:22 Sender: Alex B Hello Kenny, the easiest solution at this point would be to download this SVN repository and simply overwrite the existing files with these: https://usercontent.irccloud-cdn.com/file/g1B2sUBZ/walk.forward.R and https://usercontent.irccloud-cdn.com/file/1R6dv311/luxor.8.walk.forward.R. Then install the quantstrat package from your updated sources. My patches have not been confirmed/approved yet (i.e. all I know is that they work on my computers). Also, I think you do not have to download all the www folder for this demo to work, if I remember correctly. (If you need more help, feel free to join the IRC channel #R-Finance at freenode network, but you might have to wait for your question to be answered depending on what time zone you live in.)

Date: 2016-03-07 09:45 Sender: Kenny Liao Hi Alex B, thanks a lot for this fix and patch. I am new to R and quantstrat and encountered the same bug [#6258] you reported. I want to use your patch to make luxor.8 work, but I don't know exactly how to apply it. Could you kindly provide a step-by-step guide on how to make luxor.8 work using your patch? Also, do I still need to download all the data in the 'www' folder in the SVN repository? Thank you very much! Kenny

Date: 2016-02-18 00:05 Sender: Alex B All the explicit errors should be fixed now. Please apply patches for ES() (in the attachment section to this 'thread') and patch [#6291] submitted in the 'Patches' section of the project.

Date: 2016-02-17 10:08 Sender: Alex B I made the previous comment before I knew there was no bug tracker for the PerformanceAnalytics package. Please treat the ES()-related issue as a separate bug. To keep the proper record of history, I attached the patch for the ES() function to this thread (file 'ES_zero_return_column_input_bugfix-20160217-0326.diff'). PS: I will try to submit the rest of the fixes for walk.forward() later today or tomorrow.

Date: 2016-02-17 00:45 Sender: Alex B I made several fixes in my copy of the code and it seems that I found the root of the bug: a failing user.func within apply.paramset() within walk.forward(). Some parameter settings do not produce any trades (this can happen no matter how much data one uses). In such cases the function ES() (in this example) receives a 'portfolio returns' argument with all values equal to zero, and this zero-valued argument causes ES() to crash. Demonstration of the bug: get the data in ES_test.RData (see the attachment, which contains the object 'pr', the portfolio returns) and run the following line: 'ES(R=pr, clean='boudt')' (taken from the demo luxor.8....). I am submitting a patch for this particular bug now. At the same time, I still have to go back and reverse all the other changes in my local copy of the code and see whether those prior changes are still required after this bug is fixed. So let's not close this bug report ([#6258]) for now.
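
The failure mode described in this comment (a user function that crashes when a parameter combination produces no trades) can also be guarded against on the caller's side. The following is a minimal sketch in base R, not the submitted patch: `safe_stat` and `toy_stat` are hypothetical names introduced for illustration, and `toy_stat` only imitates a statistic that fails on all-zero input; it is not ES().

```r
# Hypothetical guard (not part of quantstrat or PerformanceAnalytics):
# return NA instead of failing when a parameter combination produced
# no trades, i.e. when all returns are zero or missing.
safe_stat <- function(returns, stat_fun) {
  r <- as.numeric(returns)
  if (length(r) == 0 || all(is.na(r) | r == 0)) {
    return(NA_real_)
  }
  # Any other failure inside stat_fun is converted to NA as well
  tryCatch(stat_fun(returns), error = function(e) NA_real_)
}

# Toy stand-in for a risk statistic that breaks on zero-variance input
toy_stat <- function(x) {
  s <- sd(x)
  if (s == 0) stop("zero variance")
  mean(x) / s
}

flat   <- rep(0, 20)                       # no trades: all returns are zero
active <- c(0.01, -0.02, 0.015, 0, 0.005)  # some trading activity

res_flat   <- safe_stat(flat, toy_stat)    # NA, no error raised
res_active <- safe_stat(active, toy_stat)  # a finite number
```

With a guard like this, the objective function would still need to decide how to treat NA results, for example by ignoring them when picking the best parameter combination.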

Date: 2015-11-16 01:09 Sender: Alex B I may have fixed the glitch. Seems like there must have been more data for the demo to work. Here's the patch: https://usercontent.irccloud-cdn.com/file/sEQaupmr/patch_luxor_demo8.diff For this fix to work, one must also create a folder /extdata_full/ and copy GBPUSD data from the repo folder 'www' up until & including the file dated '2003-02-27' (that's about 500 kilobytes in 82 files). Perhaps the number of files can still be reduced. (My hypothesis is that the glitch may have been caused by training period end dates which had no data, but that's just a guess.)

Date: 2015-11-15 01:20 Sender: Alex B Brian, I downloaded all the data you uploaded to the 'www' folder in the SVN repository and changed luxor.include.R to http://pastebin.com/8f3u3HQX (to include all the data). Here's what I get when running demo luxor.5.strategy.ordersets.R: http://pastebin.com/62qH1Z2V The data skips a lot of dates and some files seem unusable. Am I doing anything wrong here, or is the data indeed corrupted? PS: if anyone needs to get this data quickly, here's the code: http://pastebin.com/NvY18tkR (feel free to include it in quantstrat)

Date: 2015-11-12 10:41 Sender: Alex B Perhaps, as a temporary fix, the demo could include code to download the data from R-Forge and run on that downloaded data (or on a part of it large enough not to cause errors). Later, the data could simply be copied to Quandl (or anywhere it can sit at a permanent web address) and the demo would download it from there.

Date: 2015-11-11 20:40 Sender: Brian Peterson I can provide a little more context. Yes, the data originally used was many years longer, mirroring the data set from Tomasini. I recently put the entire data set back into the 'www' tree on R-Forge, so it would be possible for people to replicate exactly what we did at the time the scripts were written.

Date: 2015-11-11 19:28 Sender: Joshua Ulrich I can replicate the 'missing value where TRUE/FALSE needed' error (after running luxor.5.strategy.ordersets.R). It seems to be because there isn't enough data for period='months' to work. After changing to period='days', I get the 'obj.func() returned empty result' error. That was caused by robustbase not being installed on my system. After installing robustbase, I get an error, 'Error in runSum(x, n) : Invalid 'n'', because the code is trying to take a 40+ period SMA on only ~25 observations. I wondered if the data used in the demos was longer at some point, so I checked the commit history. The luxor.8 demo was added in r1466, and the GBPUSD data had not changed after that, so it could not have changed and caused a regression bug. So maybe the walk forward parameters were changed after r1466? No, they were set in r1464. Finally, I checked out r1467 and tried to run the luxor.8 demo (after running the luxor.5 demo first, and setting period='days'). I still got the 'invalid 'n'' error. Therefore, my current hypothesis is that this demo never worked as committed.
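
The mismatch described in this comment (a 40+ period SMA on roughly 25 observations) is the kind of condition a pre-flight check can catch before the indicator code fails. A minimal sketch in base R, assuming illustrative lookback lengths; `enough_bars` is a hypothetical helper and not a quantstrat function:

```r
# Hypothetical pre-flight check (not part of quantstrat): can the longest
# indicator lookback be computed from the bars available in one window?
enough_bars <- function(n_obs, lookbacks) {
  n_obs >= max(lookbacks)
}

slow_lengths <- c(42, 44, 46)  # assumed slow-SMA lengths, for illustration
n_obs_window <- 25             # roughly the observation count mentioned above

short_ok <- enough_bars(n_obs_window, slow_lengths)      # FALSE
long_ok  <- enough_bars(3 * n_obs_window, slow_lengths)  # TRUE
```

A check of this kind could stop early with a readable message rather than surfacing as 'Invalid 'n'' from deep inside the indicator call.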

Date: 2015-11-10 17:01 Sender: Alex B Here's what Guy Yollin suggested in this post: https://stat.ethz.ch/pipermail/r-sig-finance/2014q3/012721.html

I quote:

-begin quote-
I'm pretty sure that you can get the luxor.8.walk.forward.R script to run successfully as follows:
1. execute luxor.5.strategy.ordersets.R which saves the strategy
2. modify the period argument in the call to walk.forward() to be period='days'
-end quote-

In my case, I no longer see the error I reported in my bug report (above); however, another error appears:

'Error in walk.forward(strategy.st, paramset.label = 'WFA', portfolio.st = portfolio.st, : obj.func() returned empty result
In addition: There were 31 warnings (use warnings() to see them)'

HTH

evelynmitchell commented 8 years ago

This appears to be related to the SMA calculation running out of data on the walk forward.

source('~/1CurrentProjects/quantstrat/demo/luxor.8.walk.forward.R')
Loading required package: quantstrat
Loading required package: quantmod
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
Loading required package: blotter
Loading required package: FinancialInstrument
Loading required package: PerformanceAnalytics

Package PerformanceAnalytics (1.5.0) loaded.
Copyright (c) 2004-2015 Peter Carl and Brian G. Peterson, GPL-2 | GPL-3
http://r-forge.r-project.org/projects/returnanalytics/

Attaching package: ‘PerformanceAnalytics’

The following object is masked from ‘package:graphics’:

    legend

Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
loading  GBPUSD .....
Reading  2002.10.21.GBPUSD.rda ... done.
Reading  2002.10.22.GBPUSD.rda ... done.
Reading  2002.10.23.GBPUSD.rda ... done.
Reading  2002.10.24.GBPUSD.rda ... done.
Reading  2002.10.25.GBPUSD.rda ... done.
Reading  2002.10.28.GBPUSD.rda ... done.
Reading  2002.10.29.GBPUSD.rda ... done.
Reading  2002.10.30.GBPUSD.rda ... done.
Reading  2002.10.31.GBPUSD.rda ... done.
rbinding data ... done.
Loading required package: doMC
Loading required package: iterators
Loading required package: parallel
[1] "=== training WFA on 2002-10-21 00:30:00/2002-10-23 23:30:00"
numValues: 15, numResults: 0, stopped: TRUE
got results for task 1
numValues: 15, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2
numValues: 15, numResults: 2, stopped: TRUE
returning status FALSE
got results for task 3
numValues: 15, numResults: 3, stopped: TRUE
returning status FALSE
got results for task 4
numValues: 15, numResults: 4, stopped: TRUE
returning status FALSE
got results for task 5
numValues: 15, numResults: 5, stopped: TRUE
returning status FALSE
got results for task 6
numValues: 15, numResults: 6, stopped: TRUE
returning status FALSE
got results for task 7
numValues: 15, numResults: 7, stopped: TRUE
returning status FALSE
got results for task 8
numValues: 15, numResults: 8, stopped: TRUE
returning status FALSE
got results for task 9
numValues: 15, numResults: 9, stopped: TRUE
returning status FALSE
got results for task 10
numValues: 15, numResults: 10, stopped: TRUE
returning status FALSE
got results for task 11
numValues: 15, numResults: 11, stopped: TRUE
returning status FALSE
got results for task 12
numValues: 15, numResults: 12, stopped: TRUE
returning status FALSE
got results for task 13
numValues: 15, numResults: 13, stopped: TRUE
returning status FALSE
got results for task 14
numValues: 15, numResults: 14, stopped: TRUE
returning status FALSE
got results for task 15
numValues: 15, numResults: 15, stopped: TRUE
first call to combine function
evaluating call object to combine results:
  fun(result.1, result.2, result.3, result.4, result.5, result.6, 
    result.7, result.8, result.9, result.10, result.11, result.12, 
    result.13, result.14, result.15)
returning status TRUE
[1] "=== testing param.combo 5 on 2002-10-24/2002-10-24 23:30:00"  "=== testing param.combo 10 on 2002-10-24/2002-10-24 23:30:00"
   nFAST nSLOW
5      9    42
10     9    44
[1] "=== training WFA on 2002-10-22/2002-10-24 23:30:00"
numValues: 15, numResults: 0, stopped: TRUE
got results for task 1
numValues: 15, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2
numValues: 15, numResults: 2, stopped: TRUE
returning status FALSE
got results for task 3
numValues: 15, numResults: 3, stopped: TRUE
returning status FALSE
got results for task 4
numValues: 15, numResults: 4, stopped: TRUE
returning status FALSE
got results for task 5
numValues: 15, numResults: 5, stopped: TRUE
returning status FALSE
got results for task 6
numValues: 15, numResults: 6, stopped: TRUE
returning status FALSE
got results for task 7
numValues: 15, numResults: 7, stopped: TRUE
returning status FALSE
got results for task 8
numValues: 15, numResults: 8, stopped: TRUE
returning status FALSE
got results for task 9
numValues: 15, numResults: 9, stopped: TRUE
returning status FALSE
got results for task 10
numValues: 15, numResults: 10, stopped: TRUE
returning status FALSE
got results for task 11
numValues: 15, numResults: 11, stopped: TRUE
returning status FALSE
got results for task 12
numValues: 15, numResults: 12, stopped: TRUE
returning status FALSE
got results for task 13
numValues: 15, numResults: 13, stopped: TRUE
returning status FALSE
got results for task 14
numValues: 15, numResults: 14, stopped: TRUE
returning status FALSE
got results for task 15
numValues: 15, numResults: 15, stopped: TRUE
first call to combine function
evaluating call object to combine results:
  fun(result.1, result.2, result.3, result.4, result.5, result.6, 
    result.7, result.8, result.9, result.10, result.11, result.12, 
    result.13, result.14, result.15)
returning status TRUE
[1] "=== testing param.combo 5 on 2002-10-25/2002-10-25 13:30:00"  "=== testing param.combo 10 on 2002-10-25/2002-10-25 13:30:00" "=== testing param.combo 15 on 2002-10-25/2002-10-25 13:30:00"
   nFAST nSLOW
5      9    42
10     9    44
15     9    46
Error in runSum(x, n) : Invalid 'n'
In addition: Warning messages:
1: In beg:(n + beg - 1) :
  numerical expression has 2 elements: only the first used
2: In 1:(n - 1 + NAs) :
  numerical expression has 2 elements: only the first used
3: In beg:(n + beg - 1) :
  numerical expression has 2 elements: only the first used
4: In 1:(n - 1 + NAs) :
  numerical expression has 2 elements: only the first used
5: In beg:(n + beg - 1) :
  numerical expression has 3 elements: only the first used
6: In 1:(n - 1 + NAs) :
  numerical expression has 3 elements: only the first used
7: In `/.default`(runSum(x, n), n) :
  longer object length is not a multiple of shorter object length

joshuaulrich commented 7 years ago

The essence of this issue is that the walk-forward analysis function uses only the testing timespan for all calculations the strategy needs. That means all indicators, signals, etc. see only the testing data. This is a problem for long-term indicators (e.g. a 200-day SMA) and for indicators that require ~3x their window length to stabilize (e.g. an EMA).

It makes sense to restrict the testing window to only the out-of-sample data in order to only run the rules on the test sample. But the strategy should use all available historical data to calculate indicators, at minimum. It may also make sense to run signals on all available data, but I have not thought about that much.
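
The effect described above can be reproduced with a toy moving average in base R. This is only a sketch under stated assumptions: `sma` is a hand-written helper for illustration (not TTR's SMA), and the price series and window positions are made up.

```r
# Simple moving average; returns NA until n observations are available.
# Hand-written for illustration, not TTR::SMA.
sma <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (n <= length(x)) {
    cs <- cumsum(c(0, x))
    out[n:length(x)] <-
      (cs[(n + 1):(length(x) + 1)] - cs[1:(length(x) - n + 1)]) / n
  }
  out
}

prices   <- 100 + cumsum(rep(c(1, -0.5), 30))  # 60 made-up observations
test_idx <- 51:60                              # assumed testing window
n        <- 20                                 # indicator lookback

# Indicator computed on all history, then restricted to the test window
full_then_subset <- sma(prices, n)[test_idx]

# Indicator computed from the test window alone: too few bars, all NA
subset_only <- sma(prices[test_idx], n)
```

Computing on the full history yields a usable value on every test bar, while computing on the 10-bar test window alone yields no usable values at all for a 20-bar lookback.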

braverock commented 7 years ago

I think that the 'easiest' solution to this will be to break up the calls inside walk.forward.

Each paramset should have the entire series used for indicators and signals, and then each subset should be evaluated separately for rules.

  1. It seems like we could use applyIndicators and applySignals across the entire period, and then only apply the rules to each subset.

  2. Alternately, we could run full-period backtests for every paramset, and then use performance in the testing subset to select which one will be used 'out of sample'. One challenge of this second method is that the rules would not be fully path dependent, since you could have a residual position from prior periods under different parameter sets.

Assuming that the first method is the best, then applyIndicators and applySignals would first be called for every paramset, and stored. Then applyRules would be called using only the mktdata subset from the period being examined.
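
The first method can be sketched in base R as follows. This illustrates the idea only and is not quantstrat code: `sma` and `apply_rules_subset` are hypothetical stand-ins for the indicator and rule steps, and the price series and windows are made up.

```r
# Simple moving average; NA until n observations are available
# (hand-written for illustration, not TTR::SMA).
sma <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (n <= length(x)) {
    cs <- cumsum(c(0, x))
    out[n:length(x)] <-
      (cs[(n + 1):(length(x) + 1)] - cs[1:(length(x) - n + 1)]) / n
  }
  out
}

prices <- 100 + 10 * sin(seq_len(120) / 8)  # made-up series

# Step 1: indicators and signals are computed once, over the whole period
fast  <- sma(prices, 5)
slow  <- sma(prices, 20)
above <- as.integer(fast > slow)
cross_up <- c(FALSE, diff(above) == 1)      # fast crosses above slow
cross_up[is.na(cross_up)] <- FALSE

# Step 2: the rule step only looks at signals inside the given subset
# (hypothetical stand-in for calling the rules on one window)
apply_rules_subset <- function(signal, idx) idx[which(signal[idx])]

windows <- list(61:90, 91:120)              # assumed evaluation windows
entries <- lapply(windows, function(idx) apply_rules_subset(cross_up, idx))
```

Because the signal vector is shared, every window sees indicator values that were warmed up on earlier data, while entries are still only taken inside the window being evaluated.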

edward-wilson commented 7 years ago

Just chiming in to support braverock's approach above. Re-generating signals for each testing subset has been a roadblock for my strategies and WFA in quantstrat as outlined (admittedly quite poorly) in a comment on #44.

braverock commented 7 years ago

Just commenting on the ultimate solution here (more detail in the commits):

In the current code, indicators and signals cover the entire period, and applyRules is called with the new optional argument rule.subset, which defines the training (or testing) subset to use.

This means that entry/exit/etc signals for rule.subset will be checked.

This would still be inexact for the training period for strategies with long holding periods that tend to cross train/test boundaries. Such strategies should probably call walk.forward with anchored=TRUE.
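
The difference between a rolling and an anchored training window can be illustrated with plain index arithmetic. This is a sketch of the general idea only: `make_windows` is a hypothetical helper and does not reproduce how walk.forward() builds its windows internally.

```r
# Hypothetical helper: build train/test index windows over n_periods.
# anchored = TRUE keeps the training start fixed at the first period;
# anchored = FALSE slides the training start forward with each step.
make_windows <- function(n_periods, k_training, k_testing, anchored = FALSE) {
  stopifnot(n_periods >= k_training + k_testing)
  starts <- seq(from = k_training + 1,
                to   = n_periods - k_testing + 1,
                by   = k_testing)
  lapply(starts, function(test_start) {
    train_start <- if (anchored) 1 else test_start - k_training
    list(train = train_start:(test_start - 1),
         test  = test_start:(test_start + k_testing - 1))
  })
}

rolling  <- make_windows(6, k_training = 3, k_testing = 1, anchored = FALSE)
anchored <- make_windows(6, k_training = 3, k_testing = 1, anchored = TRUE)

rolling[[2]]$train   # periods 2 to 4
anchored[[2]]$train  # periods 1 to 4
```

With an anchored window, each training run starts from the first period, so a position opened early in the data is seen by every later training run instead of being cut off at a sliding boundary.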