QuantConnect / Lean

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
https://lean.io
Apache License 2.0
9.84k stars, 3.26k forks

Create StatisticsResults Class, Refactor Statistics Class #30

Closed jaredbroad closed 9 years ago

jaredbroad commented 9 years ago

Refactor the Statistics class to be instance-driven, with a constructor that takes the required information and creates an instance of Statistics.

Statistics object should define all the currently generated statistics as public properties and include description parameters for the English text.

Benchmark should be separated out and implement an IBenchmark; be generated & cached in a local file once per day.

It would require two new algorithm helper methods:

  1. void SetBenchmark(IStatisticsBenchmark) - in the initialize set the IBenchmark to use for generating the statistics.
  2. Statistics GetStatistics(TimeSpan period) - In the QCAlgorithm generate an instance of the statistics object for a specified timespan (default to all the algorithm run time).

Replace the statistics generation in Engine with a call to algorithm.GetStatistics (and update the Result handlers to take a Statistics object instead of a Dictionary<string,string>).

jaredbroad commented 9 years ago
StefanoRaggi commented 9 years ago

Here are my first thoughts on where to get started:

Step 1

Step 2

Waiting for comments and suggestions...

jaredbroad commented 9 years ago

I like the design using Trade objects. This will simplify creating rolling windows of statistics (e.g. Get all trades between X-Y, generate statistics).

Does this support fractional opens/partial trade closes? E.g. I open at $10 (x10), open more at $12 (x10), close 10 at $15, open 10 more at $14, close all 20 at $13. What is the entry price for first close? second close?

Will you fill up the trade quantities like "stacked cups being filled with water"? (FIFO // First trade, first quantities filled exit price). If so; say Trade-2 is only partially filled with exit price $12, and next exit was at $13, would it be averaged down or would it be separated into new trade objects?

Currently Lean has an average-based portfolio system (the share price gets averaged on purchase). Gain/loss calculations and dates for US taxes are done by a FIFO system, I believe, but I'm not sure how FIFO accounting is done.
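To make the question concrete, here is a minimal Python sketch (not Lean code) that runs the fill sequence from the example above through both accounting schemes. The function names and the `(signed_quantity, price)` tuple layout are assumptions made for illustration, and the sketch is long-only: it assumes a sell never exceeds the quantity held.

```python
from collections import deque

def fifo_entry_prices(fills):
    """For each sell, return the quantity-weighted entry price of the lots it consumed.

    fills: list of (signed_quantity, price); positive = buy, negative = sell.
    Hypothetical helper: long-only, assumes a sell never exceeds the open quantity.
    """
    lots = deque()                      # open lots as [quantity, price], oldest first
    entries = []
    for qty, price in fills:
        if qty > 0:
            lots.append([qty, price])   # a new "cup" to be filled by later exits
            continue
        remaining, cost = -qty, 0.0
        while remaining > 0:
            lot = lots[0]               # always consume the oldest lot first
            take = min(lot[0], remaining)
            cost += take * lot[1]
            lot[0] -= take
            remaining -= take
            if lot[0] == 0:
                lots.popleft()
        entries.append(cost / -qty)
    return entries

def average_cost_entry_prices(fills):
    """Same question under an average-cost portfolio: the entry price is the running average."""
    held, avg = 0, 0.0
    entries = []
    for qty, price in fills:
        if qty > 0:
            avg = (avg * held + qty * price) / (held + qty)
            held += qty
        else:
            entries.append(avg)         # selling does not change the average cost
            held += qty
    return entries

# Open 10 @ $10, open 10 @ $12, close 10 @ $15, open 10 @ $14, close 20 @ $13
fills = [(10, 10.0), (10, 12.0), (-10, 15.0), (10, 14.0), (-20, 13.0)]
print(fifo_entry_prices(fills))          # [10.0, 13.0]
print(average_cost_entry_prices(fills))  # [11.0, 12.5]
```

Under FIFO the first close is matched against the $10 lot and the second against the $12 and $14 lots, whereas the average-cost view gives $11 and $12.50 for the same two closes.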

StefanoRaggi commented 9 years ago

For trade counting/generation I would consider one of these two options:

Option 1 (Single Trade): a trade is closed only when the position size crosses above/below zero, and the open price (avg price) is updated on each fill until the trade is closed.
Quantity = 30
Direction = Long
EntryPrice = AvgBuyPrice = (10 * 10 + 10 * 12 + 10 * 14) / 30 = $12
ExitPrice = AvgSellPrice = (10 * 15 + 20 * 13) / 30 = $13.66

Option 2 (Multiple Trades): a trade is closed on each fill in the opposite direction (using FIFO), and the open price (avg price) is updated on each fill in the same direction.
Trade 1: Quantity = 10, Direction = Long, EntryPrice = AvgBuyPrice = $10, ExitPrice = AvgSellPrice = $15
Trade 2: Quantity = 20, Direction = Long, EntryPrice = AvgBuyPrice = (10 * 12 + 10 * 14) / 20 = $13, ExitPrice = AvgSellPrice = $13

I would lean towards Option 2 because it is more detailed, and trades generated in this way can always be grouped to obtain the same trades as Option 1 (if a client application needs to):
Trade 1+2 (grouped): Quantity = 10 + 20 = 30, Direction = Long, EntryPrice = (10 * 10 + 20 * 13) / 30 = $12, ExitPrice = (10 * 15 + 20 * 13) / 30 = $13.66
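The grouping claim can be checked with a few lines of arithmetic. This is only an illustrative Python sketch: `weighted_average` and the tuple layouts are hypothetical names, not part of Lean.

```python
def weighted_average(parts):
    """Quantity-weighted average price over (quantity, price) pairs."""
    total_qty = sum(q for q, _ in parts)
    return sum(q * p for q, p in parts) / total_qty

# Option 1: a single trade built directly from the fills
option1_entry = weighted_average([(10, 10), (10, 12), (10, 14)])
option1_exit = weighted_average([(10, 15), (20, 13)])

# Option 2: two FIFO trades as (quantity, entry_price, exit_price)
trades = [(10, 10, 15), (20, 13, 13)]
grouped_entry = weighted_average([(q, entry) for q, entry, _ in trades])
grouped_exit = weighted_average([(q, exit_) for q, _, exit_ in trades])

# Total profit is the same whichever way the fills are grouped
option1_profit = 30 * (option1_exit - option1_entry)
option2_profit = sum(q * (exit_ - entry) for q, entry, exit_ in trades)

print(option1_entry, grouped_entry)   # 12.0 12.0
print(option1_exit == grouped_exit)   # True (both are 410 / 30)
```

Both routes give an entry price of $12 and an exit price of 410 / 30, and the total profit is $50 either way (up to floating-point rounding).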

I don't know about US taxes, but using FIFO to close trades seems a reasonable choice.

jaredbroad commented 9 years ago

Awesome, I lean towards option 2 as well @StefanoRaggi. I recommend doing more research to find out what accounting systems others have used to ensure our statistics results are directly comparable to other platforms. Maybe there's some reference or expert on the subject we can ask?

adriantorrie commented 9 years ago

Any chance a property for a StopLoss (trailing or hard) could be added to the Trade class?

This would allow calculation of RiskAdjustedEquity. Risk-adjusted equity would allow determination of true remaining cash for establishing positions, and thus affecting position sizing algorithms.

This may not be critical for stocks where cash outlay is upfront, but 'cashless' trading such as futures (when support is added in the not too distant future I hope) would benefit from this.

For example:

I would prefer to be able to pass in RiskAdjustedEquity for summary statistics of any algorithms, as by the nature of some strategies I look to implement, and the trade/money/risk management processes I follow, the RiskAdjustedEquity gives a "truer" view of my performance, resulting in more realistic equity curves. While some may balk at using this because it may cut down their (over-inflated) view of performance at some points in time, it would also smooth out drawdowns and help set clear expectations about what equity is in the portfolio.

By its nature, the stop loss on a futures contract acts like a call option but with no up-front outlay of the premium: the premium is "paid" on exit when the stop loss is hit, so being able to risk-adjust equity allows this "premium" to be deducted from portfolio equity.

Whether this suggestion goes against accounting rules (mark-to-market) is not relevant, instead it should be considered an extension to those statistics that are already available (which already do provide a mark-to-market view of equity).

jaredbroad commented 9 years ago

Hey @adriantorrie, it's out of scope for this change. A StopLoss would be an order type -- these are accounted for here: https://github.com/QuantConnect/Lean/tree/master/Common/Orders

jaredbroad commented 9 years ago

@adriantorrie sorry, I just re-read this after a coffee; I totally misunderstood your question on first read! :) It's a very interesting idea, let's come back to it after this initial refactor as a new issue.

StefanoRaggi commented 9 years ago

@jaredbroad I have completed Step 1 (two last commits here for review: https://github.com/StefanoRaggi/Lean/commits/statistics)

I ended up creating a new TradeBuilder class that generates Trade objects in real-time during the execution of the algorithm. This was required to calculate MAE and MFE for each trade (as a byproduct, the ClosedTrades property can also be used by the algorithm itself for trading decisions).

There are three fill grouping methods available (FillToFill, FlatToFlat, FlatToReduced) and two fill matching methods (FIFO and LIFO). At the moment I set FillToFill and FIFO as default values (we can add two new settings in config.json later). Unit tests for all six combinations are in place.

I would like a quick review before proceeding with Step 2.

mchandschuh commented 9 years ago

Code looks great @StefanoRaggi. I noticed we've added a TradeDirection enum, it seems like a duplicate of OrderDirection enum. Also, there's a new TradeExecution class which has similar data as OrderEvent. I think the one thing OrderEvent doesn't have is the Time, which I think it should have. One other thing was placing the TradeBuilder on the IAlgorithm interface. I wonder if there's a way to structure it so we don't need that, if not it's not a huge deal, I just like to keep IAlgorithm as clean as possible and I view this TradeBuilder as algorithm impl specific, but not required by the engine to run.

Other than that code looks great man. Good code style, doc, and even 2k lines of unit tests!!

jaredbroad commented 9 years ago

Incredible PR @StefanoRaggi, there's a lot to digest here.

I'm new to a lot of these concepts - can you describe/link them? (Fill To Fill, Flat To Flat, Flat To Reduced). It would be great to understand their intent better so we can review better.

I asked a quant working at a hedge fund and he suggested leaving the default statistics generation source as portfolio average fill prices (like it is at the moment). He said for their internal math it's all portfolio average price, but then the accountants use FIFO etc. to minimize the tax bill. Does this trade system support a trade technique like how it is now? (Perhaps that is one of the methods above?)

StefanoRaggi commented 9 years ago

@mchandschuh, thanks for the review:

@jaredbroad The definitions of the three fill grouping methods are as follows:

FillToFill: A Trade is defined by a fill that establishes or increases a position and an offsetting fill that reduces the position size.

Example: Buy 1, Buy 1, Buy 2, Sell 1, Sell 3 would generate three trades

FlatToFlat: A Trade is defined by a sequence of fills, from a flat position to a non-zero position which may increase or decrease in quantity, and back to a flat position.

Example: Buy 1, Buy 1, Buy 2, Sell 1, Sell 3 would generate one trade

FlatToReduced: A Trade is defined by a sequence of fills, from a flat position to a non-zero position and an offsetting fill that reduces the position size.

Example: Buy 1, Buy 1, Buy 2, Sell 1, Sell 3 would generate two trades
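As a rough illustration of the three definitions, the following Python sketch counts the trades each grouping method would produce for a sequence of signed fill quantities, using FIFO matching. It is a reading of the definitions above and not the TradeBuilder implementation: `count_trades` is a hypothetical helper, and it assumes a fill never reverses the position through zero.

```python
from collections import deque

def count_trades(fills, grouping):
    """Count trades for signed fill quantities (positive = buy, negative = sell).

    grouping: "FillToFill", "FlatToFlat" or "FlatToReduced".
    Sketch only: FIFO matching, no reversals through zero, no zero-quantity fills.
    """
    if grouping not in ("FillToFill", "FlatToFlat", "FlatToReduced"):
        raise ValueError("unknown grouping method: " + grouping)
    open_lots = deque()   # unmatched quantities of fills that opened or increased the position
    position = 0
    trades = 0
    for qty in fills:
        if qty == 0:
            raise ValueError("zero-quantity fills are not handled in this sketch")
        reduces = position != 0 and (qty > 0) != (position > 0)
        if not reduces:
            open_lots.append(abs(qty))
            position += qty
            continue
        if abs(qty) > abs(position):
            raise ValueError("reversals through zero are not handled in this sketch")
        remaining = abs(qty)
        matched_lots = 0
        while remaining > 0:
            lot = open_lots.popleft()            # FIFO: oldest entry fill first
            matched = min(lot, remaining)
            remaining -= matched
            matched_lots += 1
            if lot > matched:
                open_lots.appendleft(lot - matched)   # keep the unmatched remainder open
        position += qty
        if grouping == "FillToFill":
            trades += matched_lots               # one trade per (entry fill, exit fill) pair
        elif grouping == "FlatToReduced":
            trades += 1                          # every reducing fill closes one trade
        elif position == 0:
            trades += 1                          # FlatToFlat: only when flat again
    return trades

fills = [1, 1, 2, -1, -3]   # Buy 1, Buy 1, Buy 2, Sell 1, Sell 3
for method in ("FillToFill", "FlatToFlat", "FlatToReduced"):
    print(method, count_trades(fills, method))   # 3, 1 and 2 trades respectively
```

The counts match the three examples: the "Sell 3" fill closes two separate entry fills under FillToFill, a single combined trade under FlatToReduced, and is the point where the one FlatToFlat trade ends.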

Here are a couple of links from where I got some help:
https://www.sierrachart.com/index.php?l=doc/doc_TradeActivityLog.php#OrderFillMatchingMethods
https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.pdf?root=blotter (pg. 24-25)

At the end of Engine.Run we have a complete list of Trade objects generated according to the chosen fill grouping and matching methods. This list is meant to replace algorithm.Transactions.TransactionRecord and used directly as a Trade List View in any client GUI or ResultHandler (sorted, filtered, etc.).

The required fill price averaging (depending on the grouping method chosen) is already done in the TradeBuilder class. Please have a look at the unit tests, lots of examples there.

Step 2 will take care of obtaining the same statistics that are available now, plus all the new ones (lots of them :smile:)

jaredbroad commented 9 years ago

Thank you for the extra reading. I like what is there so far and it looks like a pretty incredible foundation to build enhanced statistics -- without changing how we do portfolios at the moment!

One thought -- we have to plan for long running live algorithms tight on memory -- please cap the number of trades to 10,000 or 12 months, whichever is first (in live mode) -- then remove the oldest trades first. For backtests there doesn't need to be any limit. For simplicity we could do this step after the statistics (step 2) is done.
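A minimal Python sketch of that retention rule, under stated assumptions: closed trades are kept oldest-first in a deque of `(exit_time, trade)` pairs, "12 months" is approximated as 365 days, and `prune_closed_trades` is a hypothetical name rather than anything in Lean.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_TRADES = 10_000
MAX_AGE = timedelta(days=365)   # assumption: "12 months" approximated as 365 days

def prune_closed_trades(trades, now, live_mode):
    """Drop the oldest trades until both limits hold; backtests are left untouched.

    trades: deque of (exit_time, trade) ordered oldest first; mutated in place.
    """
    if not live_mode:
        return trades
    while trades and (len(trades) > MAX_TRADES or now - trades[0][0] > MAX_AGE):
        trades.popleft()        # oldest trades are removed first
    return trades

now = datetime(2016, 1, 1)
history = deque([(now - timedelta(days=400), "old trade"),
                 (now - timedelta(days=10), "recent trade")])
prune_closed_trades(history, now, live_mode=True)
print([name for _, name in history])   # ['recent trade']
```

Because the deque is ordered by exit time, both limits can be enforced by popping from the left only, which keeps each pruning pass cheap.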

With your class we'll be able to have rolling statistics -- this will be a powerful new feature!

Agree with @mchandschuh, probably best to add Time to OrderEvent. "will be required as an input to the new AlgorithmPerformance.GenerateStatistics" -- Good idea, let's run with that, and if we see a way later we can "tidy" it up.

Awesome work! Keep pushing forward! :)

StefanoRaggi commented 9 years ago

I just committed the first part of Step 2 with some metrics that can be calculated from a list of trades only.

There are many other stats to be added, but a quick review at this point could be useful.

This first release of the AlgorithmPerformance only has a constructor with the list of trades as an argument, but this is going to change quickly.

All calculations are performed using incremental formulas so the metrics can be efficiently updated in real time (every time a trade is closed) without requiring multiple loops over the trade list.
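As an illustration of what incremental formulas look like, here is a small Python sketch that updates a handful of metrics in constant time per closed trade. It is not the AlgorithmPerformance class: `RunningTradeStats` and its attribute names are hypothetical, and the variance update uses Welford's online algorithm as one well-known incremental technique.

```python
class RunningTradeStats:
    """Trade metrics updated once per closed trade, with no loop over past trades."""

    def __init__(self):
        self.count = 0
        self.winners = 0
        self.losers = 0
        self.total_pnl = 0.0
        self.mean_pnl = 0.0
        self._m2 = 0.0          # running sum of squared deviations from the mean

    def add(self, pnl):
        """Fold the profit/loss of one newly closed trade into the metrics."""
        self.count += 1
        if pnl > 0:
            self.winners += 1
        elif pnl < 0:
            self.losers += 1
        self.total_pnl += pnl
        delta = pnl - self.mean_pnl
        self.mean_pnl += delta / self.count          # incremental mean
        self._m2 += delta * (pnl - self.mean_pnl)    # Welford update

    @property
    def pnl_variance(self):
        """Sample variance of trade P/L (0.0 until there are two trades)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

stats = RunningTradeStats()
for pnl in (50.0, -20.0, 30.0, -10.0):
    stats.add(pnl)
print(stats.count, stats.winners, stats.losers, stats.total_pnl, stats.mean_pnl)
# 4 2 2 50.0 12.5
```

Each `add` call touches only a fixed number of fields, so the same object can serve both an end-of-run calculation and a live feed of closed trades.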

My idea is that this class could have a dual use: (1) Instantiated by the engine at the end of the algorithm run: to perform calculations on a complete list of trades (or filtered by symbol, period, etc.). (2) Instantiated by the algorithm base class at startup: to add trades one at a time (when closed) and make stats available to the algorithm itself during execution, for trading decisions (examples: stop trading if NumberOfLosingTrades today > 2, or TotalProfitLoss today > X). Another use case could be RealTimeStatistics for the GUI.

Thoughts or comments?

bizcad commented 9 years ago

@StefanoRaggi I do not mean to butt in here, but I have done some work with creating Trades and I hope I can offer some small insight.
I pondered quite a while how to match transactions into trades, particularly when buys and sells do not match quantities, such as with partial trades and adding to a position on a nice long run. I decided on a dual-stack recursive mechanism. It fulfils both of your requirements above and gives you a ScheduleD as well. Here is how it works.

My trade creator reads from a file of orders or transaction history. For each transaction it then attempts to open a position in the OpenPosition method. OpenPositions is a List of Position objects, one for each symbol.

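    // Look for a position that is already open for this transaction's symbol.
    Position openPosition = OpenPositions.FirstOrDefault(p => p.Symbol == trans.Symbol);
    if (openPosition == null)
    {
        // No position yet: open one and start tracking it.
        openPosition = OpenPosition(trans);
        OpenPositions.Add(openPosition);
    }
    else
    {
        // Match the transaction against the existing position.
        ScottradeTransaction scottradeTransaction = ResolvePosition(openPosition, trans);
        // Both stacks empty means the position is flat, so stop tracking it.
        if (openPosition.Buys.Count == 0 && openPosition.Sells.Count == 0)
        {
            OpenPositions.Remove(openPosition);
        }
    }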

A Position has a Stack of Buys and a Stack of Sells.

    public class Position
    {
        public int Id { set; get; }
        public string Symbol { get; set; }
        public Stack Buys { get; set; }
        public Stack Sells { get; set; }
    }

If there is no position, OpenPosition pushes a buy transaction onto the Buys stack or a sell transaction onto the Sells stack.

In ResolvePosition a buy transaction pops a sell transaction off the Sells stack and matches the two transactions up by quantity, and vice versa. If the quantities match, a trade is created. If not, the transaction with the greater quantity is split into two transactions: one with a matching quantity and one with the leftovers. A Trade is created with the matching transaction, and ResolvePosition is called recursively with the leftover transaction until there are no transactions on the stacks. Any leftovers are then pushed onto the appropriate stack.

This method allows for LIFO matching, which is allowed by the IRS. Replace the Stacks with Queues and you have FIFO matching. I used LIFO to extend positions to longer terms, hopefully to a year or more, for long term capital gain treatment.
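The matching idea described above can be sketched in a few lines of Python. This is not the code from the comment: it uses a loop where the description uses recursion, it ignores commissions and wash sales, and `match_transactions` with its tuple layouts is a hypothetical name. A single deque per side acts as a stack for LIFO and as a queue for FIFO.

```python
from collections import deque

def match_transactions(transactions, method="LIFO"):
    """Match buys against sells, splitting the larger side when quantities differ.

    transactions: list of (side, quantity, price) with side "buy" or "sell".
    Returns (trades, open_buys, open_sells); each trade is (quantity, buy_price, sell_price).
    """
    if method not in ("LIFO", "FIFO"):
        raise ValueError("method must be 'LIFO' or 'FIFO'")
    open_buys, open_sells = deque(), deque()
    trades = []
    for side, qty, price in transactions:
        own, opposite = (open_buys, open_sells) if side == "buy" else (open_sells, open_buys)
        while qty > 0 and opposite:
            # Stack behaviour (newest first) for LIFO, queue behaviour (oldest first) for FIFO.
            other_qty, other_price = opposite.pop() if method == "LIFO" else opposite.popleft()
            matched = min(qty, other_qty)
            buy_price, sell_price = (price, other_price) if side == "buy" else (other_price, price)
            trades.append((matched, buy_price, sell_price))
            qty -= matched
            if other_qty > matched:
                # Put the unmatched remainder back where it came from.
                leftover = (other_qty - matched, other_price)
                if method == "LIFO":
                    opposite.append(leftover)
                else:
                    opposite.appendleft(leftover)
        if qty > 0:
            own.append((qty, price))   # nothing left to match against: keep it open
    return trades, list(open_buys), list(open_sells)

history = [("buy", 10, 10.0), ("buy", 10, 12.0), ("sell", 15, 15.0)]
print(match_transactions(history, "LIFO"))
# ([(10, 12.0, 15.0), (5, 10.0, 15.0)], [(5, 10.0)], [])
print(match_transactions(history, "FIFO"))
# ([(10, 10.0, 15.0), (5, 12.0, 15.0)], [(5, 12.0)], [])
```

With LIFO the oldest shares stay open longest, which is the holding-period effect mentioned above; with FIFO the most recent purchase is what remains.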

My code does not account for wash sales.

My Trade class is slightly different than yours and is matched against IRS 1040 - ScheduleD format. I used the Proceeds and CostOrBasis approach, as does the IRS. Commissions on split transactions are used in the first split, so that the average cost per share in leftovers is the buy price.

    public class Trade
    {
        public int Id { get; set; }
        public bool IsOpen { get; set; }  // not used
        public string Symbol { get; set; }
        public int Quantity { get; set; }
        public string DescriptionOfProperty { get; set; }
        public DateTime DateAcquired { get; set; }
        public DateTime DateSoldOrDisposed { get; set; }
        public decimal Proceeds { get; set; }
        public decimal CostOrBasis { get; set; }
        public string AdjustmentCode { get; set; }
        public decimal AdjustmentAmount { get; set; }

        public decimal GainOrLoss
        {
            get { return Proceeds - CostOrBasis + AdjustmentAmount; }
        }

        public bool ReportedToIrs { get; set; }  // Reported to IRS on 1099-B
        public bool ReportedToMe { get; set; }  // Reported to me on 1099-B
        public bool LongTermGain { get; set; }
        public int BuyOrderId { get; set; }
        public int SellOrderId { get; set; }
        public string Brokerage { get; set; }
        public decimal CumulativeProfit { get; set; }
    }

I added a couple of fields for reporting convenience such as the BuyOrderId (from the ticket) and CumulativeProfit for summing during the year.

I would be happy to share my code if it would help. I found the stack matching recursive technique to be very quick.

StefanoRaggi commented 9 years ago

@bizcad Currently I am working on the AlgorithmPerformance class, which takes as input a list of Trade objects generated by the TradeBuilder class, which already has an implementation of three types of transaction grouping and two types of transaction matching, for a total of six combinations.

The main goal of this issue is to replace the Statistics class with AlgorithmPerformance, so the Trade class currently has the minimum information required to complete this task. Of course new fields can/will be added to the Trade class, just not in this first iteration.

As for your implementation of grouping and matching, I think we have conceptually done the same thing, only in a different way. Please have a look at the TradeBuilder tests, just to check if the results are as you would expect.

bizcad commented 9 years ago

@StefanoRaggi Thank you for adding the Fees to the OrderEvent. I have been wanting to get that from the order.