ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" (https://arxiv.org/pdf/1706.10059.pdf).
GNU General Public License v3.0

Selecting prices for online operations #53

Open ziofil opened 6 years ago

ziofil commented 6 years ago

What is your experience with actual online trades? If I use the closing price, most of the orders end up not being executed because by the time I issue the buy/sell order, the price has already changed.

I tried to mitigate this problem by training the agent with a higher fee rate and spending that extra fee on a worse price for me (so that the order will almost certainly be executed), but the agent can't get a return > 1 even with 0.30% fees instead of 0.25%.
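A minimal sketch of this trick, assuming the agent was trained with a 0.30% fee while the exchange actually charges 0.25%: the 0.05% difference is spent on a worse limit price relative to the last close. `limit_price` is a hypothetical helper, and applying the same margin to buys and sells is an assumption.

```python
def limit_price(close: float, side: str, trained_fee: float = 0.0030,
                exchange_fee: float = 0.0025) -> float:
    """Hypothetical helper: move the limit price against ourselves by the fee
    margin the agent was trained to absorb (trained_fee - exchange_fee).
    The symmetric margin for buys and sells is an assumption."""
    margin = trained_fee - exchange_fee  # about 0.0005, i.e. 0.05%
    if margin < 0:
        raise ValueError("trained_fee must be >= exchange_fee")
    if side == "buy":
        return close * (1.0 + margin)    # willing to pay slightly more than the close
    if side == "sell":
        return close * (1.0 - margin)    # willing to accept slightly less than the close
    raise ValueError(f"unknown side: {side!r}")
```

For example, with the defaults `limit_price(0.08, "buy")` gives about 0.08004. Fills are still not guaranteed if the price moves by more than the margin before the order reaches the book.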

WojciechMigda commented 6 years ago

This implementation and the associated paper are nice theoretical work. Trade execution is another subject. I was recently exploring exactly the same issues as you. Just by examining the order books of even the most liquid currencies on Bittrex, one can see that spreads are often greater than the trading fees. If you add slippage to that, the 0.25% fee encoded in the default configuration is a very optimistic assumption. I did some experiments with fees as high as 1% (to account for the aforementioned issues) and with wider ticks (2h and 4h), and the framework was barely making it. In my own research, prior to finding this paper/repository, I was doing paper trading, assuming that for buys I would pessimistically take the high price from the future tick that follows, and the low price for sells. I am not sure this could be implemented in this framework in a straightforward manner (if I am not mistaken, pricing is currently evaluated by passing the relevant tensors to TensorFlow), but at least it would bring the simulation a little closer to actual trading conditions.
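The pessimistic paper-trading rule above can be sketched like this: an order decided at the close of the current tick is assumed to fill at the next tick's high (buys) or low (sells), with the exchange fee applied on top. `Candle`, `pessimistic_fill` and `implied_cost` are hypothetical names for illustration, not part of PGPortfolio.

```python
from typing import NamedTuple

class Candle(NamedTuple):
    """Hypothetical OHLC-style record for one tick (open omitted)."""
    high: float
    low: float
    close: float

def pessimistic_fill(side: str, next_candle: Candle, fee: float = 0.0025) -> float:
    """Effective price per unit, assuming the worst price of the following tick."""
    if side == "buy":
        return next_candle.high * (1.0 + fee)   # pay the next tick's high, plus fee
    if side == "sell":
        return next_candle.low * (1.0 - fee)    # receive the next tick's low, minus fee
    raise ValueError(f"unknown side: {side!r}")

def implied_cost(side: str, current_close: float, next_candle: Candle,
                 fee: float = 0.0025) -> float:
    """Relative cost versus trading exactly at the current close (positive = worse)."""
    price = pessimistic_fill(side, next_candle, fee)
    if side == "buy":
        return price / current_close - 1.0
    return 1.0 - price / current_close
```

For example, a next tick with high 101 and low 99 after a close of 100 gives an implied cost of about 1.25% per side, far above the 0.25% fee alone.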

dexhunter commented 6 years ago

I would pessimistically take the high price from the future tick that follows, and the low price for sells,

You mean using the highest bid for selling and the lowest ask for buying? In that case transaction cost will increase by about 1-2%. I wonder whether you can still profit with such prices in paper trading, and by how much.

WojciechMigda commented 6 years ago

In that case transaction cost will increase by about 1-2%. I wonder whether you can still profit with such prices in paper trading, and by how much.

With a setup like this, the network should be able to learn to buy while still on a downtrend and sell while still on an uptrend, when chances are that the high (for buys) or low (for sells) price in the following tick will be close to, or better than, the closing price in the current tick. Only then could the transaction cost be reduced toward the mere market-taker fee imposed by the exchange.

dexhunter commented 6 years ago

With a setup like this, the network should be able to learn to buy while still on a downtrend and sell while still on an uptrend, when chances are that the high (for buys) or low (for sells) price in the following tick will be close to, or better than, the closing price in the current tick.

But minimizing transaction cost is not the agent's priority. Besides, no order book data is fed to the current framework. I think integrating TA-Lib into the framework could help identify buy/sell signals, although our framework is financial-model-free, which means it doesn't classify the market trend.

edit: Another issue, I think, is that the output action is a portfolio vector, so buying/selling is not explicitly considered by the agent.
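Since the agent only outputs a target portfolio vector, the actual buy/sell orders have to be derived from it outside the network. A minimal sketch, assuming (as in the paper) that index 0 is the cash asset (BTC), that `w_now` already reflects the price drift since the last rebalance, and ignoring fees and slippage; `weights_to_orders` is a hypothetical name. Sells are listed first so they free up BTC for the buys.

```python
import numpy as np

def weights_to_orders(w_now, w_target, portfolio_value_btc):
    """Hypothetical helper: turn a change of portfolio weights into per-asset orders.

    w_now, w_target: weight vectors summing to 1, index 0 = BTC (cash).
    Returns a list of (asset_index, side, btc_amount) with all sells first.
    Fees and slippage are ignored in this sketch."""
    w_now = np.asarray(w_now, dtype=float)
    w_target = np.asarray(w_target, dtype=float)
    delta_btc = (w_target - w_now) * portfolio_value_btc
    orders = []
    # Skip index 0: BTC is the quote currency, its change is implied by the trades.
    for i in range(1, len(delta_btc)):
        if delta_btc[i] < 0:
            orders.append((i, "sell", float(-delta_btc[i])))
    for i in range(1, len(delta_btc)):
        if delta_btc[i] > 0:
            orders.append((i, "buy", float(delta_btc[i])))
    return orders
```

For example, moving from `[0.2, 0.5, 0.3]` to `[0.1, 0.3, 0.6]` with 2 BTC under management yields a sell of about 0.4 BTC worth of asset 1 followed by a buy of about 0.6 BTC worth of asset 2.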

ziofil commented 6 years ago

I think there are a few factors that are affecting this issue:

1) Poloniex can take up to a few minutes to compile and return the chart data after a period has just ended.
2) Lots of bots send their aggressive buy/sell orders just after a period ends.
3) The closing price could come from either a buy or a sell.

A solution to 1) is to compile our own HLC data: Poloniex is much quicker at returning all the orders that have been executed, so we can compute high = max(buys), low = min(sells) and close = last(buys, sells). This takes about 2-3 seconds for 10-15 currency pairs.
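A sketch of solution 1), assuming each executed trade of a period has already been parsed into a `(timestamp, side, price)` tuple with `side` either `"buy"` or `"sell"`. The tuple layout and `hlc_from_trades` are assumptions for illustration; the high/low/close rules are the ones proposed above, and the fallback when one side had no trades is an extra assumption.

```python
from typing import Iterable, Optional, Tuple

Trade = Tuple[float, str, float]  # assumed layout: (unix timestamp, "buy"/"sell", price)

def hlc_from_trades(trades: Iterable[Trade]) -> Optional[Tuple[float, float, float]]:
    """high = max(buys), low = min(sells), close = price of the last trade of either side."""
    ordered = sorted(trades, key=lambda t: t[0])   # make sure trades are in time order
    if not ordered:
        return None                                # no trades in this period
    buys = [price for _, side, price in ordered if side == "buy"]
    sells = [price for _, side, price in ordered if side == "sell"]
    all_prices = [price for _, _, price in ordered]
    # Assumption: if one side had no executions, fall back to all trades for that bound.
    high = max(buys) if buys else max(all_prices)
    low = min(sells) if sells else min(all_prices)
    close = ordered[-1][2]
    return high, low, close
```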

A solution to 2) could be to anticipate the bots. I still have to experiment, but this could mean for example to compile the HLC data and issue our orders a few seconds before a period ends, to avoid all the traffic/speculation that occurs at the beginning of a new period.
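Solution 2) boils down to scheduling the data fetch and order placement a fixed number of seconds before each period boundary. A minimal sketch, assuming 30-minute periods aligned to multiples of 1800 s since the Unix epoch and a hypothetical lead time of 5 seconds; `seconds_until_order_time` is an illustrative name.

```python
import time

def seconds_until_order_time(now: float, period: int = 1800, lead: float = 5.0) -> float:
    """Seconds to wait until `lead` seconds before the end of the current period.

    Assumes periods start at multiples of `period` seconds since the Unix epoch."""
    period_end = (now // period + 1) * period
    wait = period_end - lead - now
    if wait < 0:          # already inside the lead window: aim for the next period instead
        wait += period
    return wait

if __name__ == "__main__":
    # In a live loop one would sleep for this long, then fetch trades and place orders.
    print(f"next order window in {seconds_until_order_time(time.time()):.1f} s")
```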

I have no solution to 3), other than compiling last(sells) and last(buys) values separately and using those as input to the agent.

I did try to understand how the density of orders behaves, and I have discovered something interesting. These are histograms of the number of orders in 30-second bins within each hour (the data is from the whole last month, 720 hours). What is the spike around 23 min in the BTC_ETH chart? It's also present in the BTC_XMR chart, although not as pronounced. Anyway, there are clearly more orders issued at the beginning of the 30-min periods and not so many just before, which corroborates my proposed solution to 2).
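Histograms like these can be reproduced roughly as follows, assuming a list of executed-trade Unix timestamps (in seconds) collected over the month; each trade is assigned to one of the 120 thirty-second bins of its hour. `trades_per_bin_in_hour` is an illustrative name.

```python
import numpy as np

def trades_per_bin_in_hour(timestamps, bin_seconds: int = 30) -> np.ndarray:
    """Count trades falling in each `bin_seconds` slot of the hour, summed over all hours."""
    ts = np.asarray(timestamps, dtype=np.int64)
    seconds_into_hour = ts % 3600
    n_bins = 3600 // bin_seconds            # 120 bins for 30-second slots
    bins = seconds_into_hour // bin_seconds
    return np.bincount(bins, minlength=n_bins)
```

Plotting the counts against `np.arange(120) * 30 / 60` gives the profile in minutes; a spike around 23 min would appear near bin 46.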

UPDATE: I've discovered another thing. This is the relative price variation of BTC_ETH around the end of a period (normalized to the last price of the period; again, the data is the average over the whole last month, 720 hours). It seems that the best moment to buy is, on average, about 20-25 seconds after the beginning of a period, while the best moment to sell is right away. This seems to be a general rule (BTC_BCH, BTC_XMR, BTC_XRP).
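A sketch of such an averaged profile, assuming a 1-D array `prices` sampled once per second (for example the last trade price, forward-filled) whose index 0 falls on a period boundary. The sampling scheme, the window lengths and `mean_profile_around_boundaries` are assumptions for illustration.

```python
import numpy as np

def mean_profile_around_boundaries(prices, period: int = 1800,
                                   before: int = 60, after: int = 120) -> np.ndarray:
    """Average relative price path from `before` s before to `after` s after each
    period boundary, normalized by the last price of the period that just ended.

    Assumes one sample per second, with index 0 aligned to a period boundary."""
    prices = np.asarray(prices, dtype=float)
    profiles = []
    for boundary in range(period, len(prices) - after + 1, period):
        if boundary - before < 0:
            continue
        window = prices[boundary - before: boundary + after]
        last_price = prices[boundary - 1]      # last sample of the ending period
        profiles.append(window / last_price - 1.0)
    if not profiles:
        raise ValueError("not enough data for a single window")
    return np.mean(profiles, axis=0)           # index `before` is the first second of the new period
```

With this layout, `np.argmin(profile[before:])` gives the offset in seconds of the on-average cheapest moment to buy after the boundary, and `np.argmax(profile[before:])` the best moment to sell.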

dlacombejr commented 6 years ago

@ziofil those are some very interesting exploratory analyses of market behavior.

I think your solution to 1) is good. Poloniex also has a socket API, but I would prefer your method of sticking with the REST API and just calculating HLC manually from the trade history.

In my own online trading application, I place limit orders using the ask or bid price, depending on which side the order is on. However, even when trading at these prices, many orders don't get completely filled during the interval, and the outstanding order has to be cancelled before the end of the interval, just prior to obtaining omega prime. With this trade execution strategy, backtesting should really be done using those prices rather than the close, and they should probably be incorporated into the input as well.
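Backtesting with book-side prices instead of the close could start from something like the sketch below. It assumes a best bid/ask snapshot is recorded at each rebalancing time and it ignores partial fills, which are the hard part described above. Whether a buy is priced at the ask (crossing the spread) or at the bid (resting on the book) is not specified here, so the sketch takes it as a `passive` parameter; all names are hypothetical.

```python
def book_limit_price(side: str, best_bid: float, best_ask: float,
                     passive: bool = False) -> float:
    """Hypothetical rule for the limit price of an order.

    passive=False: cross the spread (buy at the ask, sell at the bid).
    passive=True: rest on our own side (buy at the bid, sell at the ask), which may not fill."""
    if best_bid > best_ask:
        raise ValueError("crossed book: best_bid > best_ask")
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    if passive:
        return best_bid if side == "buy" else best_ask
    return best_ask if side == "buy" else best_bid

def extra_cost_vs_close(side: str, close: float, best_bid: float, best_ask: float,
                        passive: bool = False) -> float:
    """Relative cost of trading at the book-side price instead of the close (positive = worse)."""
    price = book_limit_price(side, best_bid, best_ask, passive)
    return price / close - 1.0 if side == "buy" else 1.0 - price / close
```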

As we know, the limitation of the model from the paper is that it can only account for transaction costs in the form of a proportional commission. It cannot model slippage or market impact. For this to be truly RL, the model should be able to take order book data as input and/or incorporate slippage into the loss via the transaction remainder factor. Modeling market impact would be much harder.
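A common first-order approximation of the transaction remainder factor, with the same commission rate c for buys and sells, is `mu ~= 1 - c * sum_{i>=1} |w'_i - w_i|`, where w' is the drifted portfolio vector (omega prime), w the new target and index 0 the cash asset. A crude way to fold slippage into the loss, as suggested above, is to add an assumed constant slippage rate s to c. This is a NumPy sketch under that assumption, not the repository's code; `slippage_rate` is a hypothetical parameter.

```python
import numpy as np

def remainder_factor(w_prev_drifted, w_target, commission=0.0025, slippage_rate=0.0):
    """First-order remainder factor: mu ~= 1 - (c + s) * sum_{i>=1} |w'_i - w_i|.

    Index 0 is the cash asset (BTC) and is excluded from the turnover.
    `slippage_rate` is a hypothetical constant rate added to the fee; it does not
    model market impact, which would depend on order size and book depth.
    Works on single vectors or on batches of shape (batch, assets)."""
    w_prev_drifted = np.asarray(w_prev_drifted, dtype=float)
    w_target = np.asarray(w_target, dtype=float)
    turnover = np.sum(np.abs(w_prev_drifted[..., 1:] - w_target[..., 1:]), axis=-1)
    return 1.0 - (commission + slippage_rate) * turnover
```

With c = 0.25% and an assumed s = 0.75%, the agent is penalized as if trading at a 1% rate, similar to the fee levels tried earlier in this thread.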

As suggested, increasing the buy/sell fees might be a simple solution: it should make the agent move less of the total asset volume on each interval, thus limiting slippage and market impact. Then you can time the placement of buys and sells based on your exploratory analysis. The good news is that we want to perform all sells first to free up cash for buys anyway.

lytkarinskiy commented 6 years ago

@dlacombejr there is another option: keep enough balance (2.5x) to execute buys and sells in parallel, but then you need to calculate the execution volumes explicitly.

akaniklaus commented 6 years ago

@WojciechMigda You wouldn't want to buy/sell while the price is still trending. You would rather buy/sell on a trend reversal, when the trend strength (e.g. ADX) is weak. Otherwise, the price would keep decreasing and you would lose money (or it would keep increasing, which would mean that you exited too early).

@ziofil I have calculated candles from trades for another reason, but never thought that it would reduce the latency. Good idea, if it really takes Poloniex that long to compile and serve the candles.