Open dlacombejr opened 6 years ago
In #29 I was trying to make cash a variable in computational graph instead of a constant. But it doesn't seem to help much. There are several possible ways I can think of to address the issue,
@dlacombejr Hi, my interpretation of this is that reserving BTC is not a wise decision when the agent aims to maximize expectations. Since the btc_bias is fixed to 0, the agent has to raise all the other votes to squash the cash bias. When using cash as a variable, you might observe different outcomes. (I guess the gradient of the cash bias may be negative.)
Your methodology (observing gradients) is great; I'm looking forward to your PR and further experiments on this topic. I think deriving the gradients mathematically will also help. For example, the gradient of a specific portfolio weight is:
And you can further figure out the gradients of voting value and BTC_bias, the gradient of BTC bias will be negative if wy is greater than 1, while if wy is around 1 the gradient of BTC_bias is just 0.
Regards Zhengyao
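A sketch of such a derivation, under assumptions that are not confirmed by the thread: the per-step reward is taken to be $r = \log(\mathbf{w}\cdot\mathbf{y})$ with transaction costs ignored, $\mathbf{w} = \mathrm{softmax}(\mathbf{v})$ over the votes $\mathbf{v}$, index 0 is the cash (BTC) bias, and the cash price relative is $y_0 = 1$. The symbols $r$, $v_j$ and $y_j$ are notation chosen here for illustration.

```latex
\begin{aligned}
\frac{\partial r}{\partial w_i} &= \frac{y_i}{\mathbf{w}\cdot\mathbf{y}},
\qquad
\frac{\partial w_i}{\partial v_j} = w_i\,(\delta_{ij} - w_j), \\
\frac{\partial r}{\partial v_j}
 &= \sum_i \frac{y_i}{\mathbf{w}\cdot\mathbf{y}}\, w_i\,(\delta_{ij} - w_j)
  = \frac{w_j\,\bigl(y_j - \mathbf{w}\cdot\mathbf{y}\bigr)}{\mathbf{w}\cdot\mathbf{y}}, \\
\frac{\partial r}{\partial v_0}
 &= \frac{w_0\,\bigl(1 - \mathbf{w}\cdot\mathbf{y}\bigr)}{\mathbf{w}\cdot\mathbf{y}}
 \quad\text{(cash, } y_0 = 1\text{)}.
\end{aligned}
```

Under these assumptions the cash gradient is negative when $\mathbf{w}\cdot\mathbf{y} > 1$, close to zero when $\mathbf{w}\cdot\mathbf{y} \approx 1$, and it also shrinks toward zero whenever the cash weight $w_0$ itself approaches zero.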
Thanks for your responses @DexHunter @ZhengyaoJiang , but I'm still concerned there is a bigger issue here.
I trained a network using the default configuration. The loss goes down as expected (although it does begin to rise later, not sure how much of a problem that is):
When we look at the omega vector distribution throughout training, we can see that it starts off with all votes roughly equal (i.e., ~0.083) and quickly begins to diverge into a state where most of the portfolio is in a subset of the assets, and the others are low -- a very sparse distribution across the asset classes.
That's fine, but what happens when we take a look at the voting layer just before the softmax activation function is applied? Here are the votes for all of the non-cash assets, followed by all the votes including the cash (BTC) vote:
We can see that the votes for non-cash assets continue to rise to very large values throughout the training period, whereas the cash bias stays at approximately zero. Even though zero values before a softmax layer can be non-zero afterwards (as demonstrated in #17), the approximately zero values of the cash bias will almost certainly be zero after applying softmax with such large values for the other assets.
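To make the squashing effect concrete, here is a minimal pure-Python sketch. The vote values are hypothetical (12 outputs, index 0 standing in for the cash bias, and 30.0 chosen only as an example of a "very large" vote); they are not taken from the actual training run.

```python
import math

def softmax(votes):
    """Numerically stable softmax over a list of votes."""
    peak = max(votes)
    exps = [math.exp(v - peak) for v in votes]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical votes: index 0 is the cash (BTC) bias, the other 11 are assets.
early_votes = [0.0] * 12            # start of training: all votes equal
late_votes = [0.0] + [30.0] * 11    # later: asset votes large, cash bias still ~0

early_weights = softmax(early_votes)
late_weights = softmax(late_votes)

print("early cash weight:", early_weights[0])  # equal share, 1/12
print("late cash weight:", late_weights[0])    # vanishingly small
```

With equal votes every output gets the same share (about 0.083 for 12 outputs), while a zero vote next to large asset votes ends up with a weight that is indistinguishable from zero in a log file.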
So I tried to take a look at the weights gradients to make sense of this. Here are the weights and gradients for the cash bias (gradients are at the tile tensor in the graph):
We can see that the gradients for the cash bias are initially positive, pushing the cash bias into negative values. Very quickly, the gradient vanishes, and the cash bias stops updating and remains at a constant negative value. Taking a look at the weights of the last convolutional layer that produces the votes for the non-cash assets, we can see that the filter's weights continue to grow over time (reflecting the growing voting values) and the bias for that filter shows an opposite trend to the cash bias -- it jumps up abruptly:
So it seems that the combination of an abruptly dropping cash bias and an abruptly rising non-cash asset voting bias (and corresponding weights) results in vanishing gradients for the cash bias that can never recover. Even if the loss indicated that the portfolio would perform badly with the entire portfolio in non-cash assets, the cash bias could not be updated. Using the gradient update equation you posted, the numerator would be ~0 for the cash bias and the resulting quotient would be ~0. Any more insight connecting the derivative equations to the graphs above would be great. Perhaps this is even related to the ever-increasing weight values of Conv2D_2.
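The following sketch illustrates that argument numerically. It assumes a simplified reward of log(w·y) with no transaction cost and a softmax over the votes, for which the gradient with respect to the cash vote is w_0 (y_0 - w·y) / (w·y). This is the gradient of the reward (the loss gradient has the opposite sign), and all vote and price values are hypothetical.

```python
import math

def softmax(votes):
    peak = max(votes)
    exps = [math.exp(v - peak) for v in votes]
    total = sum(exps)
    return [e / total for e in exps]

def log_return(votes, price_relatives):
    """Simplified reward: log of the portfolio value change, no fees."""
    weights = softmax(votes)
    return math.log(sum(w * y for w, y in zip(weights, price_relatives)))

def cash_vote_gradient(votes, price_relatives):
    """Analytic d log(w.y) / d v_0 = w_0 * (y_0 - w.y) / (w.y)."""
    weights = softmax(votes)
    change = sum(w * y for w, y in zip(weights, price_relatives))
    return weights[0] * (price_relatives[0] - change) / change

# Hypothetical bear market: cash keeps its value, every asset loses 5%.
price_relatives = [1.0] + [0.95] * 11
balanced = [0.0] * 12            # cash vote comparable to the asset votes
saturated = [0.0] + [30.0] * 11  # asset votes have grown very large

g_balanced = cash_vote_gradient(balanced, price_relatives)
g_saturated = cash_vote_gradient(saturated, price_relatives)

print("gradient, balanced votes: ", g_balanced)
print("gradient, saturated votes:", g_saturated)
```

In the balanced case the gradient clearly favours more cash, but once the asset votes dominate, the same bear market produces a gradient that is effectively zero because w_0 multiplies the whole expression.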
I'll make the PR shortly so others can see what I see without having to do screenshots :).
@dlacombejr
Hi,
Thanks for your contribution!
One thing I want to mention is, in order to see the loss value or gradients on the training set, we should set "fast_train" in net_config to false. Then you will see a new set of summaries called "train" in TensorBoard.
I have the impression that using batch normalization on the output of all the Conv layers seems to help here. I see non-zero BTC values in my output when using batch normalization during training, and they are all 0 when not using it.
@joostbr I can confirm, that batch normalization works!
@ZhengyaoJiang thanks for pointing that out, I'll turn that on!
@joostbr @lytkarinskiy thanks for bringing up normalization layers, should have thought of that. To be sure, you mean Batch Normalization (BN) and not Local Response Normalization, right? Assuming you meant BN, @joostbr and @lytkarinskiy where are you applying it? I'm guessing that you're applying it to the first two default convolution layers (i.e., ConvLayer + EIIE_Dense, with linear activation and ReLU applied after BN) and not on the last voting layer (i.e., EIIE_Output_WithW, which is also convolutional) before the softmax activation. Is this correct?
Also, I'm adding functionality to support more layers from TFLearn (including stand-alone ReLU and Batch Normalization) as well as multiple networks in the network configuration file. This way, networks can be configured, tagged (with description if needed), and the number of instances of each network defined in advance instead of after the train package is generated. I think this is a nice way to run experiments on many different hyperparameters and is less error-prone than changing configuration file after package has been developed. I'll make a PR on this as well if there is interest.
@dlacombejr I used this one, tf.layers.batch_normalization, for all 3 layers (both the default ones and the voting one). But for sure we have to check where to place the norm layer and where not to place it.
Thanks!
@dlacombejr @lytkarinskiy
I kind of did the same on all 3 layers. I did rewrite the 'core-network' code in pure TensorFlow though, and I used a slightly more complex batch normalization from the book 'Fundamentals of Deep Learning' by Nikhil Buduma. It is important to pass the 'training' state to the evaluation: set it to False when doing testing or online execution, and True when training. I am however unsure if applying it to the last layer is correct. I first thought it would mess up the voting, but it doesn't seem to do so.
`
def buildModel(self):
    numFilters1 = 6
    filterSize1 = 3
    numFilters2 = 10
    filterSize2 = self._windowSize - (filterSize1-1)
    self._training = tf.placeholder(tf.bool)
    self._X = tf.placeholder("float", [None, self._numCoins, self._windowSize, 3], name="X")
    input_dims = tf.shape(self._X)
    self._prevW = tf.placeholder(tf.float32, shape=[None, self._numCoins])
    conv1 = self.conv_layer(self._X, [1, filterSize1, 3, numFilters1], self._training, name="conv_1")
    conv2 = self.conv_layer(conv1, [1, filterSize2, numFilters1, numFilters2], self._training, name="conv_2")
    w_prev_reshaped = tf.reshape(self._prevW, [-1, self._numCoins, 1, 1])
    w_concat = tf.concat([conv2, w_prev_reshaped], axis=3)  ## concat w in the depth direction (axis 3), cf. fig 2 of the paper
    conv3 = self.conv_layer(w_concat, [1, 1, numFilters2+1, 1], self._training, name="conv_3")  ## numFilters2 + 1 since we appended w_prev
    flatten = conv3[:, :, 0, 0]
    self._cashBias = tf.get_variable("cash_bias", [1, 1], dtype=tf.float32, initializer=tf.zeros_initializer)
    tiled_cash_bias = tf.tile(self._cashBias, [input_dims[0], 1])
    voting = tf.concat([tiled_cash_bias, flatten], 1)
    self._softmax = tf.nn.softmax(voting)

def conv_layer(self, x, weight_shape, training, name="convolution"):  ## filter_size (B x H x D) x num_filters
    with tf.variable_scope(name):
        insize = np.prod(weight_shape)
        W = tf.get_variable("W", weight_shape, initializer=tf.random_normal_initializer(stddev=(2.0 / insize) ** 0.5))
        b = tf.get_variable("b", weight_shape[-1], initializer=tf.constant_initializer(value=0))
        conv = tf.nn.conv2d(x, W, [1, 1, 1, 1], padding='VALID')
        logits = tf.nn.bias_add(conv, b)
        act = tf.nn.relu(self.conv_batch_norm(logits, weight_shape[3], training))
        ##act = tf.nn.relu( logits )
        return act

def conv_batch_norm(self, x, n_out, phase_train):
    beta_init = tf.constant_initializer(value=0.0,
                                        dtype=tf.float32)
    gamma_init = tf.constant_initializer(value=1.0,
                                         dtype=tf.float32)
    beta = tf.get_variable("beta", [n_out],
                           initializer=beta_init)
    gamma = tf.get_variable("gamma", [n_out],
                            initializer=gamma_init)
    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2],
                                          name='moments')
    ema = tf.train.ExponentialMovingAverage(decay=0.9)
    ema_apply_op = ema.apply([batch_mean, batch_var])
    ema_mean, ema_var = ema.average(batch_mean), ema.average(batch_var)

    def mean_var_with_update():
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)

    mean, var = tf.cond(phase_train, mean_var_with_update, lambda: (ema_mean, ema_var))
    normed = tf.nn.batch_norm_with_global_normalization(x, mean, var, beta, gamma, 1e-3, True)
    return normed
`
Edit: Didn't refresh page. Thanks @joostbr , what you show is very similar to what I found on the ridiculousness that is batch normalization in tensorflow.
@lytkarinskiy What a rabbit hole batch normalization is in tensorflow... Did you construct two graphs, one for train and one for test?
I think one thing to keep in mind is that if you're utilizing crypto data, there is an extra trade involved during a balance. Unless, of course, the exchange supports the tether pair(s) (USDT) that are targeted for the re-balance. Thus, the commission rate is not accurately represented during training or testing.
@schmidtj yes man, two graphs with the placeholder training = tf.placeholder_with_default(False, shape=(), name='training'), set to True during training and to False during testing. Also, during training we have to add extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) to the tensor list.
I implemented BN using tf.layers.batch_normalization as suggested by @lytkarinskiy, but I don't think creating two graphs is necessary. I believe TFLearn will set all trainable parameters in a graph to False when tflearn.is_training(False, self.session) is called at line 209 of the nnagent.py module.
Unfortunately, adding BN after all convolutional layers (even on the last one before the softmax) doesn't appear to solve the vanishing gradients on the cash bias. Looking at the programlog, BTC bias is zero all the time as before. @lytkarinskiy @joostbr can you confirm that you were able to get significant non-zero cash bias values during backtesting? Perhaps you can attach a log file?
@schmidtj I believe the program accounts for the dual transaction cost during rebalancing in the transaction remainder factor mu. But of course does not capture USD -> BTC -> USD if that is what you meant.
@dlacombejr I started doing my own coding before the public code release, so I can't give you the official log. But I can attach the image which I use as a log =) The first image row shows portfolio values for the trained model with a 0.25% fee and for the same model with zero fee. The second row is the relative price graph. The third row shows a) w and b) the mean w for every coin during the period.
Model was trained just for 2000 steps and data is from Kraken exchange.
@dlacombejr Thank goodness for TFLearn. I'm testing now, but reading the documentation, I believe you are correct that two graphs are not required when using TFLearn instead of vanilla tensorflow. Cheers.
And, yes that's what I mean, the additional fee when going from USDT -> BTC -> ALT and vice-versa.
@lytkarinskiy thanks for sharing that! I can see that the BTC cash bias is definitely not zero in your omega history. Do you have any suggestions as to why I'm not able to replicate those non-zero cash bias values even when using BN?
I am also considering other ways to get a cash vote instead of using a bias. One idea I'm thinking of implementing is actually getting the BTC cash vote from reversed_USDT and just dropping the cash bias. In this way, there will just be votes from the convolutional layers and no bias added to it. Any thoughts on whether this may or may not be successful?
@dlacombejr do you have any other regularization active? like L2 or dropouts? I would kill those first
I dropped all regularization, same issue. Thanks for the suggestion @joostbr
@dlacombejr did you pass extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) to the tensor list while training?
@dlacombejr About the growing voting value. It seems that it only happens on the test set:
This shows that the agent thinks the test data is more promising than the training data.
I guess it's related to the padding value in the training data. There is some data missing because the history of some assets is not long enough. These NaNs will be padded with the first price from when the trading pair started. As the agent cannot earn money from these faked assets, it could learn to output a lower vote when the input prices do not vary, and a higher vote on those assets whose prices keep fluctuating. Since all the assets in the test set have real prices, they will then get higher votes.
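A small sketch of the padding behaviour described here, and of why padded history looks "unvaried" to the agent. The helper pad_leading_nans is hypothetical and written only for illustration; it is not the function PGPortfolio uses, and the prices are made up.

```python
import math

def pad_leading_nans(prices):
    """Replace leading NaNs with the first valid price (hypothetical helper)."""
    first_valid = next((p for p in prices if not math.isnan(p)), None)
    if first_valid is None:
        return list(prices)  # nothing to pad with
    padded = []
    seen_valid = False
    for p in prices:
        if not seen_valid and math.isnan(p):
            padded.append(first_valid)
        else:
            seen_valid = True
            padded.append(p)
    return padded

# Made-up history: the pair did not exist for the first two periods.
history = [float("nan"), float("nan"), 2.0, 2.1, 1.9]
padded = pad_leading_nans(history)

# Price relatives y_t = p_t / p_(t-1); padded periods give exactly 1.0.
relatives = [later / earlier for earlier, later in zip(padded, padded[1:])]

print(padded)
print(relatives)
```

Over the padded stretch every price relative is exactly 1.0, so holding that asset is equivalent to holding cash, which is consistent with the guess that the network learns lower votes for flat inputs.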
There could be other reasons for the growing voting. We can do more experiments (e.g. on different time ranges) to figure out the reasons.
@dlacombejr Unfortunately, I'm not sure what is happening with your network. I have tested BN after each conv layer (w/ and w/o regularization) and using BN after the first conv layer only, and in each case I get non-zero cash bias output, sometimes near 100% cash bias. The performance gains thus far have been negligible. With regard to @ZhengyaoJiang's comment above, I do have my volume_average_days set to 120, and have altered the code to use average daily volume instead of top volume. I did this so that I wouldn't be stuck with a large portion of my training set containing padded NaNs.
@lytkarinskiy because I'm still using TFLearn, I think it automatically adds all trainable parameters. In the TensorBoard, I can see that BN layers are connected to gradients, there are gradients for them, and their parameters are updating.
@ZhengyaoJiang I noticed that as well. It's not possible to tell if something similar is happening on the training data because the scale is just so much larger on that graph and all values within the entire range on the test graph are being binned into one spike on the test graph. Definitely requires some more experimenting to figure out what is going on.
Still no luck on getting non-zero cash bias values during backtesting. Any thoughts on this?:
I am also considering other ways to get a cash vote instead of using a bias. One idea I'm thinking of implementing is actually getting the BTC cash vote from reversed_USDT and just dropping the cash bias. In this way, there will just be votes from the convolutional layers and no bias added to it.
Update: @schmidtj Surprised to hear that the performance gains have been negligible. Even for the time frame during the beginning of this year's crash? I'll try updating my code to support average daily volume instead of top volume to see if that affects it, thanks for the suggestion.
@dlacombejr can you post a code snippet where the batch norm is being applied?
@joostbr here is the "layers" portion of my configuration file (I've tried with and without regularization):
"layers":
[
{"filter_shape": [1, 2], "filter_number": 3, "type": "ConvLayer", "activation_function": "linear"},
{"type": "BatchNormalization"},
{"type": "ReLU"},
{"filter_number":10, "type": "EIIE_Dense", "regularizer": "L2", "weight_decay": 5e-9, "activation_function": "linear"},
{"type": "BatchNormalization"},
{"type": "ReLU"},
{"type": "EIIE_Output_WithW_WithBN","regularizer": "L2", "weight_decay": 5e-8}
]
And here is the code that I've added to network.py to support these layers:
elif layer["type"] == "BatchNormalization":
    network = tf.layers.batch_normalization(network, axis=-1)
elif layer["type"] == "ReLU":
    network = tflearn.activations.relu(network)
elif layer["type"] == "EIIE_Output_WithW_WithBN":
    width = network.get_shape()[2]
    height = network.get_shape()[1]
    features = network.get_shape()[3]
    network = tf.reshape(network, [self.input_num, int(height), 1, int(width*features)])
    w = tf.reshape(self.previous_w, [-1, int(height), 1, 1])
    network = tf.concat([network, w], axis=3)
    network = tflearn.layers.conv_2d(network, 1, [1, 1], padding="valid",
                                     regularizer=layer["regularizer"],
                                     weight_decay=layer["weight_decay"])
    self.add_layer_to_dict(layer["type"], network)
    network = tf.layers.batch_normalization(network, axis=-1)
    network = network[:, :, 0, 0]
    btc_bias = tf.get_variable("btc_bias", [1, 1], dtype=tf.float32,
                               initializer=tf.zeros_initializer)
    btc_bias = tf.tile(btc_bias, [self.input_num, 1])
    network = tf.concat([btc_bias, network], 1)
    self.voting = network
    self.add_layer_to_dict('voting', network, weights=False)
    network = tflearn.layers.core.activation(network, activation="softmax")
    self.add_layer_to_dict('softmax_layer', network, weights=False)
@schmidtj would you mind posting the code for "to use average daily volume instead of top volume"?
Edit: After looking over the code and looking at your changes, it seems, at least to my limited knowledge of the details of BN (skimmed the paper), you are using the more correct version. Whereby, you perform BN before the activation. I am in fact not using it correctly, i.e., I'm setting relu in conv2d, based on the original implementation of this code, and performing BN after the conv layer. I'm not sure what the implications are mathematically between our versions, but I can confirm that I am getting non-zero cash bias, possibly by accident... :-P
@dlacombejr I'm essentially doing the same as you (without the more correct integration with the config). One thing I'm not doing is explicitly setting the axis. It should be noted that I am experiencing wild swings in my loss values during training. I'm trying to manage this with the decay parameter, but I have yet to get complete control of the issue.
Per the above:
def __get_total_volume(self, pair, global_end, days, forward, avg=True):
    start = global_end-(DAY*days)-forward
    end = global_end-forward
    chart = self.get_chart_until_success(pair=pair, period=DAY, start=start, end=end)
    result = 0
    for one_day in chart:
        if pair.startswith("BTC_"):
            if self.market == "poloniex":
                result += one_day['volume']
            else:
                result += one_day['quoteVolume']
        else:
            if self.market == "poloniex":
                result += one_day["quoteVolume"]
            else:
                result += one_day["volume"]
    if avg and result != 0:
        result = result / days
    return result
You can ignore the market check; there seems to be an issue with what quote means between poloniex and the exchange that I have added. Pretty straightforward change, no awards will be given.
Still no luck on getting non-zero cash bias values during backtesting. Any thoughts on this?
So as I claimed before, I think that the agent doesn't reserve cash because it can increase the expected total portfolio value. One might think the agent should reserve cash in a bear market, but the current agent cannot actually recognize a bear market, and there is not even any input to the cash subagent.
So as I claimed before, I think that the agent doesn't reserve cash because it can increase the expected total portfolio value.
Ok, so that makes sense. Even if there is a bear market, it must be finding a way to still increase expected return. I still wonder why this is not the case for @lytkarinskiy ...
@dlacombejr Not sure if it's relevant but you are using:
tf.layers.batch_normalization(network, axis=-1)
instead of:
tflearn.layers.batch_normalization(network)
Which does not take axis as a parameter. I hadn't noticed your use of vanilla tensorflow until I tried to pass axis as a parameter, which the tflearn version does not take and throws an error.
If you do make this change, you will need to add:
tflearn.config.init_training_mode()
To the beginning of build_network or else it will fail when loading the network during the test phase.
it should be noted that I am experiencing wild swings in my loss values during training.
@schmidtj I think I've been having a similar issue. After adding in batch normalization the loss seems to almost oscillate downwards; the only way to fix it that I've found is to remove the batch normalization. Have you managed to find a solution to this?
Green = Batch Norm Blue = Without Batch Norm
@NMagnus I did find a workable solution. You need to play around with the decay parameter, I chose 0.9999 and it seemed to fix this issue. The following link is an interesting read, but the relevant post is by zhimengfan1990 near the bottom, post starts with the apt: MAYBE YOU NEED READ THIS text. https://github.com/tensorflow/tensorflow/issues/1122
Take note of the calculation for convergence using different decay values, you might need to alter yours to converge.
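As a rough illustration of how the decay value relates to convergence, the sketch below uses only the generic exponential-moving-average identity (after n updates the initial value still carries a weight of decay^n). It does not reproduce the calculation from the linked post, and the 1% residual threshold is an arbitrary choice made for this example.

```python
import math

def steps_to_converge(decay, residual=0.01):
    """Updates needed until the initial value's weight (decay**n) drops to `residual` or below."""
    return math.ceil(math.log(residual) / math.log(decay))

for decay in (0.9, 0.99, 0.999, 0.9999):
    print(decay, steps_to_converge(decay))
```

A larger decay gives smoother moving statistics but needs many more training steps before they stop depending on their initial values, so the decay and the number of training steps have to be chosen together.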
@schmidtj Cheers, that's exactly what I needed.
@schmidtj @NMagnus As I understand it, using batch normalization with the right decay enabled you to train an agent that does not always avoid holding BTC. May I ask whether that resulted in an agent that just holds a tiny bit of BTC sometimes, or in an agent that intelligently changes its BTC holding behaviour during bull or bear markets of altcoins? In other words, did this result in an enhancement?
@akaniklaus Just to clarify, we're talking about holding cash (in this case Tether which is tied to USD) and not BTC. The agent will hold BTC with and without batch normalization. I'm pretty sure that you meant cash (USDT), so I will proceed under that assumption.
With batch normalization, at least in my case, the agent did in fact hold 100% cash and held it for multiple consecutive trade events. The backtester does not accurately handle quantifying the performance of making such a trade though. It is highly likely that a number of possible positions in a portfolio do not have a Tether pair, thus you have to go from USDT -> BTC -> Altcoin, resulting in an additional trading fee, which is not captured. This does have another consequence; when you are holding USDT, all of the input data to the network has a quote currency of BTC, thus a trade made from USDT -> Altcoin, will result in an inaccurate amount of the underlying (base) asset.
So, batch normalization does result in holding cash, but the performance cannot be accurately captured with the current implementation and the value of performing such a action is (likely) undervalued. At least I believe this to be the case, others here might be more knowledgeable and be able to contribute further.
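The size of the uncaptured fee can be estimated with simple arithmetic. This sketch assumes the same proportional commission on each hop (0.25%, the rate mentioned earlier in the thread) and ignores slippage and spread; remaining_after_hops is a hypothetical helper.

```python
def remaining_after_hops(commission, hops):
    """Fraction of value left after `hops` trades, each charging `commission`."""
    return (1.0 - commission) ** hops

commission = 0.0025  # 0.25% per trade (assumed identical on every hop)

direct_cost = 1.0 - remaining_after_hops(commission, 1)  # e.g. BTC -> altcoin
routed_cost = 1.0 - remaining_after_hops(commission, 2)  # e.g. USDT -> BTC -> altcoin

print("direct cost:", direct_cost)
print("routed cost:", routed_cost)
```

Routing through BTC roughly doubles the cost of entering or leaving a cash position, which is the part a single-commission backtest would not account for.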
@schmidtj I mentioned BTC as cash because I am not using USDT in my configuration. In other words, my cash is BTC. I was curious about the change in cash holding behaviour after you applied batch normalization. Was it just a slight, negligible change, was it some static cash holding behaviour at a certain portfolio percentage, or did the agent start to intelligently calibrate how much cash it is holding according to market conditions? Can you send a plot of portfolio weights? Btw, you may also switch your base cash type to BTC if you would like to try out its effect on real performance.
@akaniklaus To be fair, I'm not "using" USDT either, internally it is using BTC. Many changes would have to be made in order to utilize tether, but it would be worth the effort.
With batch normalization the network fully allocates to cash (100% allocation) for multiple consecutive time steps. As for the "market condition" question, I don't have a good way currently to visualize this, but, since we are maximizing portfolio value (BTC) and if we settle in on a reasonable loss value, the cash bias should be switched to during market downturn.
The differences in performance are negligible with batch normalization (with the network correctly switching to cash) and without. In its current form this makes total sense, since we have two items in our portfolio that represent BTC, i.e., the "reversed" asset (BTC) and now, with batch normalization, cash bias. The NN has no context to be able to differentiate between the two other than the inputs being different. If allocating to either of these assets results in the same portfolio value, the network will arbitrarily choose during a market downturn. With batch normalization it seems to choose cash bias and without it chooses the "reversed" asset.
As for the portfolio weights, I took a trip to regular expression land and thought I had a nice way to parse the data so that I could visualize, but it still needs work. I'll make one more pass at this to see if I can give you a pretty graph.
@schmidtj
If you do make this change, you will need to add:
tflearn.config.init_training_mode()
I think this is not needed because in line 149 in nnagent.py, the code tflearn.is_training(True, self.__net.session) has done the same thing.
@schmidtj I did not fully understand you, because cash is BTC and the reversed asset is the USD value in BTC. So, they shouldn't be the same. If it switches to cash only in certain times of market downturn in your case, it is good news though. But I guess that you need to analyze it a little bit more to make sure. @NMagnus do you also have any observations related to this issue after applying BN decay correctly? @lytkarinskiy From the screenshot you shared, it seems you have gotten the agent to hold various amounts of BTC (unless you are using USD as cash); can you please tell us how you achieved that?
@nian-liu That change is indeed necessary, since without it the code throws an exception during test phase. It wasn't a theoretical suggestion, it was the result of a bug fix that I had to implement.
What is the simplest way to remove cash bias or force it to be equal to 0? I tried to set
to
btc_bias = tf.get_variable("btc_bias", [1, 1], dtype=tf.float32,
initializer=tf.zeros_initializer, trainable=False)
while making sure that the net_config was set to EIIE_Output_WithW.
The reason I want to do this is because I am testing with stock market data, and I want pgportfolio to allocate all the funds into the stocks, not keep cash.
EDIT: It seems like BTC omega is initialised to 1:
I'll try to set this to 0 to see what happens
EDIT2: It did not work, any suggestions @DexHunter, or any other?
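One possible reason a frozen zero bias does not remove cash is that a vote of 0 still receives a positive share after the softmax. The sketch below shows this with hypothetical votes, along with one untested workaround idea (an assumption for illustration, not something confirmed in this thread or in PGPortfolio): apply the softmax to the asset votes only and prepend a cash weight of exactly 0.

```python
import math

def softmax(votes):
    peak = max(votes)
    exps = [math.exp(v - peak) for v in votes]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical asset votes produced by the network for three stocks.
asset_votes = [0.5, -0.2, 0.1]

# Frozen cash bias of 0 concatenated in front, as in EIIE_Output_WithW.
with_frozen_bias = softmax([0.0] + asset_votes)

# Workaround idea: softmax over the assets only, cash weight fixed at 0.
without_cash = [0.0] + softmax(asset_votes)

print("cash weight with frozen zero bias:", with_frozen_bias[0])
print("cash weight with assets-only softmax:", without_cash[0])
```

In the first case the cash weight is clearly positive even though the bias is not trainable; in the second it is exactly zero and the asset weights still sum to one.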
@ZhengyaoJiang Do you have any idea how I can do this?
The cash bias in the network output (omega) always appears to be zero, even under conditions where it seems holding some cash would be better (i.e., bear markets, or when all traded markets are performing poorly). This appears loosely related to issues #17 and #29.
To diagnose the issue, I added the weights and gradients of the graph to TensorBoard. This revealed that the gradients for the cash bias (on the tile tensor just before concatenation with the other votes) fall very quickly to zero. Votes for the other assets continue to grow roughly monotonically over time, without stopping. So my interpretation is that the votes for the other assets grow too quickly and squash the cash bias, and its corresponding gradients for updating it.
As a sanity check, I trained the network on data between the first day of this calendar year and the present. This time frame has been characterized by poorly performing markets a significant amount of the time, and in many cases, the agent should just hold the cash. Again, under these circumstances, the model fails to learn this policy; the loss actually goes up, and again the cash bias remains zero due to squashed / vanishing gradients.
I'm guessing that this issue has not been addressed in the past mainly because we have been in bull markets for quite some time now and these circumstances don't lend to uncovering this bad policy.
I've attached the train package for the default net configuration, where the vanishing gradients for the cash bias can clearly be seen when investigating the TensorBoard files. I'll make a PR shortly to add these elements to the __init_tensor_board method of the tradertrainer.py module.
Can other people replicate this issue? Are there any existing or proposed solutions? In #29 it sounded like @DexHunter might have been working towards a solution to make this learnable. I'd be happy to work through finding a solution, even if it means a change to the network architecture. Perhaps simply stronger regularization has to be applied to either the weights or the activations in the voting layer.
Thanks in advance.
9.zip