Hi, @dlemosmartins. Thanks, I hope you are doing well!
Yes, it is possible. You just need to put your daily data into a data frame and pass it to the env using the code below:
```python
x = 10
custom_env = gym.make(
    'forex-v0',
    df=my_daily_df,
    window_size=x,
    frame_bound=(x, len(my_daily_df)),
    unit_side='right'
)
```
Afternoon, sir! So, I understood what happened and adjusted it for the Stocks variant:

```python
x = 10
custom_env = gym.make(
    'stocks-v0',
    df=my_daily_df,
    window_size=x,
    frame_bound=(x, len(my_daily_df))
)
```

and I was able to execute it.
Two problems come up here:
The first one is on my side: I don't have enough disk space and memory to process roughly 150 million rows (from CSV into a DataFrame), with about 300 features in addition to the closing price. My PC just can't take it! lol
The second is about stocks: I want each day to start at a fixed time, for example at 9:30 am, and end at 17:30 in the afternoon ... every single day.
In this case, would I have to create an extra layer to validate the day boundaries (start and end) and then override the reset(self) method in 'trading_env.py'?
You can override some methods to reach your goal. But, it takes much time and can be hard. I was thinking about breaking the whole period into daily periods and considering each period as an episode.
I mean you don't need to load all 150 million lines. Just one day's data is enough. Then, you can select a period, for example [9:30, 17:30], and put it in the env. Now, train your model with this env and consider it as an episode. Repeat this operation for each day and use all days' data to improve your model.
It seems like both problems are solved!
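Something like this could work as a starting point. It is only a rough sketch: it assumes my_daily_df has a minute-level DatetimeIndex, and the session times are just an example.

```python
import gym
import gym_anytrading  # registers 'stocks-v0' / 'forex-v0'

window_size = 10

for day, day_df in my_daily_df.groupby(my_daily_df.index.date):
    day_df = day_df.between_time('09:30', '17:30')  # keep only the trading session
    if len(day_df) <= window_size:
        continue
    env = gym.make('stocks-v0',
                   df=day_df,
                   window_size=window_size,
                   frame_bound=(window_size, len(day_df)))
    # train / keep training your model on this env, treating the day as one episode
```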
I think so, that goes right along the lines of my question!
Thank you very much for your help and your attention, @AminHP !
I think we'll talk soon!
Hello, my dear! Look at me here again. =)
Continuing the conversation: I managed to adapt my import code so that it runs during the data-extraction step, and then I built my CSV. Explaining a bit more: the hour column contains whole hours and goes from 9 to 17, so I could use the approach I had mentioned. Running it like this, I got this result:
_"info: {'total_reward': 1590.0, 'totalprofit': 1.0134883102463903, 'position': 0} "
Then, analyzing the points in the graph, I saw that at some moments it opens more than one sale or more than one purchase. In our case, trying to trade intraday, it would be more interesting to enter only once and stay in until the exit of that first entry, for example.
If it bought, it can only sell next. If it sold, it can only buy next. That would be configured in the step method, right? And if so, have you already thought about how to enforce this point of having just 1 entry operation and 1 exit operation in this code?
See you soon!
Hi, again. I hope you are doing fine!
About the second picture, the actual trade only happens when the position changes. So, having like 100 buy actions in a row doesn't make 100 trades.
About your final question, you can change the reward function in a way that gives a penalty for short-time trades and makes the agent trade only once in a while.
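For example, a rough sketch of that idea would be a subclass that overrides _calculate_reward. The attribute names below follow the current gym-anytrading source, and the threshold and penalty values are arbitrary, so treat this only as a starting point and check it against your version:

```python
from gym_anytrading.envs import StocksEnv, Actions, Positions

class PenalizedStocksEnv(StocksEnv):
    """Hypothetical subclass: same reward as StocksEnv, minus a penalty for very short trades."""

    def _calculate_reward(self, action):
        reward = super()._calculate_reward(action)

        # a trade only happens when the action goes against the current position
        trade = ((action == Actions.Buy.value and self._position == Positions.Short) or
                 (action == Actions.Sell.value and self._position == Positions.Long))

        if trade:
            holding_time = self._current_tick - self._last_trade_tick
            if holding_time < 10:   # "short-time" threshold, an arbitrary choice
                reward -= 1.0       # penalty size is also arbitrary
        return reward
```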
I was thinking about breaking the whole period into daily periods and considering each period as an episode.
Hello @AminHP, I think this is a great idea for day / intraday trading. I tried to do it, but the result did not converge even after many loops. I'm not very good at coding. Did I do something wrong?
```python
import gym
import gym_anytrading
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK

env_maker = lambda: gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK, frame_bound=(1000, 1048), window_size=8)
env = DummyVecEnv([env_maker])

model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=2000)

env = env_maker()
for episode in range(5000):
    observation = env.reset()
    while True:
        observation = observation[np.newaxis, ...]
        action, _states = model.predict(observation)
        observation, reward, done, info = env.step(action)
        # env.render()
        if done:
            print(episode, "_info:", info)
            break
```
Sorry about the layout, I can't post long code properly here.
Hi @0trade.
It seems like you are training your model on 48 hours of data and then testing it 5000 times on the same env. I think this is wrong. You should train your agent for 5000 different episodes, then test it on another env.
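Something along these lines, reusing your PPO2 setup. The frame bounds and total_timesteps below are only an example of splitting the data into a training window and a separate test window:

```python
import gym
import gym_anytrading
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK

# train on one window of the data...
train_env = DummyVecEnv([lambda: gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK,
                                          frame_bound=(10, 4000), window_size=8)])
model = PPO2("MlpPolicy", train_env, verbose=0)
model.learn(total_timesteps=200000)  # covers many episodes over the training window

# ...then test once on a later, unseen window
test_env = gym.make('forex-v0', df=FOREX_EURUSD_1H_ASK,
                    frame_bound=(4000, 4400), window_size=8)
observation = test_env.reset()
while True:
    action, _states = model.predict(observation[np.newaxis, ...])
    observation, reward, done, info = test_env.step(action)
    if done:
        print("test info:", info)
        break
```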
Yes, you're right. Repeating training on the same time period will overfit; the model can't generalize.
But first I want to see the agent make a profit consistently (even if it overfits), and then I will continue working on envs for different days. Unfortunately, no matter how much I increase the time period or the number of cycles, the result still doesn't converge.
@0trade, @AminHP
What about a DQN agent? I'm having a hard time and getting nowhere. Do you have any ideas or an implementation I can follow? I also still haven't managed the start/end time part, from 9 to 17. I'm lost.
I have this code:

```python
import gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, StocksEnv, Actions, Positions
from gym_anytrading.datasets import STOCKS_GOOGL
import matplotlib.pyplot as plt
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras import backend as K
import random
import tensorflow as tf


class DQNAgent:
    def __init__(self, state_size, action_size, shape):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.99
        self.learning_rate = 0.001
        self._shape = shape
        self.model = self._build_model()
        self.target_model = self._build_model()
        self.update_target_model()

    def _huber_loss(self, y_true, y_pred, clip_delta=1.0):
        error = y_true - y_pred
        cond = K.abs(error) <= clip_delta
        squared_loss = 0.5 * K.square(error)
        quadratic_loss = 0.5 * K.square(clip_delta) + clip_delta * (K.abs(error) - clip_delta)
        return K.mean(tf.where(cond, squared_loss, quadratic_loss))

    def _build_model(self):
        model = Sequential()
        model.add(Dense(24, input_shape=(1, self.state_size), activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss=self._huber_loss,
                      optimizer=Adam(lr=self.learning_rate))
        print(model.summary())
        return model

    def update_target_model(self):
        self.target_model.set_weights(self.model.get_weights())

    def memorize(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            state = np.reshape(state, [1, self.state_size])
            target = self.model.predict(state)
            if done:
                target[0][action] = reward
            else:
                next_state = np.reshape(next_state, [1, self.state_size])
                t = self.target_model.predict(next_state)[0]
                target[0][action] = reward + self.gamma * np.amax(t)
            self.model.fit(state, target, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)


env = gym.make('stocks-v0', frame_bound=(9, len(STOCKS_GOOGL)), window_size=1)
EPISODES = 4000

state_size = env.observation_space.shape[1]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size, env.shape)

done = False
batch_size = 32

for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [env.window_size, env.shape[1]])
    for time in range(500):
        # action = agent.act(state)
        action = env.action_space.sample()
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [env.window_size, env.shape[1]])
        agent.memorize(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("info:", _)
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
    env.render()
```
and when I run this code, I get this feedback ...
```
Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).

Model: "sequential"
Layer (type)        Output Shape       Param #
dense (Dense)       (None, 1, 24)      4176
dense_1 (Dense)     (None, 1, 24)      600
dense_2 (Dense)     (None, 1, 2)       50
Total params: 4,826   Trainable params: 4,826   Non-trainable params: 0

Model: "sequential_1"
Layer (type)        Output Shape       Param #
dense_3 (Dense)     (None, 1, 24)      4176
dense_4 (Dense)     (None, 1, 24)      600
dense_5 (Dense)     (None, 1, 2)       50
Total params: 4,826   Trainable params: 4,826   Non-trainable params: 0

info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
W0803 23:59:08.353774 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_3_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
W0803 23:59:08.435745 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
W0803 23:59:08.688743 8688 functional.py:587] Model was constructed with shape (None, 1, 173) for input Tensor("dense_input:0", shape=(None, 1, 173), dtype=float32), but it was called on an input with incompatible shape (None, 173).
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 0.9844628868832217, 'position': 1}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 0.9846724897802759, 'position': 1}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
info: {'total_reward': 0.0, 'total_profit': 1.0, 'position': 0}
```
Hi, @dlemosmartins
Before writing everything from scratch, I suggest using an RL library like stable_baselines. I have already put an example here in the new release of gym-anytrading that might be useful.
@0trade you can also use this example in order to train and test on the same env to overfit.
@AminHP Hello sir, all good? So ... I tried to install it on my notebook here but it didn't work out, but that's fine. I managed to get the DQN doing its trading part and everything, with some Buy, Sell and Hold rules, but my problem now is trying to turn it into an LSTM so it uses my video card. The GPU is only at about 10% capacity with dense layers only. With LSTM I'm still trying to understand why I can't reshape the input correctly. Check the log:
For my model, I left it like this:
For the replay layer, I left it like this:
It only runs fast with just 1 epoch, but the result was rough ...
But now my challenge is to try it with LSTM, or CuDNNLSTM - in this case, to use the GPU.
For the LSTM network, you don't need to reshape the state. Pass it to the model without reshaping.
But when it gets to the replay, it throws an error: expected ndim = 3, found ndim = 2.
That's where I even got scolded by my wife (lol) for staying up until dawn - sometimes - trying to make it work with LSTM.
Then I kind of gave up on it at that point and started working only with dense layers.
I know the "code works" now.
The input_shape for the LSTM network must be something like (batch_size, window, n_features). So, if you want to pass only one sample to the network, you should use the code below:
```python
action = model.predict(np.array([state]))[0]
```
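For instance, a minimal sketch of the LSTM side could look like this. The layer sizes are placeholders, and window_size, n_features, action_size and state are assumed to come from the env as in your earlier code:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# the state coming from the env already has shape (window_size, n_features),
# so no reshape is needed; a single sample just becomes a batch of one.
model = Sequential()
model.add(LSTM(64, input_shape=(window_size, n_features)))
model.add(Dense(action_size, activation='linear'))
model.compile(loss='mse', optimizer='adam')

q_values = model.predict(np.array([state]))[0]   # shape: (action_size,)
action = np.argmax(q_values)
```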
Aaaah, now we're talking, sir! Thanks for your help ... LSTM (actually using "tf.compat.v1.keras.layers.CuDNNLSTM") works now! Woohoo!
But GPU utilization is still nothing above 10%. Can you believe it? After everything I went through to configure it, install all the drivers and update everything just right, it still didn't pay off!
But the best part is that the LSTM (CuDNNLSTM) is active! Let's see how AOT_NN will do!
5 hours to run the first episode. My Lord.
In the replay method, you can pass the whole minibatch to the model.predict or model.fit methods, not just a single state.
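A rough sketch of a batched replay, assuming the stored states already have shape (window_size, n_features) and the rest of the agent is as in your earlier code:

```python
def replay(self, batch_size):
    minibatch = random.sample(self.memory, batch_size)

    # one predict/fit call for the whole batch instead of one call per sample
    states = np.array([m[0] for m in minibatch])        # (batch, window, features)
    next_states = np.array([m[3] for m in minibatch])

    targets = self.model.predict(states)
    next_q = self.target_model.predict(next_states)

    for i, (_, action, reward, _, done) in enumerate(minibatch):
        targets[i][action] = reward if done else reward + self.gamma * np.amax(next_q[i])

    self.model.fit(states, targets, epochs=1, verbose=0)

    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
```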
Following the suggestion ... let's see how this "pseudo brain" does now.
I just want to get to the point of doing the reverse communication with the system that will place the trades, then test the NN in backtesting, then on a demo account and then, who knows, one day, in production!
And I have to tell you it's still just as slow ... even after the video card update, drivers, tensorflow ... everything ... now it went from 10% down to 5% lol
Even for the part of the start and end time, from 9 to 17, I still haven't managed.
@dlemosmartins Training by time slot is bothering me too. I don't know if this URL can help: https://stackoverflow.com/questions/45141079/pandas-read-csv-dataframe-rows-from-specific-date-and-time-range
If you make any new discoveries, please let me know.
@AminHP
you can also use this example in order to train and test on the same env to overfit.
Maybe my nested loop code was wrong. I'll try the example's "while" loop with a large total_timesteps and hopefully it will converge. Thank you very much.
@0trade Hey there, my friend. Thank you for the link. It was more or less what I did. I transformed the dates to check that the rows were within the same day; if the day changed, I close the trade and then start another sequence, without completely ending the while loop. In my case, dumb me left the DataFrame as Day | Hour | plus the indicators I use ... and using the time as the DataFrame index broke all the logic I wanted, but the rough solution I came up with seems to have "worked". I adjusted the step_rewards part for entry and exit operations. I also added a holding position, so that the agent does not always open operations, in this case using hold as well. But what gets me the most here is the processing time, my lord of mercy. lol
@AminHP @0trade Gentlemen, I was thinking over coffee and a cigarette ... what about the network's memory? How will we handle it when we put this into production? Should all the memory collected during training be available in production, or would "save_model" already take care of that for us? If not, would Redis, MongoDB or any other database with easy access, easy configuration and fast reads be an option?
@dlemosmartins
Glad to see your code worked. I'm just a trader who writes bad code, so I can't tell you about quant things, but I'd rather follow AminHP's suggestion: stable_baselines.
You can easily save or export models.
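For example, with stable-baselines it is roughly this (a minimal sketch; the file name is just an example):

```python
model.save("ppo2_trading_agent")              # persists weights and hyperparameters to disk

# later, e.g. in the live/production process:
from stable_baselines import PPO2
model = PPO2.load("ppo2_trading_agent")
action, _states = model.predict(observation)
```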
@0trade So we're in the same boat ... the bad code is on my side too hahaha.
Yes, I was also happy when it ran the first time, but the time it takes still discourages me. Now I'm thinking about how to start adding a stop loss and maybe even a trailing stop, but it hasn't come to me yet how to do that.
In the platform I use, MT5, it's easy to set this up, but here I don't know how I could do it for a numeric prediction, and I'm even considering having more than one agent for this. But I don't know yet ...
Do you have any suggestions, @AminHP @0trade?
And for the baselines, I'm going to try to get them running on my PC ... see if I can install them, but with anaconda it wasn't working very well.
@dlemosmartins,
[Stable Baselines](https://github.com/hill-a/stable-baselines) works fine in anaconda; I use anaconda too.
For live trading I think OANDA is your first choice; they have a REST API that can handle your trading. Perhaps Backtrader is another way to trade live.
Here in Brazil, I can't use OANDA. Unfortunately there is no such integration here. I struggle a lot with the technology here; they are stuck with only one company allowed to release the stock market data feed. I will check out that stable-baselines link.
Why don't you try to use pip instead of anaconda?
Man, I tried ... but it was giving me a lot of trouble. After reading a lot, I saw that you can install pip inside anaconda and use it there ... now it's "another life".
Besides that, I started from scratch in Python and went through my studies, where anaconda makes our life easier with its packages. I also liked Spyder for programming. Then I moved on to VS Code; I already use it for everything in my work ... but now I really see that anaconda sometimes locks me out of some things. I even think about dropping it ... let's see how long I can stand it. haha
You have no idea what I went through to get the data minute by minute, from the financial market here in Brazil, from 2015 to today ... And I still have the challenge of integrating the neural network part we're working on, to make the prediction and control it in MetaTrader 5. I'm still at the tip of the iceberg.
After 2e7 rounds of training, the result is no longer random; it finally converged. My agent learned its first strategy: buy and hold, one order per round, and it never does anything else. Next step, I'm thinking about changing the reward and seeing what happens.
@AminHP Sir, about treating each period, for example [9:30, 17:30], as an episode, do you have any idea how to do it? I can't imagine how to put it into the training env.
@0trade Just buy and hold? Buy - hold ... hold×N - sell, or the other way around, couldn't it?
I tried to switch to A2C, but it didn't work. The TensorFlow I'm using here, 2.2 or 2.3, I don't remember, doesn't work because of the "tensorflow.contrib" module; I would have to downgrade the version and so on!
What countries are you in? @AminHP @0trade
@0trade, are you using GPU for learning?
just buy and hold?
Hah hah, buy/sell and hold. I don't know whether it goes long or short; that's not important. The point is that the result is no longer random. The explanation is that the agent has a strategy (even a bad strategy), at least I think so. I can improve it later.
I use tensorflow-gpu in anaconda; you can create a new virtual environment in anaconda. My install:
```sh
conda create -n tf python=3.6 -y
conda activate tf
conda install tensorflow-gpu=1.15 -y
conda install -c conda-forge gym -y
conda install -c conda-forge mpi4py -y
pip install stable-baselines[mpi]
pip install gym-anytrading
```
Haha, I also don't have a good strategy. I just threw in about 40 indicators that I use personally, plus some where I "saw patterns", and passed the data to the agent so it can figure things out and learn. I won't use long/short here either.
Thanks for the setup config. I'll try it in a few minutes, to see if I can run this A2C and whether I can improve the performance. I am also trying to implement DDQN; let's see if there is any gain in the results. On another point, the strategy: do you add any indicators? Do you compute them yourself? Have you seen the TA-Lib library? It seems interesting for producing indicators.
Oh, and you, are you using LSTM or Conv1D? I wanted to try Conv1D, but I still need more knowledge of LSTM to see if I can get something out of it! haha
You can override some methods to reach your goal. But, it takes much time and can be hard. I was thinking about breaking the whole period into daily periods and considering each period as an episode.
I mean you don't need to load all 150 million lines. Just one day's data is enough. Then, you can select a period, for example [9:30, 17:30], and put it in the env. Now, train your model with this env and consider it as an episode. Repeat this operation for each day and use all days' data to improve your model. It seems like both problems are solved!
In my case, I prefer to have a fixed number of steps given by the size of the train set (which can be at daily, minutely, monthly or other frequencies) and then vary the number of episodes.
So I have to keep calculating the total_timesteps value when I am using the 'stable_baselines' method model.learn() to get the exact number of episodes I want.
@dlemosmartins I tried using indicators as signal_features, but it looks like profits did not improve significantly, so I decided to start simple. And yes, I used PPO2 + MlpLstmPolicy.
@xicocaio Hello xicocaio, I don't know how total_timesteps can be an exact number. I just call model.learn() for a round and test; if the result doesn't converge, I continue with another round of model.learn(), and just go round and round. Pretty stupid, huh?
If you use a fixed number, how do you keep the holding time period unchanged?
Hi @0trade,
I am doing experiments for my academic research, not live trading. That's why I have to be very precise about the number of steps and episodes, given that I have to report them in my thesis.
I did not get your question about holding time. I added holding to the action set, but, as reported in this repo's documentation, after training the agent rarely adopts a neutral position; it usually keeps a long or short position.
@xicocaio I wish you prosperity in your studies.
I wonder how you would define total_timesteps as "enough" or "right"?
@0trade Thank you!
I do something like this:

```python
desired_total_episodes = 100
n_points = train_df.shape[0]  # get the number of data points
total_timesteps = desired_total_episodes * n_points
model.learn(total_timesteps=total_timesteps)
```
Which is not a perfect approach, as reported in this answer on StackOverflow that says:
"Where the episode length is known, set it to the desired number of episodes you would like to train. However, it might be less because the agent might not (probably won't) reach max steps every time."
For this reason, I actually just opened a request in the Stable Baselines3 repo for the introduction of a total_episodes parameter in the model.learn() method.
@xicocaio Thanks for your explanation and the URL.
I saw this wording in the OpenAI Baselines docs: total_timesteps: int, number of timesteps (i.e. number of actions taken in the environment).
In this case, this probably means total_timesteps = end_index - start_index (from frame_bound). Is something wrong with that?
Hey, gentlemen. My training is going very badly. There comes a point where the prediction goes only one way: it holds one position and keeps making trades over and over, and even when the financial result is negative, the "cerebellum" doesn't change its position. Does DQN have this problem? Would A2C be what "saves" me and gives a possible improvement in the results?
@0trade I tried to set this up on my machine, but it went badly, so I decided to change my rules and try modifying some things in the batch replay and other points. But, I don't know ... maybe I'll have to focus and try to get this A2C running to see how it would go in training.
I got discouraged now. =/
@0trade As I understand it, it means that if you want to train your agent by making it observe the data from start_index = 50 through end_index = 100, 100 times, you would have to pass total_timesteps = 5000 ((end_index - start_index) * 100).
Sorry, it was a long week.
@dlemosmartins In general, I've always had bad results, and I've gotten used to it. If you ever get "good results", maybe that's your "holy grail", so it has to come through a long and tortuous process. Don't get discouraged too early. P.S. Why not try PPO2? It was faster than A2C.
@xicocaio I think you are right, I was ignoring the time dimension. Much appreciation for your answers.
@0trade, don't even talk to me about a busy week; mine was 18 hours straight, every day, and I managed to rewrite it as dueling DQN yesterday. Did it work? No ... after 3 days on my dataset it goes back to being bad ... I wasn't able to install the A2C setup on my machine; there are a lot of errors, even with anaconda, using the config you gave me. But I think it also went a bit badly because I added too many indicators, and that ends up confusing it.
I'll try again with your config and let's see what happens.
I was thinking about using Conv1D with LSTM. Have any of you tried something like this, @xicocaio @AminHP @0trade?
Hi, @dlemosmartins,
I am using stable-baselines3 and it does not support LSTM yet, so I have not tried any kind of LSTM network. =/
Still, is combining Conv1D and LSTMs something that people are trying in general? Wouldn't it be more appropriate to use LSTMs with attention layers?
Hello @xicocaio @AminHP @0trade. Well, I ended up shifting my focus to trying to implement A2C with TensorFlow, since another attempt of mine to install stable-baselines3 on my laptop didn't work. But let's see, maybe one day I'll get it.
I was noticing, sirs, that in all my test cases it only enters as a Buy, and rarely as a Sell, even though it gets a very negative reward for buying. Do you know what I could do differently so this doesn't happen?
Currently I switched the development to DDQN, but this thing isn't going anywhere ... I'm drifting away from it.
The code I'm using is in the prints below:
I know this is not practical, but do you see anything I might be doing wrong?
Sorry for the fuss about this, but you are the only people I have contact with about it.
Hi.
Hello @AminHP, how are you? I've been studying all this time and researching more and more about this trading world, and I keep moving toward day trading (intraday). In that case, for this project, would it be possible to use it for training within a specific time window during the day? I have the data, but I'm not sure how to use it to start this training and see how it could be useful in our world! Do you have any suggestions on how to do it, how to use this data over a period of the day, or even an example, like a light for my head, on how to use this project and learn more? Thank you, sir!