[Closed] JS00000 closed this issue 6 years ago
That's really a philosophy problem.
Does the agent learn nothing? The training & test data run from 2012-01-01 to 2018-01-01. The policy-based methods can converge, while the value-based methods can hardly converge; it really depends on weight initialization, luck, and the episode count. If the episode count is too large, it appears that the agent has learned nothing, but a small episode count may mean the loss has not converged yet.
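Since convergence is this sensitive to initialization, it helps to fix the random seeds so runs are at least comparable. A minimal sketch, assuming a PyTorch-style setup; the function name is mine, not this repo's:

```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    # Pin every source of randomness so two runs start from the same weights.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```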
Can I use sequence data in RL? For now we cannot: the networks in the RL agents are very naive and simple, without even an RNN. I will add this feature later.
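For reference, sequence input usually just means swapping the first dense layer for a recurrent one. A minimal sketch, assuming PyTorch; the class name and dimensions are hypothetical, not from this repo:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, feature_dim, hidden_dim, action_dim):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim), a window of past days
        # instead of a single day's observation.
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1, :])  # use the last hidden state
        return torch.softmax(logits, dim=-1)
```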
RL is too slow.
I also noticed this problem and used the profiler in PyCharm to check: there is a method in the market class, `_origin_data`, that is frequently called to get the original price info from a pandas DataFrame. If the portfolio has more than 2 securities, it does cost time. I haven't found a good way to solve that yet.
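One possible mitigation is to pull each security's rows out of the DataFrame once, up front, so the per-step lookup becomes a plain array index. A minimal sketch, assuming the market holds its prices in a DataFrame with a `code` column; the attribute names are hypothetical, not the repo's actual ones:

```python
import pandas as pd

class Market:
    def __init__(self, frame: pd.DataFrame):
        # Split the DataFrame into one numpy array per security once,
        # so later lookups skip the pandas indexing machinery entirely.
        self._cache = {
            code: group.drop(columns="code").to_numpy()
            for code, group in frame.groupby("code")
        }

    def _origin_data(self, code, step):
        # O(1) array access per step instead of a DataFrame lookup.
        return self._cache[code][step]
```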
How many episodes should I set to get a result? For policy-based methods, sometimes 200, sometimes 500; for value-based methods, sometimes 50, sometimes 100. It's a philosophy problem : ).
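Rather than guessing a fixed number, you can stop once the loss stops improving. A minimal sketch; `agent.run_episode` is an assumed interface, not something this repo provides:

```python
def train(agent, env, max_episodes=500, patience=20, tol=1e-4):
    best, wait = float("inf"), 0
    for episode in range(max_episodes):
        loss = agent.run_episode(env)  # assumed to return the episode loss
        if loss < best - tol:
            best, wait = loss, 0
        else:
            wait += 1
        if wait >= patience:
            break  # loss has not improved for `patience` episodes
    return episode
```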
Actually I got some genuinely new insights from this project; you could also add me on WeChat (610261753) to discuss.
When I run DoubleDQN or DuelingDQN, the result is nothing (`his_profits` is always 0). It seems that the algorithm has learned nothing. I think the input of the RL algorithm has some problems: the input data is just one day's stock data plus the agent's status, and I don't think one day's data can support any effective prediction. So can I use sequence data in the RL algorithm? Another question: when I run the RL algorithm, it is too slow, almost 15 seconds per episode. What is the main cost? And how many episodes should I set in total to get a good result?
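If sequence input does get added, the observation would presumably become a window of recent days rather than a single day. A minimal sketch of such a window builder, where `prices` is an assumed (num_days, num_features) numpy array, not a variable from this repo:

```python
import numpy as np

def window_observation(prices, t, window=30):
    """Stack the last `window` days ending at day t into one observation."""
    obs = prices[max(0, t - window + 1):t + 1]
    # Pad at the front with the first row when fewer than `window` days exist.
    if len(obs) < window:
        pad = np.repeat(obs[:1], window - len(obs), axis=0)
        obs = np.concatenate([pad, obs], axis=0)
    return obs  # shape: (window, num_features)
```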