[Closed] JS00000 closed this issue 6 years ago
That's really a philosophy problem.
Does the agent learn nothing? The training & test data run from 2012-01-01 to 2018-01-01. The policy-based methods can converge, while the value-based methods can hardly converge; it really depends on weight initialization, luck, and the episode count. If the episode count is too large, it appears that the agent has learned nothing, but a small episode count may mean the loss has not converged yet.
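Since convergence is this sensitive to initialization, it helps to fix the random seeds so runs are at least comparable. A minimal sketch, assuming a PyTorch-style setup; the function name is mine, not this repo's:

```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    # Pin every source of randomness so two runs start from the same weights.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```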
Can I use sequence data in RL? For now we cannot: the networks in the RL agents are very naive and simple, without even an RNN. I will add this feature later.
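For reference, sequence input usually just means swapping the first dense layer for a recurrent one. A minimal sketch, assuming PyTorch; the class name and dimensions are hypothetical, not from this repo:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, feature_dim, hidden_dim, action_dim):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim), a window of past days
        # instead of a single day's observation.
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1, :])  # use the last hidden state
        return torch.softmax(logits, dim=-1)
```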
RL is too slow.
I also noticed this problem and used the profiler in PyCharm to check: there is a method in the market class, `_origin_data`, that is frequently called to get the original price info from a pandas DataFrame. If the portfolio has more than 2 securities, it does cost time. I haven't found a good way to solve that yet.
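One possible mitigation is to pull each security's rows out of the DataFrame once, up front, so the per-step lookup becomes a plain array index. A minimal sketch, assuming the market holds its prices in a DataFrame with a `code` column; the attribute names are hypothetical, not the repo's actual ones:

```python
import pandas as pd

class Market:
    def __init__(self, frame: pd.DataFrame):
        # Split the DataFrame into one numpy array per security once,
        # so later lookups skip the pandas indexing machinery entirely.
        self._cache = {
            code: group.drop(columns="code").to_numpy()
            for code, group in frame.groupby("code")
        }

    def _origin_data(self, code, step):
        # O(1) array access per step instead of a DataFrame lookup.
        return self._cache[code][step]
```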
How many episodes should I set to get a result? For policy-based methods, sometimes 200, sometimes 500; for value-based methods, sometimes 50, sometimes 100. It's a philosophy problem : ).
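Rather than guessing a fixed number, you can stop once the loss stops improving. A minimal sketch; `agent.run_episode` is an assumed interface, not something this repo provides:

```python
def train(agent, env, max_episodes=500, patience=20, tol=1e-4):
    best, wait = float("inf"), 0
    for episode in range(max_episodes):
        loss = agent.run_episode(env)  # assumed to return the episode loss
        if loss < best - tol:
            best, wait = loss, 0
        else:
            wait += 1
        if wait >= patience:
            break  # loss has not improved for `patience` episodes
    return episode
```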
Actually I got some genuinely new insights from this project; you could also add me on WeChat (610261753) to discuss.
When I run DoubleDQN or DuelingDQN, the result is nothing (`his_profits` is always 0). It seems that the algorithm has learned nothing. I think the input of the RL algorithm has some problems: the input data is just one day's stock data plus the agent's status, and I don't think one day's data can support any effective prediction. So can I use sequence data in the RL algorithm? Another question: when I run the RL algorithm, it is too slow, almost 15 seconds per episode. What is the main cost? And how many episodes should I set in total to get a good result?
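If sequence input does get added, the observation would presumably become a window of recent days rather than a single day. A minimal sketch of such a window builder, where `prices` is an assumed (num_days, num_features) numpy array, not a variable from this repo:

```python
import numpy as np

def window_observation(prices, t, window=30):
    """Stack the last `window` days ending at day t into one observation."""
    obs = prices[max(0, t - window + 1):t + 1]
    # Pad at the front with the first row when fewer than `window` days exist.
    if len(obs) < window:
        pad = np.repeat(obs[:1], window - len(obs), axis=0)
        obs = np.concatenate([pad, obs], axis=0)
    return obs  # shape: (window, num_features)
```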