Kismuz opened this issue 5 years ago
@Kismuz Long time, no updates from you :-) Thanks for the information. If J E Moody & Company LLC succeeded in applying RL to trading in practice 15+ years ago, and turned it into a business as a small team without today's computing power, software stack, and recent advances in the ML/DL field, which part do you think was most likely the key to their success from the algorithm's aspect? Thanks
@mysl, there are many math and CS experts, and even more die-hard traders and asset managers. The mix of both kinds of expertise in one head or one team is still rare. It took courage and an independent vision to propose model-free policy search while living in the era of linear regression-based econometric models.
> from the algorithm's aspect?

- a differentiable utility function is number one; proper feature extraction is number two, IMHO (see the sketch below).
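To illustrate why the differentiable utility matters: with a differentiable position function and a differentiable performance measure, the policy can be trained by following the utility gradient directly, with no value function in between. Below is a minimal sketch of that idea under my own assumptions (synthetic random-walk data, PyTorch, a plain batch Sharpe ratio as the utility, an arbitrary window size and transaction cost); it is an illustration, not the Moody/Saffell code and not BTGym's implementation.

```python
import torch

torch.manual_seed(0)

# Synthetic price series (assumption: a random walk stands in for real data).
prices = 10.0 + torch.cumsum(0.01 * torch.randn(500), dim=0)
price_changes = prices[1:] - prices[:-1]

# A single recurrent "trading unit": position depends on recent price changes
# and on the previous position (so transaction costs are felt by the gradient).
w = (0.1 * torch.randn(8)).requires_grad_()   # weights on the last 8 price changes
u = (0.1 * torch.randn(1)).requires_grad_()   # recurrent weight on the previous position
opt = torch.optim.Adam([w, u], lr=0.01)
cost = 0.001                                  # assumed transaction cost per unit of turnover

for epoch in range(20):
    pos_prev = torch.zeros(1)
    trade_returns = []
    for t in range(8, len(price_changes)):
        features = price_changes[t - 8:t]                 # information available so far
        pos = torch.tanh(features @ w + u * pos_prev)     # differentiable "action" in [-1, 1]
        r_t = pos * price_changes[t] - cost * (pos - pos_prev).abs()
        trade_returns.append(r_t)
        pos_prev = pos
    r = torch.cat(trade_returns)
    utility = r.mean() / (r.std() + 1e-8)                 # differentiable utility (plain Sharpe)
    opt.zero_grad()
    (-utility).backward()                                 # direct gradient ascent on the utility
    opt.step()
```

The only point here is that the utility is differentiable end-to-end in the actions, so policy search can be done by gradient ascent.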
From 'Multi Objective Reward Discussion' #110:
I have recently been thinking about how to incorporate risk-adjusted returns, like the Sharpe ratio, as a way to form a richer and more complex reward function.
Differential risk-adjusted measures are just a brilliant concept! I went over the derivations of the Differential Sharpe Ratio and the Differential Downside Deviation Ratio and they are both quite interesting, although I will probably need to read them a couple more times to really get them.
In principle this derivation can be applied to the N-th lower partial moment, yielding a whole family of such differential risk-adjusted measures.
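For reference, here is how the Differential Sharpe Ratio update reads to me in code (a minimal sketch; the recursions follow the paper's exponential moving estimates of the first and second moments of returns, while the variable names and the small epsilon guard are mine):

```python
class DifferentialSharpe:
    """Differential Sharpe Ratio, D_t, computed online from a stream of returns."""

    def __init__(self, eta=0.01, eps=1e-8):
        self.eta = eta    # adaptation rate of the moving moment estimates
        self.eps = eps    # small guard against division by zero at start-up
        self.A = 0.0      # exponential moving estimate of the first moment of returns
        self.B = 0.0      # exponential moving estimate of the second moment of returns

    def step(self, r):
        """Return D_t, the marginal contribution of return r to the Sharpe ratio."""
        dA = r - self.A
        dB = r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        d_t = (self.B * dA - 0.5 * self.A * dB) / (denom + self.eps)
        # the moving estimates are updated only after D_t has been computed
        self.A += self.eta * dA
        self.B += self.eta * dB
        return d_t
```

With eta around 0.01 the estimates have an effective memory on the order of a hundred steps, and D_t can be used as a dense per-time-step reward in place of raw returns.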
@Kismuz, you did a real detective job for this post :) It really seems they have been on the right path for 20+ years now.
While looking for related articles on Google, I came across this nice survey on the topic: Reinforcement Learning in Financial Markets.
@Kismuz, the differential risk-adjusted measures rely on keeping moving-average statistics of previous rewards. So for the first stage (the moving-average initialization period), we don't get any risk-adjusted reward feedback to learn from.
Is that a problem from an RL standpoint? Or is it OK to have a very sparse reward at the beginning and dense rewards from that point onward? (Especially given that the sparse rewards are due to an arbitrary number of actions taken, not to environment dynamics.)
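To make the question concrete, here is one hypothetical way the burn-in could be handled (just an illustration of the trade-off, not something taken from the papers or from BTGym):

```python
class WarmupReward:
    """Hypothetical burn-in handling: plain return as the reward until the moving
    estimates are warm, then switch to the differential risk-adjusted reward."""

    def __init__(self, differential_step, warmup_steps=100):
        self.differential_step = differential_step  # e.g. DifferentialSharpe().step from above
        self.warmup_steps = warmup_steps
        self.t = 0

    def __call__(self, r):
        d = self.differential_step(r)   # moving estimates keep updating from step one
        self.t += 1
        return r if self.t <= self.warmup_steps else d
```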
I've been asked this question repeatedly and my answer has always been something like 'I have no evidence'. Unfortunately, in this domain everyone who is able to say something valuable instantly turns covert, vague and mysterious when it comes to real applications.
But after all, I think we can now track at least one case of a successful application of RL in asset management. What is incredible about it is that we are talking about two-decade-old research work.
I consider myself a decent information retriever, so it is an absolute shame that I missed this thread of research until now. Had I not, my work on BTGym would have paced times faster. Though I have independently arrived at some of the same findings (like time-series preprocessing via a differencing stack of moving averages, or recurrent policies), other key features did not come so easily. I mainly mean performance functions like the Differential Sharpe Ratio or the Downside Deviation Ratio found in these papers (I'm absolutely sure my ignorance of domain-specific performance functions, and my attempts to use only linear combinations of returns as the source of reward, are the main cause of the suboptimal performance of the algorithms included in BTGym).
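As a side note, here is a minimal sketch of what I mean by a 'differencing stack of moving averages': compute moving averages of the price at several spans and use their pairwise differences as roughly stationary features instead of raw prices. This is a simplified illustration of the idea, not the actual BTGym preprocessing code, and the spans are arbitrary.

```python
import numpy as np

def ema(x, span):
    """Exponential moving average with the usual span-based smoothing factor."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

def ma_difference_stack(prices, spans=(2, 4, 8, 16, 32)):
    """Differences between consecutive moving averages; one feature column per MA pair."""
    prices = np.asarray(prices, dtype=float)
    mas = [ema(prices, s) for s in spans]
    return np.stack([fast - slow for fast, slow in zip(mas[:-1], mas[1:])], axis=1)
```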
So here is a stack of accessible works by John Moody, Matthew Saffell, et al.: https://www.researchgate.net/scientific-contributions/31497186_Matthew_Saffell https://www.researchgate.net/scientific-contributions/10597646_John_Moody
Going through the full stack from 1996 through 2004, one can easily see the evolution of ideas. Some key papers to read are: Learning to Trade via Direct Reinforcement https://pdfs.semanticscholar.org/1a49/99c918c6206cd9804c48f7dce1bac6ec5b4a.pdf
Performance Functions and Reinforcement Learning for Trading Systems and Portfolios http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.8437&rep=rep1&type=pdf
Reinforcement Learning for Trading https://pdfs.semanticscholar.org/93b8/17deef9dd5afc66ccf43174a07ddaa49854f.pdf
Keep in mind, this is the early 2000s: no TF, no PPO, none of the deep-learning stack we now take for granted and would not have at hand for another decade.
Then, in 2004, the line breaks off. The last publicly available document is Saffell's 2005 thesis:
Knowledge Discovery for Time Series https://www.semanticscholar.org/paper/Knowledge-discovery-for-time-series-Moody-Saffell/002165064501911ca06679ba762bd7ffc00bf44d ... and a PowerPoint presentation from a Paris conference.
Given industry realities, there is an obvious reason for such a line of publications to break off: taking the work commercial and private.
I think these guys are remarkable. Small team, no fuss, no conference talks, no private investors, no fancy landing pages, but a 2018 'HFM US Performance Award for Best Quantitative Strategy under $1B'.
To me, everything just says: "If you want evidence of financial RL going practical, look no further."