Closed StrikerRUS closed 3 years ago
Hi @StrikerRUS , thanks to you :)
`reward` is the result of a function necessary for the reinforcement algorithm. In this function, you define whatever you want depending on your problem/environment, and the algorithm will maximize/minimize the result (the `reward`).
In the typical stock trading algorithm, you want to maximize the profit; for this reason, the profit is the `reward`.
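The idea above can be sketched as a tiny reward function. This is a hypothetical illustration (the function name and signature are assumptions, not the repo's actual code), where the reward is simply the profit of the open position:

```python
# Sketch, not gym_trading's actual code: a reward function for a trading
# environment where the reward is the profit of the current position.
def compute_reward(entry_price: float, current_price: float, position: int) -> float:
    """Return the profit of the open position.

    position: +1 for long, -1 for short, 0 for flat.
    """
    if position == 0:
        return 0.0  # no open trade, nothing to reward
    return position * (current_price - entry_price)
```

The agent that maximizes this reward is, by construction, maximizing profit.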
@AdrianP- Thanks for the prompt response!
From your words I got that `reward == profit`, and in the logs we could replace one with the other, right?
Also I have another question: does `_generate_summary_stats()` show the statistics of the last episode of the training stage, or something else?
https://github.com/AdrianP-/gym_trading/blob/af5767a469db550459517e10c439df1c8764aa86/gym_trading/envs/trading_env.py#L41
Because according to the logs from the repo's notebook, that doesn't seem to be the case:
```
steps                     | 299459
episodes                  | 140
% time spent exploring    | 2
--
mean episode reward       | 197.2
Total operations          | 93
Avg duration trades       | 21.28
Total profit              | 99.0    <------------------------
Avg profit per trade      | 1.421
--
Total profit test:        > -260.8
Avg profit per trade test > -11.521
-------------------------------------
SUMMARY STATISTICS
Total Trades Taken:       14
Total Reward:             347.3    <------------------------
Average Reward per Trade: 24.8071428571
Win Ratio:                71.4285714286 %
```
UPD:
Also, I've noticed the main difference between `Profit` and `reward`: `Profit` is calculated as, let's say, real profit in the real world, like the delta of the exit and entry prices of the current trade, while the `reward` calculation is based on current prices at each time step.
Exactly as you said in the update, you can write the `reward` function as the mean of each profit (as it is now, the mean of the last 100 operations) or as the profit per trade (you want to maximize each trade, not the global result).
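The "mean of the last 100 operations" variant can be sketched like this. This is a hypothetical helper (class and method names are assumptions, not the repo's code), shown only to make the rolling-mean idea concrete:

```python
from collections import deque

# Sketch, not gym_trading's code: reward defined as the mean profit of the
# last `window` closed operations.
class RollingMeanReward:
    def __init__(self, window: int = 100):
        # deque with maxlen automatically discards trades older than `window`
        self.profits = deque(maxlen=window)

    def update(self, trade_profit: float) -> float:
        """Record a closed trade and return the current reward."""
        self.profits.append(trade_profit)
        return sum(self.profits) / len(self.profits)
```

With a per-trade reward you would instead return `trade_profit` directly, pushing the agent to make every individual trade profitable rather than the running average.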
The `_generate_summary_stats` function is called when the episode is solved or at the end of the episode. The episode is considered solved when `(np.mean(episode_rewards[-101:-1]) > 500 or t >= 10000) and len(env.portfolio.journal) != 0`.
Thanks for the answer! But if I'm not mistaken, an episode is one full iteration over the train part of the file, and the condition you've mentioned is used to check whether the training is over, isn't it?
@jalalmzh Please see #3 and next time create your own issue.
Hi @AdrianP- !
First of all, many thanks for this repo! For me, it's the clearest and, at the same time, most feature-rich among the many other trading gyms.
Sorry for using the issues to ask a question, but I haven't found any contact info on your GitHub page.
Can you please clarify the difference between `reward` and `Profit`?