AdrianP- / gym_trading

Apache License 2.0
94 stars · 25 forks

difference between reward and profit #4

Closed StrikerRUS closed 3 years ago

StrikerRUS commented 6 years ago

Hi @AdrianP- !

First of all, many thanks for this repo! For me, it's the clearest and most feature-rich among the many other trading gyms.

Sorry for using issues to ask a question, but I haven't found any contacts on your GitHub page.

Can you please clarify the difference between reward and Profit?

AdrianP- commented 6 years ago

Hi @StrikerRUS , thanks to you :) The reward is the result of a function needed by the reinforcement learning algorithm. In this function you define whatever you want, depending on your problem/environment, and the algorithm will maximize/minimize the result (the reward). In a typical stock trading algorithm you want to maximize the profit, and for this reason the profit is the reward.
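As a rough illustration of "the profit is the reward" (the names below are hypothetical, not gym_trading's actual API), a per-trade reward could simply be the trade's profit:

```python
# Minimal sketch: the reward returned to the RL algorithm is just the
# profit of the trade. All names here are illustrative, not gym_trading's.

def step_reward(entry_price: float, exit_price: float, position: int) -> float:
    """Profit of a closed trade, used directly as the reward.

    position: +1 for a long trade, -1 for a short trade.
    """
    return (exit_price - entry_price) * position

# A long entered at 100 and exited at 103 earns a reward of 3.0;
# the same move costs a short 3.0.
print(step_reward(100.0, 103.0, 1))   # 3.0
print(step_reward(100.0, 103.0, -1))  # -3.0
```

Any other objective (risk-adjusted return, drawdown penalty, etc.) would be expressed the same way, by changing what this function returns.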

StrikerRUS commented 6 years ago

@AdrianP- Thanks for the prompt response!

From your words I take it that reward == profit, and in the logs we could replace one with the other, right?

Also I have another question: does _generate_summary_stats() show the statistics of the last episode of the training stage? Or what?.. https://github.com/AdrianP-/gym_trading/blob/af5767a469db550459517e10c439df1c8764aa86/gym_trading/envs/trading_env.py#L41 Because, according to the logs from the repo's notebook, that doesn't seem to be the case:

steps                     | 299459
episodes                  | 140
% time spent exploring    | 2
--
mean episode reward       | 197.2
Total operations          | 93
Avg duration trades       | 21.28
Total profit              | 99.0  <------------------------
Avg profit per trade      | 1.421
--
Total profit test         | -260.8
Avg profit per trade test | -11.521
-------------------------------------
SUMMARY STATISTICS
Total Trades Taken:  14
Total Reward:  347.3  <------------------------
Average Reward per Trade:  24.8071428571
Win Ratio: 71.4285714286 %

UPD: Also I've noticed the main difference between Profit and reward: Profit is calculated as, let's say, real profit in the real world (the delta of the exit and entry prices of the current trade), while the reward calculation is based on the current prices at each time step.
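The distinction in the update can be sketched as follows (hypothetical helper names, not gym_trading's code): realized profit is only known when the trade closes, while a mark-to-market reward is available at every step from the current price.

```python
# Sketch of realized profit vs. mark-to-market reward for a long position.
# Function names are illustrative, not taken from gym_trading.

def realized_profit(entry_price: float, exit_price: float) -> float:
    # Defined only once the trade is closed: exit minus entry.
    return exit_price - entry_price

def mark_to_market_reward(entry_price: float, current_price: float) -> float:
    # Available at every time step while the position is still open.
    return current_price - entry_price

prices = [100.0, 101.5, 99.0, 103.0]  # entry at the first price
entry = prices[0]

# Per-step rewards fluctuate with the current price...
per_step = [mark_to_market_reward(entry, p) for p in prices[1:]]
print(per_step)                             # [1.5, -1.0, 3.0]

# ...while the realized profit depends only on entry and exit.
print(realized_profit(entry, prices[-1]))   # 3.0
```

Note that the final mark-to-market reward equals the realized profit, but the intermediate rewards give the RL algorithm a signal at every step, not just at trade close.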

AdrianP- commented 6 years ago

Exactly as you said in the update: you can write the reward function as the mean of the profits (as it is now, the mean of the last 100 operations) or as the profit per trade (you want to maximize each trade, not the global result).

The _generate_summary_stats function is called when the episode is solved or at the end of the episode. The episode ends when (np.mean(episode_rewards[-101:-1]) > 500 or t >= 10000) and len(env.portfolio.journal) != 0.
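That termination condition can be sketched as a small predicate (the variable names episode_rewards, t, and journal stand in for those in the notebook's training loop; the thresholds are the ones quoted above):

```python
import numpy as np

# Sketch of the termination check quoted above. The episode finishes
# when the mean of the last 100 episode rewards exceeds the solve
# threshold, or the step budget runs out, provided at least one trade
# was taken (the journal is non-empty).

def episode_finished(episode_rewards, t, journal,
                     max_steps=10000, solve_threshold=500):
    solved = np.mean(episode_rewards[-101:-1]) > solve_threshold
    return bool((solved or t >= max_steps) and len(journal) != 0)

rewards = [600.0] * 200
print(episode_finished(rewards, t=50, journal=[{"trade": 1}]))  # True (solved)
print(episode_finished(rewards, t=50, journal=[]))              # False (no trades)
```

The slice [-101:-1] takes the 100 rewards before the most recent one, which is why the log above reports a "mean episode reward" over the last operations rather than over the whole run.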

StrikerRUS commented 6 years ago

Thanks for the answer! But if I'm not mistaken, an episode is one full iteration over the train part of the file, and the condition you've mentioned is used to check whether the training is over, isn't it?

StrikerRUS commented 6 years ago

@jalalmzh Please see #3 and next time create your own issue.