AminHP / gym-anytrading

The simplest, most flexible, and most comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

Running to 1 or no trade on evaluation #10

Closed jimmy6 closed 4 years ago

jimmy6 commented 4 years ago

I am getting only 1 or no trades during evaluation. I am just using the sample TF-Agents DQN code. The collect_step will trigger trades, but the evaluation step in compute_avg_return only makes 1 or 0 trades.


for _ in range(num_iterations):

  # Collect a few steps using collect_policy and save to the replay buffer.
  for _ in range(collect_steps_per_iteration):
    collect_step(train_env, agent.collect_policy, replay_buffer)

  # Sample a batch of data from the buffer and update the agent's network.
  experience, unused_info = next(iterator)
  train_loss = agent.train(experience).loss

  step = agent.train_step_counter.numpy()

  if step % log_interval == 0:
    print('Time = {0}, step = {1}: loss = {2}'.format(datetime.now(), step, train_loss))
  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
    print('Evaluate Time = {0}, step = {1}: Average Return = {2}'.format(datetime.now(), step, avg_return))
    returns.append(avg_return)
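Note that in the TF-Agents DQN setup, collection uses `agent.collect_policy` (epsilon-greedy, so it explores), while `compute_avg_return` evaluates with `agent.policy` (greedy). If the Q-network has collapsed to preferring one action in every state, the greedy policy repeats that action at every step, which would explain trades appearing during collection but not during evaluation. A minimal pure-Python sketch of that difference (the Q-values and epsilon here are made up for illustration, not taken from the notebook):

```python
import random

def greedy_action(q_values):
    # Pick the index of the largest Q-value (what agent.policy does).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_action(q_values, epsilon, rng):
    # With probability epsilon act randomly (what agent.collect_policy does).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)

# A collapsed network: action 1 ("Buy") dominates in every state.
q_per_state = [[0.2, 1.5], [0.1, 1.4], [0.3, 1.6]]

rng = random.Random(0)
greedy = [greedy_action(q) for q in q_per_state]
explore = [epsilon_greedy_action(q, 0.5, rng) for q in q_per_state * 10]

print(greedy)   # prints [1, 1, 1]: the greedy policy repeats the same action
print(explore)  # exploration still mixes in the other action
```

So identical Q-value rankings across states give a constant greedy rollout, even though the exploratory collect policy keeps trading.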
jimmy6 commented 4 years ago

The notebook is here. You can check the visualization part. There is only 1 trade above the Evaluation line, and it ended at a loss.

Trade - price_diff=-0.10000000000013642 current_price=1502.1 last_trade_price=1502.2 action=1
Trade - price_diff=-0.09999999999990905 current_price=1502.0 last_trade_price=1502.1 action=0
Trade - price_diff=0.20000000000004547 current_price=1502.2 last_trade_price=1502.0 action=1
Trade - price_diff=0.20000000000004547 current_price=1502.4 last_trade_price=1502.2 action=0
Trade - price_diff=0.599999999999909 current_price=1503.0 last_trade_price=1502.4 action=1
Time = 2020-05-07 11:31:15.387706, step = 14000: loss = 27962944.0
Trade - price_diff=0.20000000000004547 current_price=1507.3 last_trade_price=1507.1 action=1
Evaluate Time = 2020-05-07 11:31:18.477631, step = 14000: Average Return = -2.0
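Incidentally, the odd-looking `price_diff` digits in the log are just binary floating-point rounding from subtracting two prices, not a bug. A standalone check (not gym-anytrading code; prices taken from the first log line):

```python
# Subtracting two prices stored as binary floats leaves a tiny
# representation error, so the difference is close to -0.1 but not exact.
current_price = 1502.1
last_trade_price = 1502.2
price_diff = current_price - last_trade_price

print(price_diff)  # close to -0.1, with trailing noise digits
assert abs(price_diff - (-0.1)) < 1e-9
assert price_diff != -0.1
```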

AminHP commented 4 years ago

I tried to run your notebook on my machine, but there were some problems and I couldn't get it to work.

I recommend using the train_env as the test_env and checking the results. I think the agent hasn't learned anything useful, and it always takes the same action for all states in the test_env.
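One way to check this is to roll the trained policy over an episode and count how many trades it actually produces. In gym-anytrading, a trade only happens when the chosen action opposes the current position, so a constant-action policy can trade at most once per episode. A minimal sketch (the helper `count_trades` and its 0 = Sell/Short, 1 = Buy/Long convention are illustrative, loosely following gym-anytrading's position logic, not the library's API):

```python
def count_trades(actions, start_position=0):
    # Count position flips: a trade happens only when the action
    # (0 = Sell, 1 = Buy) differs from the position (0 = Short, 1 = Long).
    position, trades = start_position, 0
    for action in actions:
        if action != position:
            trades += 1
            position = action
    return trades

# A policy that always outputs the same action trades at most once:
print(count_trades([1] * 100, start_position=0))  # prints 1
print(count_trades([0] * 100, start_position=0))  # prints 0

# A policy that actually reacts to states trades more often:
print(count_trades([1, 0, 1, 0, 1], start_position=0))  # prints 5
```

If the evaluation rollout shows 0 or 1 trades like this, the greedy policy is almost certainly emitting one action everywhere, which matches the behavior reported above.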

jimmy6 commented 4 years ago

Thanks for your suggestion. I will get back to you.

AminHP commented 4 years ago

You're welcome!