DeepX-inc / machina

Control section: Deep Reinforcement Learning framework
MIT License

Executed run_r2d2_sac.py, `BrokenPipeError: [Errno 32] Broken pipe` occurred. #260

Closed · hosokawa-taiji closed 4 years ago

hosokawa-taiji commented 4 years ago

I executed run_r2d2_sac.py, but `BrokenPipeError: [Errno 32] Broken pipe` occurred. At that point, RAM usage had reached the limit. My environment is the same as in #258. In #258 the memory usage gradually increased and then stopped at a certain point, while with r2d2_sac it kept increasing until it hit the limit. Could you help me?

mmisono commented 4 years ago

Do you mean GPU RAM or main memory? How much RAM do you have?

mmisono commented 4 years ago

run_r2d2_sac.py keeps all trajectories in memory during training, so the used memory size constantly grows.

https://github.com/DeepX-inc/machina/blob/f93bb6f5aca1feccd71fc509bd6370d2015e2d85/example/run_r2d2_sac.py#L159
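
For context, the pattern around the linked line looks roughly like this (a simplified paraphrase, not verbatim from the script; the loop condition is a stand-in):

    off_traj = Traj(args.max_steps_off, traj_device='cpu')  # long-lived replay buffer in main memory

    while not finished:                    # stand-in for the script's real loop condition
        on_traj = Traj(traj_device='cpu')  # fresh on-policy trajectory each iteration
        on_traj.add_epis(epis)             # episodes collected by the sampler
        # ... priority and hidden-state preprocessing ...
        off_traj.add_traj(on_traj)         # the linked line: appends to the in-memory
                                           # buffer, so resident memory grows every iteration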

hosokawa-taiji commented 4 years ago

Sorry, I can't tell whether it is GPU RAM or main memory in Google Colaboratory; it just displays "RAM". The GPU RAM size is about 13 GB.

> the used memory size constantly grows

Will it grow with each iteration? Will the memory be freed when the iteration is over?

mmisono commented 4 years ago

Yes, the memory size grows with each iteration, and it is not freed until the program terminates. One possible workaround is storing trajectories on disk and loading them when necessary, but machina does not currently support that.
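
Roughly, the idea would look like this (a hypothetical sketch; `DiskTrajStore` is not a machina class): write each on-policy Traj to its own pickle, keep only file paths in memory, and load a sampled subset back when training.

    import os
    import random
    import dill

    class DiskTrajStore:
        """Stores each on-policy Traj as its own pickle and loads chunks on demand."""

        def __init__(self, root):
            os.makedirs(root, exist_ok=True)
            self.root = root
            self.paths = []          # only file paths stay resident, not trajectory tensors

        def add(self, on_traj):
            path = os.path.join(self.root, 'traj_%06d.pkl' % len(self.paths))
            with open(path, 'wb') as f:
                dill.dump(on_traj, f)
            self.paths.append(path)

        def sample_chunks(self, k):
            """Load k randomly chosen trajectory chunks back into memory."""
            chosen = random.sample(self.paths, min(k, len(self.paths)))
            for path in chosen:
                with open(path, 'rb') as f:
                    yield dill.load(f)

The hard part, and the reason this needs support inside machina, is merging the loaded chunks back into one Traj for training (e.g. via add_traj) while keeping R2D2's sequence priorities consistent across chunks.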

hosokawa-taiji commented 4 years ago

I see... But it is hard for me to code that. I'll put up with the current situation. Thank you for answering the question.

hosokawa-taiji commented 4 years ago

Could you tell me how to implement it? Is it difficult?

hosokawa-taiji commented 4 years ago

I changed the code as shown below, but during training, as the number of iterations increases, the used memory size still grows constantly and eventually a BrokenPipeError occurred. Is there any other way?

    with measure('train'):
        on_traj = Traj(traj_device='cpu')
        on_traj.add_epis(epis)

        on_traj = ef.add_next_obs(on_traj)
        max_pri = on_traj.get_max_pri()
        on_traj = ef.set_all_pris(on_traj, max_pri)
        on_traj = ef.compute_seq_pris(on_traj, args.seq_length)
        on_traj = ef.compute_h_masks(on_traj)
        for i in range(len(qfs)):
            on_traj = ef.compute_hs(
                on_traj, qfs[i], hs_name='q_hs'+str(i), input_acs=True)
            on_traj = ef.compute_hs(
                on_traj, targ_qfs[i], hs_name='targ_q_hs'+str(i), input_acs=True)
        on_traj.register_epis()

        # keep the replay buffer pickled on disk instead of resident in memory
        # (os and dill are imported at the top of the script)
        off_traj_path = os.path.join(args.log, 'trajs', 'off_traj.pkl')
        os.makedirs(os.path.dirname(off_traj_path), exist_ok=True)
        if not os.path.exists(off_traj_path):
            # first iteration: create an empty buffer and save it
            off_traj = Traj(args.max_steps_off, traj_device='cpu')
            with open(off_traj_path, 'wb') as f:
                dill.dump(off_traj, f)
        # load the buffer, append this iteration's trajectory, and save it back
        with open(off_traj_path, 'rb') as f:
            off_traj = dill.load(f)
        off_traj.add_traj(on_traj)
        with open(off_traj_path, 'wb') as f:
            dill.dump(off_traj, f)

        total_epi += on_traj.num_epi
        step = on_traj.num_step
        total_step += step

        result_dict = r2d2_sac.train(
            off_traj,
            pol, qfs, targ_qfs, log_alpha,
            optim_pol, optim_qfs, optim_alpha,
            step//50, args.rnn_batch_size, args.seq_length, args.burn_in_length,
            args.tau, args.gamma, args.sampling, not args.no_reparam
        )
        # drop the in-memory copy until the next iteration
        del off_traj

mmisono commented 4 years ago

I haven't investigated what uses so much memory in the above code, but if you delete off_traj at every iteration, it changes the algorithm.

The way Traj stores its data needs to be changed. I'll look into it.
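
Note that the pickle round-trip above probably doesn't lower peak memory, because dill.load rebuilds the whole buffer in RAM each iteration. Changing how Traj stores data could mean, for example, backing its flat arrays with a memory-mapped file so that only the rows a batch touches get paged in. A minimal sketch of that idea (illustrative shapes and file names, not machina code):

    import numpy as np
    import torch

    # hypothetical layout: preallocate observation storage on disk for the whole buffer
    obs_store = np.memmap('obs.dat', dtype=np.float32, mode='w+',
                          shape=(100000, 17))   # (max_steps_off, obs_dim); illustrative sizes

    def write_step(idx, obs):
        obs_store[idx] = obs                    # the row is written through to the file

    def read_batch(indices):
        # only the requested rows are paged into RAM when building a training batch
        return torch.from_numpy(np.array(obs_store[indices]))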

hosokawa-taiji commented 4 years ago

> but if you delete off_traj at every iteration, it changes the algorithm

OK. I'll undo the code changes.

> I'll look into it.

Thank you. I look forward to good findings.