deep-reinforcement-learning-book / Chapter16-Robot-Learning-in-Simulation

Chapter 16, Robot Learning in Simulation, in the book Deep Reinforcement Learning: an example of a Sawyer robot learning to reach a target with a parallel Soft Actor-Critic (SAC) algorithm, using PyRep for Sawyer robot simulation and game building. The environment is wrapped into the OpenAI Gym format.
https://deep-reinforcement-learning-book.github.io
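As background for the issue below: the environment is exposed in OpenAI Gym style, so a rollout looks roughly like the following sketch. ReacherEnv, its constructor arguments, and shutdown() are assumed names for illustration, not necessarily the repository's exact API.

```python
# Minimal sketch of a Gym-style rollout with the wrapped Sawyer reaching
# environment. The module/class name, constructor arguments, and shutdown()
# are assumptions for illustration, not necessarily the repository's exact API.
from sawyer_reach_env import ReacherEnv  # hypothetical import path

env = ReacherEnv(headless=True, control_mode='joint_velocity')

state = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random action, just to exercise the interface
    next_state, reward, done, info = env.step(action)
    state = next_state
    if done:
        state = env.reset()

env.shutdown()  # stop the V-REP simulation launched by PyRep
```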

error:"Gripper position is nan" #2

Closed jianye0428 closed 4 years ago

jianye0428 commented 4 years ago

Hello, I downloaded the demo for SAC and I'm trying to train from scratch.

When I set max_episode to 20, the demo works. But when I set max_episode to 1000 or more, I always get the error "Gripper position is nan", and I don't know why this error keeps appearing.

Any advice?

Thanks in advance. Jian

luweiqing commented 4 years ago

Hello, I have the same problem during the training process. Did you solve it? Can you give me some advice?

Thank you in advance. Lu

jianye0428 commented 4 years ago

Sorry, I switched to V-REP 3.6.2, but it still doesn't seem to work.

luweiqing commented 4 years ago

Me too. And if we switch to the other control mode "end_position", or give a random target in the "joint_velocity" control mode, the SAC net breaks and the reward value is always unstable.

quantumiracle commented 4 years ago

Hi, the Sawyer simulation in V-REP seems to be unstable sometimes, which can lead to a broken gripper during exploration. This is why I have code to restart the environment every 20 episodes during training; with that in place, on my side the agent can smoothly finish training runs of thousands of episodes (a rough sketch of the restart logic is below).
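For reference, a minimal sketch of that periodic restart inside a training loop; env.reinit(), policy.select_action(), and replay_buffer.push() are placeholder names, not necessarily the project's exact ones.

```python
# Sketch of the "restart the environment every 20 episodes" workaround.
# env.reinit(), policy.select_action() and replay_buffer.push() are assumed
# placeholder names standing in for the project's actual objects.
def train(env, policy, replay_buffer, max_episodes=1000, max_steps=30,
          restart_interval=20):
    for episode in range(max_episodes):
        if episode > 0 and episode % restart_interval == 0:
            env.reinit()  # relaunch the V-REP scene to recover a broken gripper
        state = env.reset()
        for _ in range(max_steps):
            action = policy.select_action(state)
            next_state, reward, done, _ = env.step(action)
            replay_buffer.push(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
```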

So @jianye0428, could you check whether the gripper is still broken after that restart, by visualizing the robot scene?

I'm not sure if this problem is caused by the package version. To make sure this project works well, we recommend using V-REP 3.6.2 and the compatible PyRep that we forked here, rather than directly installing the latest versions.

As for the "end_position" mode @luweiqing, this project is not a solution for that. You may need to change the code a bit and fine-tune it to make that work.

Best, Zihan

jianye0428 commented 4 years ago

Hello, thanks for the reply. I'll give it a try and report back later. Best, Jian

jianye0428 commented 4 years ago

Hello, I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.

Best, Jian

quantumiracle commented 4 years ago

> Hello, I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.
>
> Best, Jian

Can you check whether the gripper is broken during exploration and whether it's still broken after restarting the environment with that code?

luweiqing commented 4 years ago

Hi, thank you for your sincere reply. I have solved the error "Gripper position is nan" over thousands of episodes. I have trained for about 80000 episodes, but the reward value is always unstable and does not converge, and the success rate is very low. Do I need more episodes of training? Can you give me some advice about how many episodes I need for the value function to become stable?

Best

Lu

luweiqing commented 4 years ago

As for the error "Gripper position is nan": it happens because the output of the policy network is [nan, nan, nan, nan, nan, nan, nan], which triggers the guard if math.isnan(ax) (commented "capture the broken gripper cases during exploration"), so 'Gripper position is nan.' is printed and self.reinit() is called. A reformatted version of that snippet is shown below.
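Reformatted, the quoted snippet is essentially a NaN guard. A self-contained version of the check, with the environment-specific parts kept as comments, might look like this:

```python
import math

def position_is_nan(position):
    """Return True if any coordinate reported by the simulator is NaN."""
    return any(math.isnan(coordinate) for coordinate in position)

# Inside the environment class, the quoted guard then reads roughly:
#     ax, ay, az = <gripper position read back from the simulator>
#     if math.isnan(ax):  # capture the broken gripper cases during exploration
#         print('Gripper position is nan.')
#         self.reinit()   # relaunch the scene
print(position_is_nan([float('nan'), 0.1, 0.2]))  # True
print(position_is_nan([0.0, 0.1, 0.2]))           # False
```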

jianye0428 commented 4 years ago

> Hello, I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work. Best, Jian
>
> Can you check whether the gripper is broken during exploration and whether it's still broken after restarting the environment with that code?

I think it is still broken when the environment is reinitialized. In my training process the reward turns out to be zero once the episode count exceeds 20, and I get the same error.

Best, Jian

jianye0428 commented 4 years ago

> Hi, thank you for your sincere reply. I have solved the error "Gripper position is nan" over thousands of episodes. I have trained for about 80000 episodes, but the value is always unstable and does not converge, and the success rate is very low. Do I need more episodes of training? Can you give me some advice about how many episodes I need for the value function to become stable?
>
> Best
>
> Lu

Hello,

I tried with the forked PyRep package but still get the same error.

Can I ask how you solved the problem? Did you just use the forked PyRep, or did you change other things?

Best,
Jian

luweiqing commented 4 years ago

Hi. I changed the robot from Sawyer to Baxter; the reward value is always unstable and does not converge even though I trained for more than 80000 episodes.

quantumiracle commented 4 years ago

> Hi. I changed the robot from Sawyer to Baxter; the reward value is always unstable and does not converge even though I trained for more than 80000 episodes.

Did you change the environment script after you changed the robot from Sawyer to Baxter? Since the environment is basically customized for Sawyer, I'm not sure it can work with Baxter directly. As for Sawyer, it only takes me thousands of episodes to get some preliminary learning results, as shown by the learning curve in the README.

quantumiracle commented 4 years ago

> Hello, I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work. Best, Jian
>
> Can you check whether the gripper is broken during exploration and whether it's still broken after restarting the environment with that code?
>
> I think it is still broken when the environment is reinitialized. In my training process the reward turns out to be zero once the episode count exceeds 20, and I get the same error.
>
> Best, Jian

If so, I would say different things happen after reinitialization on your side and on my side. In my tests the gripper is intact and works well after reinitialization, even if it was broken before. I currently do not know what causes this difference.

luweiqing commented 4 years ago

Today the error "Gripper position is nan" appeared again, and I am sure that I changed the environment script. On the other hand, I set the target object to a random position; is this project a solution for that, or should I make other changes to the network?

luweiqing commented 4 years ago

I found the reason why the error comes out: the code should be -=, not =- (the snippet below shows the difference).
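It isn't stated which line of the project this refers to, but for anyone hitting the same typo, the two spellings behave very differently:

```python
# "x -= y" subtracts y from x in place; "x =- y" is parsed as "x = -y",
# silently overwriting x with the negation of y and raising no error.
x, y = 10.0, 3.0
x -= y
print(x)   # 7.0

x = 10.0
x =- y     # equivalent to x = -y
print(x)   # -3.0
```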

luweiqing commented 4 years ago

I find that reducing the number of threads can avoid the error "Gripper position is nan".

quantumiracle commented 4 years ago

> I find that reducing the number of threads can avoid the error "Gripper position is nan".

You mean processes? How many did you use when you met the error?

luweiqing commented 4 years ago

> I find that reducing the number of threads can avoid the error "Gripper position is nan".
>
> You mean processes? How many did you use when you met the error?

Yes, I used more than 4 processes when I met the error. And as the number of training episodes increases, the value function converges when I use only 1 process.

luweiqing commented 4 years ago

And I have a question: how do you aggregate the final training results of multi-threaded training? A3C uses multi-threaded sampling but still single-threaded training.

quantumiracle commented 4 years ago

> And I have a question: how do you aggregate the final training results of multi-threaded training? A3C uses multi-threaded sampling but still single-threaded training.

If you use multi-threading, variables and objects can be shared across threads within a process, in which case you can log the results easily by reading these shared objects; if you use multi-processing, a queue can be used to send information across processes (see the sketch below).
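A minimal sketch of the multi-process case: worker processes push episode results into a multiprocessing.Queue and the parent process reads them for logging. The worker body here is a placeholder, not the project's training code.

```python
import multiprocessing as mp

def worker(worker_id, num_episodes, result_queue):
    # Placeholder for a real training worker: each process would run its own
    # environment and agent, then report episode rewards through the queue.
    for episode in range(num_episodes):
        episode_reward = float(worker_id * 10 + episode)  # dummy value
        result_queue.put((worker_id, episode, episode_reward))

if __name__ == '__main__':
    num_workers, num_episodes = 2, 3
    result_queue = mp.Queue()
    processes = [mp.Process(target=worker, args=(i, num_episodes, result_queue))
                 for i in range(num_workers)]
    for p in processes:
        p.start()
    # The parent process collects and logs results from all workers.
    for _ in range(num_workers * num_episodes):
        worker_id, episode, reward = result_queue.get()
        print('worker %d episode %d reward %.1f' % (worker_id, episode, reward))
    for p in processes:
        p.join()
```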

Mr-Trigg commented 3 years ago

Hi, I've been trying to train the sac_learn file but I was getting the "Gripper position is nan" error. I tried the suggestions here: I was using 4 parallel processes, then 2, and both cases crashed with the gripper-position error. Now I've been running the training with just 1 process; it has been 13 hours by now, episode 24k+, and the episode reward is still around -3 to -2, with an occasional 7, but that is quite rare.

I'm using Ubuntu 18.04 as the OS, Python 3.6.9, V-REP PRO EDU 3.6.2, this GitHub PyRep version, and PyTorch 1.8.1 with CUDA 11.1, with an RTX 2080 Super as the GPU and a Ryzen 7 3800X as the CPU.