Closed jianye0428 closed 4 years ago
Hello, I have the same problem during the training process. Did you solve it? Can you give me some advice?
Thank you in advance. Lu
Sorry, I switched to V-REP 3.6.2, but it still did not work.
Me too. And if we switch to the other control mode, "end_position", or give a random target in the "joint_velocity" control mode, the SAC network breaks and the reward value is always unstable.
Hi, the Sawyer simulation in V-REP seems to be unstable sometimes, which leads to a broken gripper during exploration. That is why the code restarts the environment every 20 episodes during training; on my side the agent can then smoothly finish training over thousands of episodes.
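The restart-every-20-episodes workaround can be sketched as below. This is only an illustration, not the project's actual code: `DummyEnv`, `reinit`, and the training loop are all hypothetical stand-ins for the real PyRep environment and SAC trainer.

```python
# Sketch of the "restart the simulator every N episodes" workaround.
# DummyEnv stands in for the real PyRep environment; reinit() would
# relaunch the V-REP scene in the actual project.

RESTART_EVERY = 20

class DummyEnv:
    def __init__(self):
        self.restarts = 0

    def reinit(self):
        self.restarts += 1  # placeholder for tearing down and relaunching V-REP

    def reset(self):
        return 0.0  # placeholder initial state

    def step(self, action):
        return 0.0, 0.0, True  # state, reward, done (one-step episodes here)

def train(env, num_episodes):
    for episode in range(num_episodes):
        if episode > 0 and episode % RESTART_EVERY == 0:
            env.reinit()  # periodic restart to work around simulator instability
        state = env.reset()
        done = False
        while not done:
            state, reward, done = env.step(0.0)
    return env.restarts

print(train(DummyEnv(), 100))  # restarts at episodes 20, 40, 60, 80 -> 4
```

The point of the pattern is simply that the restart happens at episode boundaries, so no in-progress rollout is lost.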
So could you @jianye0428 check, by visualizing the robot scene, whether the gripper is still broken after the restarting code above runs?
I'm not sure if this problem is caused by the package version. To make sure this project works well, we recommend using V-REP 3.6.2 and the compatible PyRep fork provided here, rather than installing the latest version directly.
As for the "end_position" mode @luweiqing, this project is not a solution for that. You may need to change the code a bit and fine-tune it to make it work.
Best, Zihan
Hello, thanks for the reply. I'll give it a try and report back later. Best, Jian
Hello, I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.
Best, Jian
Can you check whether the gripper is broken during exploration, and whether it is still broken after the environment is restarted by the code?
Hi, thank you for your sincere reply. I have solved the "Gripper position is nan" error. I have trained for 80,000 episodes, but the reward value is always unstable and does not converge, and the success rate is very low. Do I need to train for more episodes? Can you give me some advice on how many episodes are needed for the value function to become stable?
Best,
Lu
As for the "Gripper position is nan" error: it occurs because the policy network outputs `[nan, nan, nan, nan, nan, nan, nan]`, which triggers this guard:

```python
if math.isnan(ax):  # capture the broken gripper cases during exploration
    print('Gripper position is nan.')
    self.reinit()
```
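Since the nan actions originate from the policy network output, one complementary option is to sanitize the action vector before stepping the environment. The sketch below is hypothetical (`sanitize_action` is not a function from this project); it simply replaces an all-or-partly-NaN action with a zero action so one bad policy output does not crash the simulator.

```python
import numpy as np

def sanitize_action(action, action_dim=7):
    # Illustrative guard, not the project's actual handling: if any
    # component of the policy output is NaN, fall back to a zero action
    # instead of sending NaN joint commands to the simulator.
    action = np.asarray(action, dtype=np.float64)
    if np.isnan(action).any():
        return np.zeros(action_dim)
    return action

print(sanitize_action([float('nan')] * 7))  # -> seven zeros
```

A zero fallback is just the simplest choice; resampling from the policy or repeating the previous valid action are alternatives, and the root cause (diverging network outputs) still needs fixing separately.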
I think it is still broken when the environment is reinitialized. In my training process the reward turned out to be zero once the episode count exceeded 20, and I get the same error.
Best, Jian
Hello,
I tried with the forked PyRep package but still get the same error.
Can I ask how you solved the problem? Did you just use the forked PyRep, or did you change other things as well?
Best,
Jian
Hi. I changed the robot from Sawyer to Baxter; the reward value is always unstable and does not converge even though I trained for over 80,000 episodes.
Did you change the environment script after you changed the robot from Sawyer to Baxter? The environment is basically customized for Sawyer, so I'm not sure it would directly work with Baxter. For Sawyer, it only takes thousands of episodes to get some primary learning results, as shown in the learning curve in the README.
If the gripper is still broken after reinitialization, then different things happen on your side and on mine. In my tests the gripper is complete and works well after reinitialization, even if it was broken before. I currently do not know what causes this difference.
Today the "Gripper position is nan" error appeared again, and I am sure that I changed the environment script. On the other hand, I set the target object to a random position; is this project a solution for that, or what other changes should I make to the network?
I find that reducing the number of threads can avoid the "Gripper position is nan" error.
You mean processes? How many did you use when you met the error?
Yes, I used more than 4 processes when I met the error. And as training progresses, the value function converges when I use only 1 process.
And I have a question: how do you aggregate the final training results from multi-threaded training? A3C samples with multiple threads but still trains in a single thread.
If you use multi-threading, variables and objects can be shared across threads within a process, so you can log the results simply by reading those shared objects. If you use multi-processing, a queue can be used to send information across processes.
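The multi-process case can be sketched with Python's standard `multiprocessing.Queue`. This is an illustrative sketch, not the project's actual logging code; the worker function and the placeholder rewards are made up, and it assumes a Linux-style fork start method (no `__main__` guard needed).

```python
import multiprocessing as mp

def worker(rank, queue, num_episodes):
    # Each sampler process pushes (rank, episode, reward) to the shared
    # queue; float(rank) is a placeholder for the real episode reward.
    for ep in range(num_episodes):
        queue.put((rank, ep, float(rank)))

queue = mp.Queue()
procs = [mp.Process(target=worker, args=(r, queue, 3)) for r in range(2)]
for p in procs:
    p.start()
results = [queue.get() for _ in range(6)]  # drain before joining to avoid blocking
for p in procs:
    p.join()
print(len(results))  # 6 logged episodes collected in the main process
```

The main process drains the queue and can then aggregate or plot the rewards; the same pattern works for sending transitions to a single learner process, as in A3C-style setups.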
Hi, I've been trying to train the sac_learn file but I was getting the "Gripper position is nan" error. I tried the suggestions here: I was using 4 parallel processes, then 2, and both cases crashed with the gripper position error. Now I've been running the training with just 1 process; it has been 13 hours by now, 24k+ episodes, and the episode reward is still around -3 to -2. Sometimes there's a 7, but that is quite rare.
I'm using Ubuntu 18.04, Python 3.6.9, V-REP Pro Edu 3.6.2, this GitHub's PyRep version, and PyTorch 1.8.1 with CUDA 11.1, on an RTX 2080 Super GPU and a Ryzen 7 3800X CPU.
Hello, I downloaded the SAC demo and I'm trying to train from scratch.
When I set max_episode to 20, the demo works. But when I set max_episode to 1000 or more, I always get the "Gripper position is nan" error, and I don't know why.
Any advice?
Thanks in advance. Jian