Ericonaldo / visual_wholebody

Train a loco-manipulation dog with RL
https://wholebody-b1.github.io/

error in high level student policy #8

Open whn981841576 opened 4 months ago

whn981841576 commented 4 months ago

I set the teacher checkpoint in the args, but I get this error:

```
Traceback (most recent call last):
  File "train_multi_bc_deter.py", line 404, in <module>
    trainer.train()
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_trainer.py", line 70, in train
    self.single_agent_train()
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_trainer.py", line 140, in single_agent_train
    self.agents.record_transition(student_obs=student_obs,
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_rnn.py", line 238, in record_transition
    self.memory.add_samples(student_obs=student_obs, teacher_obs=teacher_obs, actions=actions, teacher_actions=teacher_actions, rewards=rewards,
  File "/home/scq/pycharm_project/visual_wholebody-main/third_party/skrl/skrl/memories/torch/base.py", line 266, in add_samples
    self.tensors[name][self.memory_index].copy_(tensor)
TypeError: copy_(): argument 'other' (position 1) must be Tensor, not int
```
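For context, skrl's `add_samples` copies each incoming value into a preallocated tensor via `copy_`, so a plain Python scalar fails here. A minimal defensive sketch (hypothetical helper, not the repo's code) that coerces scalars to tensors before storage:

```python
import torch

def as_sample_tensor(value, ref: torch.Tensor) -> torch.Tensor:
    """Coerce a Python scalar (or numpy array) to a tensor matching ref's dtype/device."""
    if isinstance(value, torch.Tensor):
        return value
    return torch.as_tensor(value, dtype=ref.dtype, device=ref.device)

# e.g. before memory.add_samples(...):
# rewards = as_sample_tensor(rewards, memory.tensors["rewards"])
```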

hatimwen commented 4 months ago

Use PyTorch 2.x, e.g. 2.1.2.
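For reference, pinning that version might look like this (the exact CUDA wheel or index URL may differ per setup):

```
pip install torch==2.1.2
```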

whn981841576 commented 4 months ago

> Use PyTorch 2.x, e.g. 2.1.2.

Thank you for your suggestion. When I use PyTorch 2.1.2, that problem is solved, but there is a new issue:

```
  File "train_multi_bc_deter.py", line 404, in <module>
    trainer.train()
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_trainer.py", line 70, in train
    self.single_agent_train()
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_trainer.py", line 153, in single_agent_train
    self.agents.post_interaction(timestep=timestep, timesteps=self.timesteps)
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_rnn.py", line 265, in post_interaction
    self._update(timestep, timesteps)
  File "/home/scq/pycharm_project/visual_wholebody-main/high-level/learning/dagger_rnn.py", line 338, in _update
    (dagger_loss + entropy_loss).backward()
  File "/home/scq/anaconda3/envs/ava/lib/python3.8/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/scq/anaconda3/envs/ava/lib/python3.8/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 792.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 866.69 MiB is free. Including non-PyTorch memory, this process has 22.47 GiB memory in use. Of the allocated memory 10.16 GiB is allocated by PyTorch, and 1.92 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

What confuses me is that training the teacher policy did not exceed the available memory. My GPU is a 3090, and I have already reduced the number of environments from 10240 to 4096.
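In case it helps others: the error message itself points at `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch for trying that and checking free memory (the 128 MiB split size is an arbitrary starting value, not a project recommendation):

```python
import os
# Must be set before the first CUDA allocation in the process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch
free, total = torch.cuda.mem_get_info()  # bytes free / total on the current device
print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
```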

whn981841576 commented 4 months ago

> Use PyTorch 2.x, e.g. 2.1.2.

When I run nvidia-smi, it shows that I only have about half of my memory available.

hatimwen commented 4 months ago

My device is also a 3090 and it works, so I suggest you check whether there are other processes alive.
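One standard way to check which processes are holding GPU memory (a generic nvidia-smi query, not specific to this repo):

```
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```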

Btw, what about the performance of your trained teacher policy?

whn981841576 commented 4 months ago

> My device is also a 3090 and it works, so I suggest you check whether there are other processes alive.
>
> Btw, what about the performance of your trained teacher policy?

My high-level teacher policy is terrible. Sometimes when it tries to grab an object, the gripper only touches the object and pushes it around. The log shows one successful grasp; I am still testing, but the only thing I know for sure is that the approach reward is granted as soon as the gripper gets close to the target. In the log, a single close approach increments the counter by 1, yet the actual success rate does not increase, and the total success rate is only a few percent when training finishes. I don't know whether this is why my student policy is not training well. In short, the effect of pick_reward is not reflected in the gym training results.
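A minimal sketch of the kind of stricter success check being described here: counting success only when the object stays lifted for several consecutive steps rather than on first proximity. All names and thresholds below are hypothetical, not the repo's actual code:

```python
import torch

def update_success(obj_height, init_height, hold_counter,
                   lift_thresh=0.10, hold_steps=10):
    """Succeed only after the object stays lift_thresh above its initial
    height for hold_steps consecutive steps (all tensors shaped (num_envs,))."""
    lifted = (obj_height - init_height) > lift_thresh
    hold_counter = torch.where(lifted, hold_counter + 1,
                               torch.zeros_like(hold_counter))
    return hold_counter >= hold_steps, hold_counter
```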

whn981841576 commented 4 months ago

> My device is also a 3090 and it works, so I suggest you check whether there are other processes alive.
>
> Btw, what about the performance of your trained teacher policy?

I suspect the problem is the setup of the pick reward function. In the code, the reward based on the target height and the gripper height is computed by the lift function, which is then used in the pick function. But during the pick, if the gripper merely approaches the target and moves it, the reward is already granted. Meanwhile, the open/close action of the end gripper is itself part of the trained action space, so while approaching, the gripper keeps repeating the open/close motion, and in many cases it only touches the object in a closed state, which makes training very difficult. Would it be possible to force the gripper to close once the contact bonus is achieved, instead of treating the gripper action as a trained value? This is just my thought; if it requires changing the network and reward-function framework to implement, please let me know if there is any progress on it.
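A sketch of what this suggestion could look like in code: overriding the learned gripper command once contact is detected. Everything here (names, the gripper channel index, the close command value) is hypothetical:

```python
import torch

def override_gripper(actions, contact_detected, gripper_dim=-1, close_cmd=-1.0):
    """Once contact is detected, clamp the gripper channel to 'close' so the
    policy cannot keep toggling it against the object."""
    actions = actions.clone()
    actions[contact_detected, gripper_dim] = close_cmd
    return actions
```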

Ericonaldo commented 3 months ago

@whn981841576 Hi, I found a potential issue and solution for the high-level training. Can you have a try? I wrote all details in https://github.com/Ericonaldo/visual_wholebody/issues/11.

whn981841576 commented 2 months ago

> Hi, I found a potential issue and solution for the high-level training. Can you have a try? I wrote all details in #11.

Thanks, I have seen your reply and I am training now. I will then change the height and the environment variables, and if I find any issues I will contact you.