fyp21011 / PlasticineLab

Codes for HKU CS Final Year Program 21011: Applying the differentiable physics on deformable objects to enhance the performance of robot learning tasks

Bug: Out-of-memory when TaichiEnv is initialized #21

Open EE-LiuYunhao opened 2 years ago

EE-LiuYunhao commented 2 years ago

Out-of-memory when TaichiEnv is initialized

The OOM occurs when I load the rope.yml configuration manually and use it to initialize a TaichiEnv instance. The TaichiEnv is instantiated successfully, but invoking the TaichiEnv.initialize method gets the entire process killed. Stepping through the execution shows that the OOM occurs when Taichi tries to initialize the first primitive in this environment.

Reproduce

Environment:

Execution

Run from the PlasticineLab directory. In the yhliu_dev container, this is the /root/yhliu/PlasticineLab folder.

from plb.envs import PlasticineEnv
from plb.engine.taichi_env import TaichiEnv
cfg = PlasticineEnv.load_varaints('rope.yml', 1) # Rope-v1
tcEnv = TaichiEnv(cfg, False)
# THIS WILL LEAD TO THE PROCESS BEING KILLED
tcEnv.primitives.initialize()

Screen capture

image image

Some output has been trimmed.

image

htop showed that memory usage exploded just before the process was killed.
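To narrow down which step drives the memory growth without watching htop by hand, each step of the reproduction could be wrapped in a small helper that prints the process's peak resident set size before and after it runs. This is only a sketch: `peak_rss_mib` and `run_with_memory_report` are hypothetical helper names, not part of PlasticineLab or Taichi, and the `resource` module is available on Unix-like systems only.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_memory_report(label, step):
    """Call step() and print how much the peak RSS grew while it ran.

    Hypothetical helper for this issue; returns (result, growth in MiB).
    """
    before = peak_rss_mib()
    # Flush so the label is visible even if the kernel kills the process.
    print(f"{label}: starting at peak RSS {before:.1f} MiB", flush=True)
    result = step()
    after = peak_rss_mib()
    print(f"{label}: finished at peak RSS {after:.1f} MiB", flush=True)
    return result, after - before
```

With the reproduction above, the calls would look like `run_with_memory_report("TaichiEnv", lambda: TaichiEnv(cfg, False))` followed by `run_with_memory_report("initialize", tcEnv.primitives.initialize)`. The step that gets the process killed never prints its "finished" line, so the last "starting" label in the log points at the culprit.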

Guess

There might be a memory leak related to the Torch NN feature in TaichiEnv. Although our development machine has relatively little memory (~16 GiB), it could at least load the environment before Torch NN was introduced.

Investigation

shwnyao commented 2 years ago

This issue only happens with torch_nn, which is no longer maintained. I will remove it and clean up the implementation.