Codes for HKU CS Final Year Program 21011: Applying the differentiable physics on deformable objects to enhance the performance of robot learning tasks
Bug: Out-of-memory when TaichiEnv is initialized #21
The out-of-memory (OOM) kill occurs when I load the configuration of rope.yml manually and use that configuration to initialize a TaichiEnv instance. The TaichiEnv is instantiated successfully, but invoking the TaichiEnv.initialize method causes the entire process to be killed. Stepping through the execution shows that the OOM occurs when Taichi tries to initialize the first primitive in this environment.
Reproduce
Environment:
Inside the Docker image. On our development server, this is the container named yhliu_dev.
Check out the main branch.
Execution
Run from the PlasticineLab directory. In the yhliu_dev container, this is the /root/yhliu/PlasticineLab folder.
from plb.envs import PlasticineEnv
from plb.engine.taichi_env import TaichiEnv
cfg = PlasticineEnv.load_varaints('rope.yml', 1) # Rope-v1
tcEnv = TaichiEnv(cfg, False)
# THIS WILL LEAD TO THE PROCESS BEING KILLED
tcEnv.primitives.initialize()
Screen capture
Some output is trimmed.
htop showed that memory usage grew sharply just before the process was killed.
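To capture the same growth in the process's own log rather than in htop, a small background logger can print the peak resident set size at a fixed interval. This is only a sketch: `peak_rss_mib` and `start_memory_logger` are hypothetical helper names, not part of PlasticineLab or Taichi, and the logger thread only runs if the native call that is allocating memory releases the GIL, which I have not checked for Taichi.

```python
import resource
import sys
import threading
import time


def peak_rss_mib():
    """Peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def start_memory_logger(interval_s=0.5, out=sys.stderr):
    """Print the peak RSS every `interval_s` seconds from a daemon thread."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop.set() is called.
        while not stop.wait(interval_s):
            print(f"[mem] peak RSS: {peak_rss_mib():.0f} MiB", file=out, flush=True)

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() to end logging
```

Calling `start_memory_logger()` right before `tcEnv.primitives.initialize()` should leave the last observed peak on stderr even if the process is killed afterwards, because each line is flushed immediately.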
Guess
There might be a memory leak related to the Torch NN feature in TaichiEnv. Although our development machine has relatively little memory (~16 GiB), it could at least load the environment before the Torch NN was introduced.
Investigation
[ ] Run on a machine with larger memory to observe the memory usage pattern and identify the peak.
[ ] Run the original PLB on the development machine to locate the problem.
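For both checklist items, it may help to run the reproduction in a child process, so that an OOM kill does not take down the interactive session and the child's peak memory can still be read afterwards. The sketch below is an assumption about how this could be done, not existing project code: `run_isolated` and `describe` are hypothetical helpers, the `fork` start method is Unix only, and the Taichi and PlasticineLab imports should probably happen inside the function passed in, since forking after a GPU runtime has been initialized in the parent is generally unsafe.

```python
import multiprocessing as mp
import resource
import signal


def run_isolated(fn, *args):
    """Run fn(*args) in a forked child; return (exit_code, peak_rss)."""
    ctx = mp.get_context("fork")  # Unix only
    proc = ctx.Process(target=fn, args=args)
    proc.start()
    proc.join()
    # After join() the child has been reaped, so RUSAGE_CHILDREN covers it.
    # ru_maxrss is the largest peak among all reaped children (KiB on Linux).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.exitcode, peak


def describe(exit_code):
    """Translate a multiprocessing exit code into a short message."""
    if exit_code == 0:
        return "finished normally"
    if exit_code < 0:
        # A negative exit code means the child was terminated by that signal.
        return f"killed by {signal.Signals(-exit_code).name}"
    return f"exited with status {exit_code}"
```

Wrapping the reproduction steps in a function and passing it to `run_isolated` would report `killed by SIGKILL` when the kernel OOM killer ends the child, together with the peak RSS the child reached. Running the same wrapper against the original PLB and against the current main branch gives two numbers to compare.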