unable to run base model inference for lafan1_15step

dvoidus commented 1 month ago

I followed the installation steps and tried running the default inference command: `python run_env.py --arg_file .\args\RP_amdm_lafan1.txt`

I got the following error:

Setting seed: 114510
Pytorch doesn't support NCCL on Windows, defaulting to gloo backend
Loading LAFAN1 dataset class
Loading LAFAN1 dataset
Traceback (most recent call last):
  File "run_env.py", line 240, in <module>
    main(sys.argv)
  File "run_env.py", line 232, in main
    run(0, num_workers, args)
  File "run_env.py", line 141, in run
    dataset = build_dataset(model_config_file, load_full_motion)
  File "run_env.py", line 53, in build_dataset
    dataset = dataset_builder.build_dataset(config, load_full_dataset)
  File "*\AMDM\dataset\dataset_builder.py", line 32, in build_dataset
    dataset = lafan1_dataset.LAFAN1(config)
  File "*\AMDM\dataset\lafan1_dataset.py", line 13, in __init__
    super().__init__(config)
  File "*\AMDM\dataset\base_dataset.py", line 238, in __init__
    self.joint_parent = bvh_util.get_parent_from_link(self.links)
  File "*\AMDM\dataset\util\bvh.py", line 26, in get_parent_from_link
    for pair in links:
TypeError: 'NoneType' object is not iterable
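
The traceback shows `get_parent_from_link` looping over `self.links`, which is `None` here, so the LAFAN1 skeleton entry apparently has no link data. The sketch below is a hypothetical stand-in (`get_parents_from_links` is not the repo's function) that shows what such a routine typically computes. It assumes each link is a `(parent_index, child_index)` pair; the actual format of `LAFAN1_links` may differ.

```python
def get_parents_from_links(links, num_joints):
    """Hypothetical helper: build a parent-index array from (parent, child) link pairs.

    Assumes each link is a (parent_index, child_index) tuple; the repo's actual
    LAFAN1_links format may differ.
    """
    if links is None:
        # Fail with a clear message instead of "'NoneType' object is not iterable"
        raise ValueError("skeleton entry has no 'links' field (e.g. add 'links': LAFAN1_links)")
    parents = [-1] * num_joints  # joints that never appear as a child remain roots (-1)
    for parent, child in links:
        parents[child] = parent
    return parents


# Usage: a 4-joint skeleton with a chain 0 -> 1 -> 2 and a branch 0 -> 3
print(get_parents_from_links([(0, 1), (1, 2), (0, 3)], 4))  # [-1, 0, 1, 0]
```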

I modified skeleton_info.py so that the LAFAN1 item in skel_dict includes a 'links': LAFAN1_links field, and now I get the following error:

Setting seed: 114510
Pytorch doesn't support NCCL on Windows, defaulting to gloo backend
Loading LAFAN1 dataset class
Loading LAFAN1 dataset
Loading test file: data/LAFAN1/aiming1_subject4.bvh
Traceback (most recent call last):
  File "run_env.py", line 240, in <module>
    main(sys.argv)
  File "run_env.py", line 232, in main
    run(0, num_workers, args)
  File "run_env.py", line 144, in run
    normed_motion = dataset.load_new_data(test_motion_file)
  File "*\AMDM\dataset\lafan1_dataset.py", line 31, in load_new_data
    x_normed = self.norm_data(x)
  File "*\AMDM\dataset\base_dataset.py", line 434, in norm_data
    normalization = self.normalization
AttributeError: 'LAFAN1' object has no attribute 'normalization'

Any advice on what I am missing to make it work?

dvoidus commented 1 month ago

I was able to fix the previous error by setting self.load_full_data to True and running the script once to generate the stats.npz and data.npz files. I think it makes sense to update the inference instructions to include this step.
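
The earlier `AttributeError` appears to come from `norm_data` needing normalization statistics that had not been generated yet. A minimal sketch of the idea, assuming z-score normalization with per-feature `mean` and `std` arrays stored in `stats.npz` (the key names, the `eps` term, and both helper functions are hypothetical, not taken from the repo):

```python
import os.path as osp
import tempfile

import numpy as np


def compute_stats(frames, path, eps=1e-8):
    """Hypothetical: save per-feature mean/std of training frames to <path>/stats.npz."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0) + eps  # eps keeps constant features from dividing by zero
    np.savez(osp.join(path, 'stats.npz'), mean=mean, std=std)
    return mean, std


def normalize_with_stats(x, path):
    """Hypothetical: z-normalize new frames using the saved statistics."""
    with np.load(osp.join(path, 'stats.npz')) as stats:
        return (x - stats['mean']) / stats['std']


with tempfile.TemporaryDirectory() as tmp:
    train = np.array([[0.0, 2.0], [2.0, 4.0]])  # 2 frames x 2 features
    compute_stats(train, tmp)                   # mean [1, 3], std [1, 1]
    normed = normalize_with_stats(np.array([[1.0, 3.0], [2.0, 4.0]]), tmp)
print(normed)  # approximately [[0, 0], [1, 1]]
```

Shipping a precomputed stats file lets inference normalize new clips without loading the full training set.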

Unfortunately, it still crashes, now with a different error:

(amdm) PS> python run_env.py --arg_file .\args\RP_amdm_lafan1.txt
Setting seed: 114510
Pytorch doesn't support NCCL on Windows, defaulting to gloo backend
Loading LAFAN1 dataset class
Loading LAFAN1 dataset
Loading test file: data/LAFAN1/aiming1_subject4.bvh
Loading model param:output/base/amdm_lafan1/model_param.pth
 model config:output/base/amdm_lafan1/config.yaml
Building AMDM model
Using EMA
Building policy.envs.randomplay_env:RandomPlayEnv-RandomPlayEnv
pybullet build time: Oct 20 2024 12:41:38
argv[0]=--background_color_red=0.2
argv[1]=--background_color_green=0.2
argv[2]=--background_color_blue=0.2
starting thread 0
started testThreads thread 0 with threadHandle 0000000000000688
argc=5
argv[0] = --unused
argv[1] = --background_color_red=0.2
argv[2] = --background_color_green=0.2
argv[3] = --background_color_blue=0.2
argv[4] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
Version = 4.6.0 NVIDIA 560.70
Vendor = NVIDIA Corporation
Renderer = NVIDIA GeForce RTX 3070/PCIe/SSE2
b3Printf: Selected demo: Physics Server
starting thread 0
started MotionThreads thread 0 with threadHandle 0000000000000AC8
MotionThreadFunc thread started
agent is None, test no agent
E:\miniconda3\envs\amdm\lib\site-packages\gymnasium\utils\passive_env_checker.py:198: DeprecationWarning: WARN: Current gymnasium version requires that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  "Current gymnasium version requires that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator."
E:\miniconda3\envs\amdm\lib\site-packages\gymnasium\utils\passive_env_checker.py:211: DeprecationWarning: WARN: Current gymnasium version requires that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  "Current gymnasium version requires that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information."
resetting, starting frame index: 4808
resetting, starting frame index: 4009
Traceback (most recent call last):
  File "run_env.py", line 241, in <module>
    main(sys.argv)
  File "run_env.py", line 233, in main
    run(0, num_workers, args)
  File "run_env.py", line 207, in run
    test_no_agent(env)
  File "run_env.py", line 87, in test_no_agent
    _, reward, done, info = env.calc_env_state(frame)
  File "*\AMDM\policy\envs\randomplay_env.py", line 133, in calc_env_state
    self.render()
  File "*\AMDM\policy\envs\randomplay_env.py", line 144, in render
    torch.tensor(self.dataset.x_to_jnts(frame, mode='angle'),device=self.device, dtype=self.history.dtype),  # 0 is the newest
  File "*\AMDM\dataset\base_dataset.py", line 685, in x_to_jnts
    jnts = self.fk_local_seq(x)
  File "*\AMDM\dataset\base_dataset.py", line 605, in fk_local_seq
    joint_positions[:,i] = joint_positions[:,self.joint_parent[i]] + np.matmul(joint_orientations[:,self.joint_parent[i]], joint_offset[i])
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 22 is different from 3)
numActiveThreads = 0
stopping threads
Thread with taskId 0 with handle 0000000000000AC8 exiting
Thread TERMINATED
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 with handle 0000000000000688 exiting
Thread TERMINATED

Yi-Shi94 commented 1 month ago

Thank you for trying out A-MDM and for your feedback! I really appreciate you sharing the issues you encountered while running the code. I apologize for not making this step clearer in the instructions, and I'll make sure to include it in the documentation. In the meantime, I have also added stats.npz for each dataset so that you can run inference without the full dataset.

Yi-Shi94 commented 1 month ago

I couldn't reproduce the issue you are facing. Could you provide more details, for example your configs, args, and branch? Also, could you pull the latest changes and try again? Thanks!

dvoidus commented 1 month ago

I managed to fix it and run the inference. These are the changes I needed:

1. In base_dataset.py, I had to add a joint_offset reshape in the fk_local_seq method:


    # Force one (x, y, z) offset per joint so that joint_offset[i] is a 3-vector
    joint_offset = joint_offset.reshape(-1, 3)

    for i in range(self.num_jnt):
        local_rotation = ang_frames[:, self.data_rot_dim*i: self.data_rot_dim*(i+1)]
        local_rotation = self.from_rpr_to_rotmat(torch.tensor(local_rotation)).numpy()
        if self.joint_parent[i] == -1: #root
            joint_orientations[:,i,:,:] = local_rotation
        else:
            joint_orientations[:,i] = np.matmul(joint_orientations[:,self.joint_parent[i]], local_rotation)
            joint_positions[:,i] = joint_positions[:,self.joint_parent[i]] + np.matmul(joint_orientations[:,self.joint_parent[i]], joint_offset[i])
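
For context on why the reshape helps: the earlier `ValueError` says the right-hand operand of `np.matmul`, `joint_offset[i]`, had 22 entries along its first axis instead of 3, so `joint_offset` was apparently not laid out as one row per joint (the LAFAN1 skeleton has 22 joints). Below is a hypothetical standalone version of the loop (`fk_positions` is an illustrative name, and root translation is left out) that reshapes the offsets to `(num_joints, 3)` first:

```python
import numpy as np


def fk_positions(parents, local_rotmats, offsets):
    """Hypothetical standalone version of the per-joint FK loop in fk_local_seq.

    parents:       length-J sequence, -1 marks the root; parents must precede children
    local_rotmats: (T, J, 3, 3) local rotation matrices for T frames
    offsets:       J*3 bone offsets in any shape; reshaped to (J, 3)
    The root stays at the origin here (no root translation is applied).
    """
    offsets = np.asarray(offsets, dtype=float).reshape(-1, 3)  # one 3-vector per joint
    T, J = local_rotmats.shape[:2]
    orient = np.zeros((T, J, 3, 3))
    pos = np.zeros((T, J, 3))
    for i in range(J):
        p = parents[i]
        if p == -1:
            orient[:, i] = local_rotmats[:, i]
        else:
            orient[:, i] = np.matmul(orient[:, p], local_rotmats[:, i])
            # (T, 3, 3) @ (3,) -> (T, 3): child offset rotated into the parent's global frame
            pos[:, i] = pos[:, p] + np.matmul(orient[:, p], offsets[i])
    return pos


# Usage: 3-joint chain, joint 1 rotated 90 degrees about z, offsets passed flattened
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
local = np.tile(np.eye(3), (1, 3, 1, 1))  # shape (1, 3, 3, 3): 1 frame, 3 joints
local[0, 1] = rot_z
pos = fk_positions([-1, 0, 1], local, [0, 0, 0, 0, 1, 0, 0, 1, 0])
print(pos[0])  # joint 2 ends up at (-1, 1, 0)
```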


2. By default, requirements.txt installs the CPU-only version of torch, so I had to manually install the CUDA 11.7 build (torch==1.13.1+cu117) to make it work.
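
To confirm which build actually ended up installed, a small check like the following can help. It relies only on the standard `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` attributes; the helper name `torch_build_report` is made up for this sketch:

```python
import importlib.util


def torch_build_report():
    """Summarize the installed PyTorch build; returns None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {
        "version": torch.__version__,                 # e.g. '1.13.1+cu117' for the CUDA 11.7 wheel
        "cuda_runtime": torch.version.cuda,           # None on a CPU-only build
        "cuda_available": torch.cuda.is_available(),  # False without a usable GPU and driver
    }


if __name__ == "__main__":
    print(torch_build_report())
```

If `cuda_runtime` is `None`, the CPU-only wheel is installed and needs to be replaced with a CUDA build.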

3. In joystick_env.py, I had to do either this:

        self.joystick_arr[:,int(self.timestep),0] = self.target_speed 
        self.joystick_arr[:,int(self.timestep),1] = self.target_direction

or this:

        self.joystick_arr[:,self.timestep.long(),0] = self.target_speed 
        self.joystick_arr[:,self.timestep.long(),1] = self.target_direction

and also `self.done.fill_(float(self.timestep >= self.max_timestep))`.
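
Both workarounds cast the step counter to an integer before using it as an index, which suggests `self.timestep` is stored as a floating-point value (an assumption; the thread does not say). The sketch below uses NumPy arrays as a stand-in for the PyTorch tensors in joystick_env.py, since NumPy likewise rejects float indices. All variable names are illustrative:

```python
import numpy as np

# Stand-ins for the env state (NumPy instead of PyTorch; both reject float indices)
num_envs, max_timestep = 4, 10
joystick_arr = np.zeros((num_envs, max_timestep, 2))
timestep = np.float32(3.0)  # assumption: the step counter is stored as a float
target_speed, target_direction = 1.5, 0.25

try:
    joystick_arr[:, timestep, 0] = target_speed  # float index -> IndexError
    float_index_rejected = False
except IndexError:
    float_index_rejected = True

t = int(timestep)  # the same cast as int(self.timestep) / self.timestep.long()
joystick_arr[:, t, 0] = target_speed
joystick_arr[:, t, 1] = target_direction

done = np.zeros(num_envs)
done.fill(float(timestep >= max_timestep))  # a plain Python float as the fill value
```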

4. In `base_dataset.py`, I think lines 136 and 137 have the wrong indentation:

                if 'labels' in data.keys():
                    self.labels= data['labels']


They need to be inside the `with np.load(osp.join(self.path,'data.npz')) as data:` block.
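
The reason this matters: `np.load` on an `.npz` file returns an `NpzFile` that reads arrays lazily, and the `with` block closes it on exit, so reading `data['labels']` afterwards fails. A minimal sketch with a made-up `motion` key and a hypothetical `load_motion_archive` helper:

```python
import os.path as osp
import tempfile

import numpy as np


def load_motion_archive(path):
    """Hypothetical loader: read every array while data.npz is still open."""
    labels = None
    with np.load(osp.join(path, 'data.npz')) as data:
        motion = data['motion']          # 'motion' is an illustrative key, not the repo's
        if 'labels' in data.keys():      # inside the with block, where the labels read belongs
            labels = data['labels']
    return motion, labels


with tempfile.TemporaryDirectory() as tmp:
    np.savez(osp.join(tmp, 'data.npz'), motion=np.zeros((5, 3)), labels=np.arange(5))
    motion, labels = load_motion_archive(tmp)

    # The misindented variant: indexing the archive after the with block closed it
    with np.load(osp.join(tmp, 'data.npz')) as data:
        pass
    try:
        data['labels']
        late_read_failed = False
    except Exception:  # the exact exception type may vary between NumPy versions
        late_read_failed = True
```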

dvoidus commented 1 month ago

Also, the args folder has many config files pointing to non-existent configs; for example, the default one from the README steps points to the empty amdm_lafan1_small folder.
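
Stale references like this can be caught with a quick scan. The sketch below is hypothetical: it assumes arg files contain whitespace-separated `--flag value` tokens and treats any non-flag token containing a forward slash as a path. The flag names and files in the example are invented:

```python
import tempfile
from pathlib import Path


def suspicious_paths(arg_file, root="."):
    """Hypothetical checker: list path-like tokens in an arg file that are missing or empty.

    Assumes the arg file holds whitespace-separated '--flag value' tokens and treats
    any non-flag token containing '/' as a path relative to root.
    """
    flagged = []
    for token in Path(arg_file).read_text().split():
        if token.startswith("--") or "/" not in token:
            continue
        target = Path(root) / token
        if not target.exists() or (target.is_dir() and not any(target.iterdir())):
            flagged.append(token)
    return flagged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "output/base/empty_model").mkdir(parents=True)  # exists but is empty
    (root / "data").mkdir()
    (root / "data/clip.bvh").write_text("")                 # exists
    args = root / "example_args.txt"
    args.write_text("--model_path output/base/empty_model\n"
                    "--test_file data/clip.bvh\n"
                    "--config output/missing.yaml\n")
    flagged = suspicious_paths(args, root)
print(flagged)  # ['output/base/empty_model', 'output/missing.yaml']
```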

Yi-Shi94 commented 1 month ago

This has been fixed, and I believe your problems are due to an older version of this repo. I don't have any of these issues.

Yi-Shi94 commented 1 month ago

Maybe part of the problem is unique to Windows. I will try it on my Windows machine in a few days and update requirements.txt. Thanks for sharing your experience!

Yi-Shi94 commented 1 month ago

These bugs still exist in the current master; I will fix them soon.

Yi-Shi94 / AMDM

unable to run base model inference for lafan1_15step #2