Questions and Request for Assistance on Solo Robot COM Tasks

SizheWei commented 4 days ago

Hi Daniel,

Thank you for sharing this incredible work! I’m currently trying to reproduce the Solo robot CoM experiments using the rss2023 branch (latest commit: d7d328a), but I’ve run into some problems. I was hoping you could kindly help me with the following issues:

1. Running the Training Code: I attempted to run the training code using the command:

python train_supervised.py dataset=com_momentum robot_name=Solo-k4 model=emlp dataset.train_ratio=0.7 model.lr=1e-5 dataset.augment=False

However, the run fails while building the model. Could you confirm whether this is the correct command, or let me know what modifications are needed? The output is:

An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
/home/xx/Documents/proj/MorphoSymm-Replication/train_supervised.py:205: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path='cfg/supervised', config_name='config')
/home/xx/miniconda3/envs/rss2023/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[INFO][__main__] 

 NEW RUN 

Global seed set to 972
pybullet build time: Nov 28 2023 23:45:17
 -- Done
[INFO][datasets.com_momentum.com_momentum] CoM[Solo12Bullet]-train-Aug:False-X:torch.Size([70000, 24])-Y:torch.Size([70000, 6])
[INFO][datasets.com_momentum.com_momentum] CoM[Solo12Bullet]-test-Aug:False-X:torch.Size([120000, 24])-Y:torch.Size([120000, 6])
[INFO][datasets.com_momentum.com_momentum] CoM[Solo12Bullet]-val-Aug:True-X:torch.Size([15000, 24])-Y:torch.Size([15000, 6])
[WARNING][nn.EquivariantModules] No cache directory provided. Nothing will be saved
[INFO][nn.EquivariantModules] Cache Loading Failed: No cache directory provided
[INFO][root] Initing EMLP (PyTorch)
[INFO][root] Vs[V4[d:24] ⋊ V4[d:128|inv:8]] cache miss
[INFO][root] Solving basis for Vs[V4[d:24] ⋊ V4[d:128|inv:8]], for G=V4[d:24] ⋊ V4[d:128|inv:8]
[INFO][groups.SemiDirectProduct] Solving equivariant basis using single generalized permutation matrix (3072, 3072)
3072 eigenvectors found: : 12288it [00:00, 837280.00it/s]                                                                                  
[INFO][root] Vs[V4[d:128|inv:8]] cache miss
[INFO][root] Solving basis for Vs[V4[d:128|inv:8]], for G=V4[d:128|inv:8]
[INFO][groups.SemiDirectProduct] Solving equivariant basis using single generalized permutation matrix (128, 128)
128 eigenvectors found: 100%|████████████████████████████████████████████████████████████████████████| 128/128 [00:00<00:00, 217180.79it/s]
/home/xx/Documents/proj/MorphoSymm-Replication/nn/EquivariantModules.py:119: RuntimeWarning: divide by zero encountered in scalar divide
  basis_coeff_variance = dim_out / lambd
Error executing job with overrides: ['dataset=com_momentum', 'robot_name=Solo-k4', 'model=emlp', 'dataset.train_ratio=0.7', 'model.lr=1e-5', 'dataset.augment=False']
Traceback (most recent call last):
  File "/home/xx/Documents/proj/MorphoSymm-Replication/train_supervised.py", line 249, in main
    model = get_model(cfg.model, Gin=train_dataset.Gin, Gout=train_dataset.Gout, cache_dir=cache_dir)
  File "/home/xx/Documents/proj/MorphoSymm-Replication/train_supervised.py", line 49, in get_model
    model = EMLP(rep_in=SparseRep(Gin), rep_out=SparseRep(Gout), hidden_group=Gout, num_layers=cfg.num_layers,
  File "/home/xx/Documents/proj/MorphoSymm-Replication/nn/EquivariantModules.py", line 342, in __init__
    layer = EquivariantBlock(rep_in=rep_inter_in, rep_out=rep_inter_out, with_bias=with_bias,
  File "/home/xx/Documents/proj/MorphoSymm-Replication/nn/EquivariantModules.py", line 170, in __init__
    self.linear = BasisLinear(rep_in, rep_out, with_bias)
  File "/home/xx/Documents/proj/MorphoSymm-Replication/nn/EquivariantModules.py", line 70, in __init__
    self.reset_parameters()
  File "/home/xx/Documents/proj/MorphoSymm-Replication/nn/EquivariantModules.py", line 138, in reset_parameters
    torch.nn.init.uniform_(self.basis_coeff, -bound, bound)
  File "/home/xx/miniconda3/envs/rss2023/lib/python3.10/site-packages/torch/nn/init.py", line 148, in uniform_
    return _no_grad_uniform_(tensor, a, b, generator)
  File "/home/xx/miniconda3/envs/rss2023/lib/python3.10/site-packages/torch/nn/init.py", line 15, in _no_grad_uniform_
    return tensor.uniform_(a, b, generator=generator)
RuntimeError: from is out of bounds for float

2. MLP Versions (C2 vs. K4): I noticed there are two versions of the MLP: C2 and K4. Could you explain the difference between them? My understanding is that C2 and K4 are used as initialization methods; am I correct?

3. Request for Dataset and Checkpoints: Would it be possible for you to share the Solo robot dataset .npz file, as well as the checkpoints for MLP, MLP-Aug, and EMLP? Having access to these resources would greatly help me evaluate the results more effectively.

Thank you in advance for your guidance and support. I truly appreciate your time and effort in maintaining this excellent repository!

Best regards, Sizhe

Danfoa commented 3 days ago

Hi @SizheWei

Have a look at the discussion in #9, specifically the part:

So I think I found a solution. Please check out the new branch rss2023, in which I set up the old version of the code to work with the appropriate conda env dependencies, specifically by rolling back SciPy's version.

I have:

Introduced an additional conda_env.yaml file, so you know which Python environment I used to rerun the code.

Committed the dataset partitioning used to compute the results (see Appendix D of the paper).

Introduced a GitHub submodule with the minor changes needed to run the experiments.

PS: There is a computer on which I might have the folder with the results; I will get back to you with this info this week. Again, sorry for all the mess. If I cannot find the files, the only option is to rerun the experiments, which will generate the output .csv files used by the scripts that produce the plots in the paper.

Let me know if this helps.

The older SciPy version is required by the old, deprecated algorithm that computes the basis of equivariant maps, which is where your error comes from (note the divide-by-zero warning at basis_coeff_variance = dim_out / lambd just before the traceback).
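To see how that warning and the crash are connected, here is a minimal NumPy sketch of the arithmetic from the warning at nn/EquivariantModules.py:119. It is not MorphoSymm code: basis_coeff_variance only mirrors the expression dim_out / lambd, and require_finite is a hypothetical guard. It assumes lambd is exactly zero, as the "divide by zero" warning indicates, and that the bound later passed to torch.nn.init.uniform_ is derived from this variance (an assumption; that step is not visible in the log).

```python
import numpy as np

def basis_coeff_variance(dim_out: int, lambd: float) -> float:
    """Mirror of the expression in the warning, dim_out / lambd (sketch, not MorphoSymm code)."""
    # np.errstate silences the RuntimeWarning; a zero lambd still yields +inf.
    with np.errstate(divide="ignore"):
        return float(np.float64(dim_out) / np.float64(lambd))

def require_finite(value: float, name: str) -> None:
    """Hypothetical guard: fail with a clear message before any uniform_ initialization."""
    if not np.isfinite(value):
        raise ValueError(f"{name} is {value}: lambd was zero, so the equivariant basis "
                         "solver output is unusable (check the SciPy version)")

variance = basis_coeff_variance(dim_out=128, lambd=0.0)
print(variance)  # inf
try:
    require_finite(variance, "basis_coeff_variance")
except ValueError as err:
    print(err)
```

Any initialization range built from an infinite value is itself non-finite, which is consistent with the final "RuntimeError: from is out of bounds for float".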

Regarding the MLP versions: the paper reports MLP-Aug and EMLP results for two symmetry groups of the Solo robot, C2 (sagittal reflection) and K4 (sagittal and dorsal reflections). They are not initialization methods: MLP-Aug C2 and MLP-Aug K4 are plain MLPs trained with data augmentation using either the subgroup C2 or the entire morphological symmetry group K4.

Similarly, EMLP is an equivariant NN w.r.t. either C2 or K4, depending on the chosen group.
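A toy NumPy sketch of the two ideas, under stated assumptions. The axis convention (x forward, y lateral, z up), modelling the second reflection as flipping x, and all function names are assumptions for illustration, not MorphoSymm's representations; in the repository the inputs are 24-dimensional and the outputs 6-dimensional, each with its own group representation. augment shows MLP-Aug-style augmentation (every sample is replicated under every group element). symmetrize shows the constraint an equivariant linear layer must satisfy, W ρin(g) = ρout(g) W for every g in the chosen group; group averaging is swapped in here as a simpler stand-in for the basis solver that appears in the log ("Solving equivariant basis ...").

```python
import itertools
from typing import List

import numpy as np

# ASSUMED axis convention, for illustration only: x forward, y lateral, z up.
G_SAGITTAL = np.diag([1.0, -1.0, 1.0])  # mirror left/right: flip y
G_SECOND = np.diag([-1.0, 1.0, 1.0])    # second reflection, modelled as flipping x (assumption)

def generate_group(generators: List[np.ndarray]) -> List[np.ndarray]:
    """Close a set of matrix generators under multiplication (finite groups only)."""
    elements = [np.eye(generators[0].shape[0])]
    frontier = list(elements)
    while frontier:
        new = []
        for a, g in itertools.product(frontier, generators):
            prod = a @ g
            if not any(np.allclose(prod, e) for e in elements):
                elements.append(prod)
                new.append(prod)
        frontier = new
    return elements

C2 = generate_group([G_SAGITTAL])            # {e, g_s}
K4 = generate_group([G_SAGITTAL, G_SECOND])  # {e, g_s, g_2, g_s g_2}

def augment(X, Y, group_in, group_out):
    """MLP-Aug style: replicate every (x, y) pair as (g x, g y) for each group element g."""
    Xs = [X @ g_in.T for g_in in group_in]
    Ys = [Y @ g_out.T for g_out in group_out]
    return np.concatenate(Xs), np.concatenate(Ys)

def symmetrize(W, group_in, group_out):
    """Group-average a linear map so that W @ g_in == g_out @ W holds for every g."""
    terms = [np.linalg.inv(g_out) @ W @ g_in for g_in, g_out in zip(group_in, group_out)]
    return sum(terms) / len(terms)

X = np.array([[1.0, 2.0, 3.0]])
Y = 2.0 * X                        # toy target that already satisfies y = 2x
Xa, Ya = augment(X, Y, K4, K4)
print(len(C2), len(K4), Xa.shape)  # 2 4 (4, 3)

W = np.arange(9.0).reshape(3, 3)
W_k4 = symmetrize(W, K4, K4)
print(all(np.allclose(W_k4 @ g, g @ W_k4) for g in K4))  # True
```

The design difference is that augmentation only encourages a plain MLP to behave equivariantly through the data, whereas an equivariant layer satisfies the constraint by construction. Using C2 instead of K4 imposes fewer constraints, so symmetrize(W, C2, C2) keeps more of W.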

Regarding the datasets: the CoM momentum dataset is generated automatically and quite efficiently using pinocchio when you run the training script for the first time. Training the models should take only a handful of minutes with the provided scripts. Please give it a try with the rolled-back SciPy and let me know.
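The generate-on-first-run behaviour follows a common caching pattern, sketched below with NumPy's .npz format. This is not MorphoSymm's dataset code: load_or_generate, fake_generate, the file path and the array keys X and Y are hypothetical, and fake_generate stands in for the pinocchio-based generation (random numbers, not CoM momenta). Only the shapes (24-dimensional inputs, 6-dimensional outputs) follow the CoM[Solo12Bullet] log lines above.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict

import numpy as np

Arrays = Dict[str, np.ndarray]

def load_or_generate(path: Path, generate: Callable[[], Arrays]) -> Arrays:
    """Load arrays from an .npz cache, or call generate() once and save the result."""
    path = Path(path)
    if path.exists():
        with np.load(path) as data:  # NpzFile supports the context-manager protocol
            return {key: data[key] for key in data.files}
    arrays = generate()  # the expensive step; the real one would need pinocchio installed
    path.parent.mkdir(parents=True, exist_ok=True)
    np.savez_compressed(path, **arrays)
    return arrays

def fake_generate() -> Arrays:
    """Hypothetical stand-in for the real generator: random X (24-d) and Y (6-d)."""
    rng = np.random.default_rng(0)
    return {"X": rng.standard_normal((10, 24)), "Y": rng.standard_normal((10, 6))}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "com_momentum" / "samples.npz"
    first = load_or_generate(cache, fake_generate)   # generates and writes the file
    second = load_or_generate(cache, fake_generate)  # loads from disk
    print(first["X"].shape, second["Y"].shape)       # (10, 24) (10, 6)
```

Because the first run has to execute the generator, a missing or broken pinocchio install surfaces immediately on a fresh checkout, even though later runs only read the cached file.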

SizheWei commented 55 minutes ago

Hi @Danfoa ,

Sorry for the delayed reply, and thank you for your answer. I read #9, pulled your latest rss2023 branch, and re-created the env, but I got the following error:

Traceback (most recent call last):
  File "/home/xxx/Documents/proj/new_morphosymm_original/MorphoSymm/train_supervised.py", line 244, in main
    datasets, dataloaders = get_datasets(cfg, device, root_path)
  File "/home/xxx/Documents/proj/new_morphosymm_original/MorphoSymm/train_supervised.py", line 100, in get_datasets
    robot, Gin_data, Gout_data, Gin_model, Gout_model, = get_robot_params(cfg.robot_name)
  File "/home/xxx/Documents/proj/new_morphosymm_original/MorphoSymm/utils/robot_utils.py", line 49, in get_robot_params
    from robots.solo.Solo12Bullet import Solo12Bullet
  File "/home/xxx/Documents/proj/new_morphosymm_original/MorphoSymm/robots/solo/Solo12Bullet.py", line 9, in <module>
    from pinocchio import Quaternion, Force, JointModelFreeFlyer
ModuleNotFoundError: No module named 'pinocchio'

This is a ModuleNotFoundError. Could you tell me which version of pin (pinocchio) you used?

I tried installing these versions:

pip install pin==2.6.10
pip install numpy==1.24.4

After that, I got another ImportError:

from .pinocchio_pywrap import *
ImportError: libboost_python310.so.1.80.0: cannot open shared object file: No such file or directory

It seems I cannot get the old SciPy version and pin to work together, but I think there should be a combination of versions that is compatible.
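Before trying more version combinations, it may help to record what the environment actually contains. The sketch below uses only the standard library (importlib, importlib.metadata); installed_version, try_import and report are hypothetical helper names. It assumes the PyPI distribution pin provides the pinocchio module, matching the pip install pin command above; a conda-installed pinocchio may not be visible to importlib.metadata under that name. try_import reports both the plain ModuleNotFoundError and an ImportError caused by a missing shared library such as libboost_python310.so.1.80.0.

```python
import importlib
from importlib import metadata
from typing import Dict, Optional, Sequence

def installed_version(dist: str) -> Optional[str]:
    """Version string of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

def try_import(module: str) -> Optional[str]:
    """None if the module imports cleanly, otherwise 'ErrorType: message'."""
    try:
        importlib.import_module(module)
    except Exception as err:  # covers ModuleNotFoundError and ImportError from missing .so files
        return f"{type(err).__name__}: {err}"
    return None

def report(dists: Sequence[str] = ("scipy", "numpy", "pin"),
           modules: Sequence[str] = ("scipy", "numpy", "pinocchio")) -> Dict[str, Optional[str]]:
    """Collect distribution versions and import errors in one dictionary."""
    out: Dict[str, Optional[str]] = {}
    for dist in dists:
        out[f"version:{dist}"] = installed_version(dist)
    for module in modules:
        out[f"import:{module}"] = try_import(module)
    return out

if __name__ == "__main__":
    for key, value in report().items():
        print(key, value)
```

A version listed for pin together with an ImportError for pinocchio points to a binary/library mismatch rather than a missing package.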

Danfoa / MorphoSymm

Questions and Request for Assistance on Solo Robot COM Tasks #11