Closed DanielChaseButterfield closed 3 weeks ago
Hi @DanielChaseButterfield,
Indeed, the current state of the repository has diverged substantially from its state at the time of the RSS publication. I apologize for not paying attention to the reproducibility of the experiments. If you give me a couple of days, I will try to get them working again.
@Danfoa No worries; thanks so much for the help!
If it's any use, I have some trivial fixes completed in my own fork (https://github.com/lunarlab-gatech/MorphoSymm), and I can open a pull request to a development branch or a new branch if you think that would save you some time.
Hi @DanielChaseButterfield,
Question: Are you interested in the contact estimation or the CoM-momentum regression experiment?
Since the RSS publication, I have migrated the Equiv-NN backend from EMLP to ESCNN. The CoM-momentum regression is quite simple to adapt to the new backend, but for the contact experiment I would have to define the equivariant version of the contact-CNN in the new backend, which might take some time.
Let me know which experiment is of interest or if both interest you.
We're only comparing against the contact-CNN, which unfortunately sounds like the more difficult of the two; but yeah, we aren't planning on comparing against the CoM-momentum regression.
I'm sure redefining the contact-CNN for the new code would be difficult; is it possible that one of the commits from before the time of RSS publication would work? I feel like that might save you some time.
@Danfoa Another option that could potentially reduce your workload. My main purpose in replicating the experiment was to do the following two things:
If it's difficult to reimplement the contact estimation experiment, then directly providing the trained model metrics (that were used to generate these figures) and the number of parameters in the ECNN model would be enough for our purposes.
The repository seems to provide the trained model metrics for the COM experiment in "paper/experiments/com_sample_eff_Solo-K4-C2", so I figured that you might have the trained model metrics for the contact experiment saved somewhere else.
I've stepped backwards through the commit history of this repository, and found out certain import commits that added back features that the contact estimation experiment depended on:
Operating on commit e702fac, and by changing a few deprecated numpy values to their corresponding python versions (np.int to int, for example), I was able to get a new error output by the code:
python train_supervised.py --multirun robot=mini_cheetah-c2 dataset=contact dataset.data_folder=training_splitted dataset.train_ratio=0.85 dataset.augment=False exp_name=contact_sample_eff_splitted model=contact_ecnn model.lr=1e-5
pybullet build time: Nov 28 2023 23:52:03
/home/dbutterfield3/Research/MorphoSymm/train_supervised.py:205: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path='cfg/supervised', config_name='config')
[2024-07-30 12:09:34,643][HYDRA] Launching 1 jobs locally
[2024-07-30 12:09:34,643][HYDRA] #0 : robot=mini_cheetah-c2 dataset=contact dataset.data_folder=training_splitted dataset.train_ratio=0.85 dataset.augment=False exp_name=contact_sample_eff_splitted model=contact_ecnn model.lr=1e-05
/home/dbutterfield3/miniconda3/envs/morph_training/lib/python3.9/site-packages/hydra/_internal/core_plugins/basic_launcher.py:74: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
[INFO][__main__]
NEW RUN
Seed set to 309
Contact Dataset path:
- Data: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/train.npy
- Labels: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/train_label.npy
Contact Dataset path:
- Data: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/val.npy
- Labels: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/val_label.npy
Contact Dataset path:
- Data: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/test.npy
- Labels: /home/dbutterfield3/Research/MorphoSymm/datasets/contact_dataset/training_splitted/numpy_train_ratio=0.850/test_label.npy
[WARNING][nn.EquivariantModules] No cache directory provided. Nothing will be saved
[INFO][nn.EquivariantModules] Cache Loading Failed: No cache directory provided
[INFO][root] ρ(C2[d:54|inv:3] ⋊ C2[d:64|inv:6]) cache miss
[INFO][root] Solving basis for ρ(C2[d:54|inv:3] ⋊ C2[d:64|inv:6]), for G=C2[d:54|inv:3] ⋊ C2[d:64|inv:6]
[INFO][groups.SparseRepresentation] Solving equivariant basis using single generalized permutation matrix (3456, 3456)
3456 eigenvectors found: 100%|█████████████| 3456/3456 [00:00<00:00, 208313.78it/s]
[INFO][root] ρ(C2[d:64|inv:6]) cache miss
[INFO][root] Solving basis for ρ(C2[d:64|inv:6]), for G=C2[d:64|inv:6]
[INFO][groups.SparseRepresentation] Solving equivariant basis using single generalized permutation matrix (64, 64)
64 eigenvectors found: 100%|███████████████████| 64/64 [00:00<00:00, 160932.53it/s]
Error executing job with overrides: ['robot=mini_cheetah-c2', 'dataset=contact', 'dataset.data_folder=training_splitted', 'dataset.train_ratio=0.85', 'dataset.augment=False', 'exp_name=contact_sample_eff_splitted', 'model=contact_ecnn', 'model.lr=1e-05']
Traceback (most recent call last):
File "/home/dbutterfield3/Research/MorphoSymm/train_supervised.py", line 248, in main
model = get_model(cfg.model, rep_in=train_dataset.rep_in, rep_out=train_dataset.rep_out, cache_dir=cache_dir)
File "/home/dbutterfield3/Research/MorphoSymm/train_supervised.py", line 43, in get_model
model = ContactECNN(rep_in, rep_out, cache_dir=cache_dir, dropout=cfg.dropout,
File "/home/dbutterfield3/Research/MorphoSymm/nn/ContactECNN.py", line 60, in __init__
BasisConv1d(rep_in=self.rep_in, rep_out=rep_ch_64_1, kernel_size=3, stride=1, padding=1, bias=bias),
File "/home/dbutterfield3/Research/MorphoSymm/nn/EquivariantModules.py", line 208, in __init__
EquivariantModel.test_module_equivariance(module=self, rep_in=self.rep_in, rep_out=self.rep_out,
File "/home/dbutterfield3/Research/MorphoSymm/nn/EquivariantModules.py", line 387, in test_module_equivariance
raise RuntimeError(f"{module}\nis not equivariant to in/out group generators\n"
RuntimeError: E-Conv1D G[C2[d:54|inv:3] ⋊ C2[d:64|inv:6]]-W3456-Wtrain:3456=100.0%-init_std:0.239
is not equivariant to in/out group generators
max(f(g·x) - g·y) = 9.143681526184082
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Currently working on figuring out if I should step further back in the commit history, or if I should try and debug this on this commit.
@Danfoa Do you know what could be causing the RuntimeError: E-Conv1D G[C2[d:54|inv:3] ⋊ C2[d:64|inv:6]]-W3456-Wtrain:3456=100.0%-init_std:0.239 is not equivariant to in/out group generators error?
Looks like, as could probably be expected, the RuntimeError: E-Conv1D G[C2[d:54|inv:3] ⋊ C2[d:64|inv:6]]-W3456-Wtrain:3456=100.0%-init_std:0.239 is not equivariant to in/out group generators error does indeed mean that the convolutional layer is not equivariant with respect to the input and output representations.
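To make concrete what such a check measures, here is a minimal sketch of an equivariance test for a plain linear map under a toy C2 action. It is not the repository's test_module_equivariance; the function name equivariance_error, the 4-to-2 dimensional map, and the "swap the two halves" representations are all assumptions chosen for illustration.

```python
import numpy as np


def equivariance_error(weight, g_in, g_out, n_trials=10, seed=0):
    """Return max |f(g.x) - g.f(x)| over random inputs, with f(x) = weight @ x."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_trials):
        x = rng.standard_normal(g_in.shape[0])
        lhs = weight @ (g_in @ x)   # transform the input, then apply the map
        rhs = g_out @ (weight @ x)  # apply the map, then transform the output
        worst = max(worst, float(np.max(np.abs(lhs - rhs))))
    return worst


# Toy C2 action (hypothetical): swap the two halves of the input and of the output.
g_in = np.eye(4)[[2, 3, 0, 1]]
g_out = np.eye(2)[[1, 0]]

rng = np.random.default_rng(1)
w_random = rng.standard_normal((2, 4))
# Averaging over the group projects any weight onto the equivariant subspace
# (for C2 the generator is its own inverse).
w_equiv = 0.5 * (w_random + g_out @ w_random @ g_in)

print("random weight error:    ", equivariance_error(w_random, g_in, g_out))
print("symmetrized weight error:", equivariance_error(w_equiv, g_in, g_out))
```

A generic weight fails the check by a wide margin, while the symmetrized weight passes up to floating-point round-off. A large value such as the one reported in the traceback would therefore suggest that the layer's weights were not built from a valid equivariant basis.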
I'm wondering if this is because I have different versions of dependencies, which is silently causing my mathematical calculations to be off. Or, I wonder if I'm simply at a commit halfway between the time of the experimental evaluation and the updated code, where some of the internal test cases don't pass. I'm planning on stepping back further to see if that might be the case.
Okay, I've stepped back further, and it seems that the above error carries all the way to commit d2c6505, if not further. So it seems like the issue must be some sort of silent dependency error, like perhaps an updated version of some library that I installed behaves differently than a couple of years ago.
I know my numpy version isn't the one that was used at the time of development, as I've needed to replace references to np.int with int and so on (np.int was removed in numpy 1.24). However, I was unable to get the python library to build with an older version of numpy, so I manually changed those references. I wonder if something similar is silently failing, like maybe emlp or escnn have changed.
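For anyone making the same manual change, the replacement can be sketched as a small regex rewrite. This is only an illustration: the helper name replace_removed_aliases is hypothetical, it works on source text rather than on the repository's files, and the result should be reviewed by hand before committing.

```python
import re

# Matches removed aliases such as np.int or np.float, but not np.int32, np.float64
# or np.bool_, because the trailing \b requires the alias name to end right there.
_REMOVED_ALIAS = re.compile(r"\bnp\.(int|float|bool|object|complex|str)\b")


def replace_removed_aliases(source: str) -> str:
    """Rewrite removed NumPy aliases (e.g. np.int) to the Python builtins."""
    return _REMOVED_ALIAS.sub(lambda match: match.group(1), source)


example = "idx = np.int(3); arr = arr.astype(np.float); n = np.int32(4)"
print(replace_removed_aliases(example))
```

Sized types such as np.int32 are left untouched, since only the bare aliases were removed from NumPy.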
Hi @DanielChaseButterfield,
So I think I found a solution. Please check out the new branch rss2023, in which I set up the old version of the code to work by pinning the appropriate conda env dependencies, specifically by rolling back Scipy's version.
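To compare a local environment against the pinned dependencies, a quick version report can help. The sketch below uses only the standard library; the package list is an assumption and should be adjusted to whatever the environment file of the branch actually pins.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Hypothetical list of packages worth checking; edit to match the environment file.
for name, ver in installed_versions(["numpy", "scipy", "torch", "emlp"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```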
I have:
PS: There is a computer on which I might have the folder with the results; I will get back to you with this info this week. Again, sorry for all the mess. In case I cannot find the files, the only option is to rerun the experiments, which will generate the output .csv files used by the scripts that generate the plots of the paper.
Let me know if this helps.
Sorry for the delay, I wanted to make sure that I could run your changes on my computer. I took your new rss2023 branch and made a few changes to resolve pip dependency conflicts, add a couple of missing libraries, and fix import errors. We now have our fork here (https://github.com/lunarlab-gatech/MorphoSymm).
@Danfoa The Scipy rollback was a lifesaver; I am no longer getting the RuntimeError: E-Conv1D G[C2[d:54|inv:3] ⋊ C2[d:64|inv:6]]-W3456-Wtrain:3456=100.0%-init_std:0.239 is not equivariant to in/out group generators error! Additionally, the conda_env.yml file was quite useful for installing everything quickly. Thanks so much for your update; I'm now able to train your models for our paper! Additionally, I can generate Figure 4-Right from On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis.
However, I do have a couple more issues. There seem to be multiple files for generating Figure 4-Left & Center:
@Danfoa Do you know which one of these I should use?
Additionally, when I run sample_efficiency_figures_contact_CNN-ECNN.py, I get an empty graph with no data. Note that when I ran it, I had six models trained using the debug tools (two of each type), so I have some .csv files from which I can plot results.
I looked into the train_supervised.py file, and although the COM_Momentum dataset appears to have a way of specifying different sample numbers, the Contact Estimation experiment doesn't seem to have this capability currently.
@Danfoa Did you simply manually edit the dataset partition files based on how many samples you needed, and if so, how does the plotting file know how many samples each run had?
Hi @DanielChaseButterfield,
First of all, huge thanks for the PR :). I am very happy to welcome you as a collaborator of the repo.
Answering some of your questions:
I looked into the train_supervised.py file, and although the COM_Momentum dataset appears to have a way of specifying different sample numbers, the Contact Estimation experiment doesn't seem to have this capability currently.
The CoM dataset has a num_samples attribute because it is a synthetic dataset and we can control the number of data points (num_samples = num_train_samples + num_test_samples + num_val_samples). The Umich contact dataset, on the other hand, is a real-world dataset, for which the number of data points is fixed.
@Danfoa Did you simply manually edit the dataset partition files based on how many samples you needed, and if so, how does the plotting file know how many samples each run had?
No, these are all "automatically" generated. The total number of samples in the dataset is partitioned according to train_ratio (%), test_ratio (%), and val_ratio (%). As far as I recall and can see from the code, the val ratio and test ratio are always set to 15% of the dataset samples, while train_ratio is controlled as a parameter of the training script in order to test model performance with different numbers of training samples. I.e., in order to test the model using TP=85,70,50,30,10 [%] of the train+val samples for training, you would do something like:
python train_supervised.py --multirun dataset=contact dataset.train_ratio=0.85,0.7,0.5,0.3,0.1 model.lr=1e-5 dataset.augment=False [... other params]
We keep the same samples for validation and testing for all models, since we want to compare to the same test/val "data".
The logic for the partitions is given here:
This function is used here:
It generates the training and validation splits from the same "trajectory data", while the testing set is generated from "different trajectory data", as explained in appendix D2:
You can check the partitioning here: MorphoSymm/datasets/umich_contact /training_splitted/
The mat folder contains the original Umich dataset recordings used for training/val in my paper. Meanwhile, the mat_test folder is the recording used during testing.
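The partitioning described above can be sketched roughly as follows. This is not the repository's partition function: split_indices is a hypothetical helper, and it assumes that the validation block is carved from the end of the train/val recordings so that it stays identical for every train_ratio, with the test set coming from separate recordings and therefore not appearing here.

```python
def split_indices(n_samples, train_ratio, val_ratio=0.15):
    """Split indices of the train/val recordings into a variable train set and a fixed val set."""
    n_val = int(round(n_samples * val_ratio))
    n_train = int(round(n_samples * train_ratio))
    if n_train + n_val > n_samples:
        raise ValueError("train_ratio + val_ratio must not exceed 1")
    # The validation block is always the last n_val samples, independent of train_ratio.
    val_idx = list(range(n_samples - n_val, n_samples))
    # The training set grows from the start of the recording as train_ratio increases.
    train_idx = list(range(n_train))
    return train_idx, val_idx


for ratio in (0.85, 0.7, 0.5, 0.3, 0.1):
    train_idx, val_idx = split_indices(1000, ratio)
    print(f"train_ratio={ratio}: {len(train_idx)} train, {len(val_idx)} val")
```

Under these assumptions, every run in a sweep over train_ratio is validated on the same samples, which is what makes the sample-efficiency curves comparable.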
However, I do have a couple more issues. There seem to be multiple files for generating Figure 4-Left & Center:
sample_efficiency_figures_contact_CNN-ECNN.py
sample_efficiency_figures_contact_ECNN.py
sample_efficiency_figures.py
I believe the script is sample_efficiency_figures_contact_CNN-ECNN.py. If you get an empty plot, it is because the filtering of metrics is filtering out the results. Beware that this code is quite hacky; for each plot, it requires you to change the filtered metrics. However, since it's simply a matplotlib plot, the only thing you need in practice is the .csv files with the metrics.
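When a filter leaves nothing to plot, it usually helps to print which values each column actually contains before adjusting the filter. The sketch below uses only the standard library; the column names (model, train_ratio, value) and the rows are made-up placeholders, not the layout or contents of the .csv files produced by the training script.

```python
import csv
import io


def load_rows(handle):
    """Read a metrics CSV into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(handle))


def filter_rows(rows, **criteria):
    """Keep only the rows whose columns match every given criterion exactly."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


def unique_values(rows):
    """Map each column name to the sorted list of distinct values it takes."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, set()).add(value)
    return {key: sorted(values) for key, values in columns.items()}


# Placeholder data with hypothetical column names; replace with open("<metrics>.csv").
demo = io.StringIO("model,train_ratio,value\ncnn,0.85,0\necnn,0.85,0\necnn,0.5,0\n")
rows = load_rows(demo)

selected = filter_rows(rows, model="ECNN")  # wrong capitalization, so nothing matches
if not selected:
    print("filter matched no rows; available values:", unique_values(rows))
```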
Again, sorry for the messy/hacky code, and thanks for the contributions, much appreciated. I have improved a lot since those days.
Hi @DanielChaseButterfield,
I saw the ICRA submission :). Congratulations!
After the ICLR deadline, I will update the repository, improve the overall reproducibility of the experiments, and add a list of works built using MorphoSymm. If it is ok with you, I will also list your work.
That being said, I will proceed to close this issue.
I'm trying to replicate the results of "On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis (RSS-2023)". Unfortunately, the provided code on the main branch has multiple issues.
I was able to rectify many of them (like fixing changed path names, typos in files, etc.) on my own fork, but I've recently run into an error involving core functionality of the algorithm that I don't fully understand.
After running the following command:
I get this result:
The error is ultimately due to a mismatch between return arguments for load_symmetric_system() in utils/robot_utils.py and the expected returns from get_in_out_symmetry_groups_reps() in the data/contact_dataset/umich_contact_dataset.py:
I don't fully understand the intricacies of this code, so I don't want to just remove expected return values.
This leads me to the main request for this issue. My goal in editing the code was simply to replicate the results of the paper "On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis (RSS-2023)", but clearly the main branch has long since diverged from the code that was run for the paper. Could you provide the commit that contains the code that was run for this paper?