ARISE-Initiative / robomimic

robomimic: A Modular Framework for Robot Learning from Demonstration

Robocasa Language Embedding CUDA Out of Memory Error #191

Open JacobB33 opened 3 weeks ago

JacobB33 commented 3 weeks ago

In line 222 of the Robocasa branch of robomimic/utils/train_utils.py, the dataset kwargs are deep-copied when each dataset is created. Since the language embedding model is one of the dataset kwargs, every copy duplicates the model as well. This caused me to run into a CUDA out-of-memory error when training on a large number of dataset files: with 90 Libero datasets, for example, there are 90 copies of the language embedding model in CUDA memory. A minimal sketch of the underlying behavior (the encoder here is a hypothetical stand-in, not robomimic's actual model):
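
import torch
from copy import deepcopy

# Hypothetical stand-in for the language embedding model
lang_encoder = torch.nn.Linear(4096, 4096).cuda()
ds_kwargs = {"lang_encoder": lang_encoder, "hdf5_path": ["a.hdf5", "b.hdf5"]}

# One deepcopy per dataset, as in train_utils.py
copies = [deepcopy(ds_kwargs) for _ in range(90)]
# Each copy holds its own clone of the encoder's parameters on the GPU,
# so allocated CUDA memory grows by roughly 90x the encoder's size.
print(torch.cuda.memory_allocated() / 1e9, "GB allocated")

I made a quick modification that fixed this problem: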

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    # Point the copy back at the shared encoder so the deep-copied model
    # is freed immediately and copies do not accumulate in CUDA memory
    if "lang_encoder" in ds_kwargs:
        ds_kwargs_copy["lang_encoder"] = ds_kwargs["lang_encoder"]

    keys = ["hdf5_path", "filter_by_attribute"]
    for k in keys:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))

Should I make this a PR? It would be more efficient to pop the lang_encoder and never copy it for any dataset; with the fix above, the deep copy of the model is still created each iteration, it is just discarded immediately. A sketch of that alternative is below.
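
For reference, a minimal sketch of the pop-based variant (untested against the actual branch; variable names follow the snippet above):

# Remove the encoder up front so deepcopy never clones it
lang_encoder = ds_kwargs.pop("lang_encoder", None)

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    if lang_encoder is not None:
        # All datasets share the same encoder instance
        ds_kwargs_copy["lang_encoder"] = lang_encoder

    for k in ["hdf5_path", "filter_by_attribute"]:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))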