ARISE-Initiative / robomimic

robomimic: A Modular Framework for Robot Learning from Demonstration

Suggested reversion to 0.2 for specific data processing call in datasets.py #158

Closed: balloch closed this issue 3 months ago

balloch commented 3 months ago

Hey y'all, in robomimic release 0.2 (and all patches before 0.3), in the SequenceDataset get_item method, observations were automatically processed (most notably channel-swapped) in the return of get_obs_sequence_from_demo :

        # prepare image observations from dataset
        return ObsUtils.process_obs_dict(obs)

In release 0.3 (and subsequent updates), the processing was changed to simply return obs. Do we know why this happened?

I propose reverting this change. In the release 0.3 examples, the workaround has been to call

            # process batch for training
            input_batch = model.process_batch_for_training(batch)
            input_batch = model.postprocess_batch_for_training(input_batch, obs_normalization_stats=None)

explicitly in the training loop.

The problem (besides this seeming a bit out of place, since processing feels like something the DataLoader should handle when/after sampling) is that anyone who wants to use robomimic's data infrastructure but not its algo factory/modeling classes has no convention for how to process the data, since those two methods are instance methods on the Algo object.

This has come up for me when using LIBERO (which is built on the robomimic data format) together with a newer version of robomimic. Strictly speaking, I may not need the latest version of robomimic, but it seems best to use it.

Would it be possible to revert this change so that projects using robomimic's data format, but not its Algo classes, can still process data automatically? Or could you suggest the best way to replicate the behavior of the new batch processing without an Algo object?
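For concreteness, here is a minimal sketch of what I would like to be able to do (assuming the sampled batch keeps observations under an "obs" key, and reusing the ObsUtils.process_obs_dict call that release 0.2 applied inside the dataset):

    import robomimic.utils.obs_utils as ObsUtils

    for batch in data_loader:
        # replicate the 0.2 behavior: process (e.g. channel-swap) the observations
        # right after sampling, without going through an Algo object
        batch["obs"] = ObsUtils.process_obs_dict(batch["obs"])
        # ... rest of the training step on the processed batch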

amandlek commented 3 months ago

Excellent question! We added a new API call here to ensure such operations take place on the GPU. This helps with things like image-based training: we can transfer uint8 images to the GPU and then convert them to float there, instead of the other way around, which would require 4x the data transfer.
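To illustrate the data-transfer point (a generic PyTorch sketch, not the exact robomimic implementation, and the image key is hypothetical): uint8 pixels are 1 byte each versus 4 bytes for float32, so moving the raw uint8 tensor and converting on the GPU transfers a quarter of the data.

    import torch

    imgs_uint8 = batch["obs"]["agentview_image"]  # hypothetical uint8 image tensor
    # transfer 1 byte per value, then convert to float and rescale on the GPU
    imgs = imgs_uint8.to("cuda", non_blocking=True).float() / 255.0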

I would suggest using this API call going forward. You could also explicitly check the robomimic version in an if statement in downstream code to toggle the behavior appropriately.
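For example, something along these lines in downstream code (a rough sketch; a proper version parse would be more robust than the prefix check used here):

    import robomimic

    # toggle behavior based on the installed robomimic version
    if robomimic.__version__.startswith("0.2"):
        input_batch = batch  # 0.2: SequenceDataset already processed the observations
    else:
        # 0.3+: process explicitly via the Algo API
        input_batch = model.process_batch_for_training(batch)
        input_batch = model.postprocess_batch_for_training(input_batch, obs_normalization_stats=None)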

balloch commented 3 months ago

When you say "this API call," do you mean only interacting with the data through Algo objects?