Open gmmyung opened 4 months ago
I agree! Feel free to submit a pull request if you decide to do this. The original PyTorch repo supports multi-modal inputs to some extent (see their definition of WorldModel._encoder), but I think it would be nice to generalize this implementation to support user-specified network architectures for each observation type.
Is there any existing work on RGB image / multimodal input? It seems pretty straightforward to implement, so I might work on it if there is no prior work on this.
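
For reference, here is a rough sketch of what "one user-specified network per observation type" could look like. It is plain Python with no framework so the idea stays independent of either repo. The names MultiModalEncoder, mean_pool and identity are hypothetical and do not come from this implementation or from the original PyTorch code. In practice the callables would be real networks (for example a CNN for "rgb" and an MLP for "state"), and concatenating the features is only one possible way to fuse them.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type alias: an encoder maps one raw observation to a flat feature vector.
Encoder = Callable[[Sequence[float]], List[float]]


class MultiModalEncoder:
    """Applies one user-supplied encoder per observation key and concatenates the results."""

    def __init__(self, encoders: Dict[str, Encoder]):
        if not encoders:
            raise ValueError("at least one encoder is required")
        # Sort keys so the concatenation order does not depend on dict insertion order.
        self.encoders = dict(sorted(encoders.items()))

    def __call__(self, obs: Dict[str, Sequence[float]]) -> List[float]:
        missing = set(self.encoders) - set(obs)
        if missing:
            raise KeyError(f"missing observation keys: {sorted(missing)}")
        features: List[float] = []
        for key, encode in self.encoders.items():
            features.extend(encode(obs[key]))
        return features


# Placeholder encoders standing in for real networks (assumptions for illustration only).
def mean_pool(pixels: Sequence[float]) -> List[float]:
    return [sum(pixels) / len(pixels)]


def identity(state: Sequence[float]) -> List[float]:
    return list(state)


encoder = MultiModalEncoder({"rgb": mean_pool, "state": identity})
latent = encoder({"rgb": [0.0, 0.5, 1.0], "state": [1.0, 2.0]})
```

Keying the encoders by observation name means adding a new modality is just one more dictionary entry, and the fixed key order keeps the layout of the concatenated latent stable between calls.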