beanie00 / Decision-ConvFormer

[ICLR 2024 Spotlight] Code for the paper "Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making"
https://www.beanie00.com/publications/dc
MIT License

About the convolution operation in DC algorithm #3

Closed Liuxueyi closed 1 week ago

Liuxueyi commented 3 months ago

Hello! Thank you for releasing the code. I have a question from reading it. The three blocks that contain the convolution layer are sequential and process the states, actions, and returns concurrently, whereas in the paper the filters are designed for each modality separately. Could you please explain in more detail how the convolution layers capture the MDP features?

beanie00 commented 4 weeks ago

Hi Liuxueyi, Thank you for your interest in our research, and I apologize for the delayed response.

The convolution layers in DC are designed to capture, through learned convolution filters, how each modality (RTG, state, action) relates to the prior tokens (RTG, state, action) across timesteps. Since RTG, state, and action tokens may each depend on the preceding RTG, state, and action tokens in different ways, DC uses three distinct filters (an RTG filter, a state filter, and an action filter) to model these relationships independently. Each filter captures modality-specific patterns in how a token depends on its history, taking into account not only previous tokens of its own modality but also those of the other modalities.

Although the filters are applied separately for each modality, their outputs are not independent. After the convolution operation, the outputs from all three filters are combined in later layers, which lets the model learn the interactions and dependencies between modalities over time. The design therefore captures local temporal relationships within each modality and also lets DC model how the modalities influence one another.
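As a rough illustration of the description above, here is a minimal pure-Python sketch. It is not the repository's code. It assumes the tokens are interleaved as RTG, state, action per timestep, that each modality has one filter per hidden dimension (a depthwise filter), and that the filter is chosen by the modality of the current token while its window covers the preceding tokens of all modalities. The function name `causal_modality_conv`, the window length of 3, and the filter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def causal_modality_conv(tokens, filters):
    """Causal, per-modality depthwise filtering over an interleaved sequence.

    tokens:  list of T token vectors (each a list of D numbers), assumed to be
             interleaved as RTG, state, action, RTG, state, action, ...
    filters: list of 3 filter banks (RTG, state, action); each bank holds
             D filters of length L, where the last tap weights the current token.
    """
    num_modalities = len(filters)
    window = len(filters[0][0])
    dim = len(tokens[0])
    out = []
    for t in range(len(tokens)):
        # The modality of the current token decides which filter bank is used.
        bank = filters[t % num_modalities]
        row = []
        for d in range(dim):
            acc = 0.0
            for j in range(window):
                src = t - (window - 1) + j  # only current and past positions
                if src >= 0:                # positions before the start act as zero padding
                    acc += bank[d][j] * tokens[src][d]
            row.append(acc)
        out.append(row)
    return out


# Toy example with hidden size D = 1 and window L = 3 (illustrative values only).
tokens = [[1], [2], [3], [4], [5], [6]]   # RTG, state, action, RTG, state, action
rtg_filter = [[0, 0, 1]]     # looks at the current token only
state_filter = [[0, 1, 1]]   # current token plus the one before it (an RTG token)
action_filter = [[1, 1, 1]]  # current token plus the two before it (RTG and state)

mixed = causal_modality_conv(tokens, [rtg_filter, state_filter, action_filter])
print(mixed)  # [[1.0], [3.0], [6.0], [4.0], [9.0], [15.0]]
```

In this toy run the state and action outputs already contain contributions from tokens of other modalities, which is the sense in which each filter considers the history of every modality and not just its own. In a real model the filter values would be learned, and the later layers would combine these outputs further.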