How do you compress an action of a humanoid to an integer id in range [0, 1000]?

1x-technologies / 1xgpt

world modeling challenge for humanoid robots

Apache License 2.0

How do you compress an action of a humanoid to an integer id in range [0, 1000]? #4

Closed StarCycle closed 4 months ago

StarCycle commented 4 months ago

Hello,

I checked the vector-quantized actions in the dataset. Surprisingly, each action is quantized to a single integer id in the range [0, 1000]. The compression ratio is quite high, since a humanoid has so many joints...

How do you quantize the actions? Do you train an action encoder-decoder like DeepMind's Genie (so that the action in the dataset is only a latent action and has no relation to the real actions of the humanoid joints)?

Best, StarCycle

kev-zhao commented 4 months ago

Hello StarCycle,

Our current method of quantizing actions is simply an autoencoder with FSQ applied to the latent space. We are discretizing the actual humanoid actions, not learning latent actions like GENIE. We are considering releasing the raw actions and should have more information within 2 weeks.
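For readers unfamiliar with FSQ, here is a minimal sketch of how a bounded latent vector can be packed into a single integer id. It is not 1X's implementation: the level counts `[8, 5, 5, 5]` (picked only because their product is 1000), the function names, and the simplified rounding are all assumptions, and the encoder and decoder networks are left out.

```python
import math

# Hypothetical number of bins per latent dimension; 8 * 5 * 5 * 5 = 1000 codes.
LEVELS = [8, 5, 5, 5]

def fsq_quantize(z, levels=LEVELS):
    """Bound each latent value with tanh and round it to one of levels[i] bins."""
    digits = []
    for value, n in zip(z, levels):
        # tanh maps to [-1, 1]; rescale to [0, n - 1], then round to the nearest bin.
        scaled = (math.tanh(value) + 1.0) / 2.0 * (n - 1)
        digits.append(int(round(scaled)))
    return digits

def digits_to_id(digits, levels=LEVELS):
    """Pack the per-dimension bins into one integer (mixed-radix encoding)."""
    token = 0
    for d, n in zip(digits, levels):
        token = token * n + d
    return token

def id_to_digits(token, levels=LEVELS):
    """Invert digits_to_id: unpack an integer id into per-dimension bins."""
    digits = []
    for n in reversed(levels):
        digits.append(token % n)
        token //= n
    return digits[::-1]

def digits_to_latent(digits, levels=LEVELS):
    """Map bins back to values in [-1, 1], i.e. what a decoder would receive."""
    return [2.0 * d / (n - 1) - 1.0 for d, n in zip(digits, levels)]

# Example: a made-up 4-dimensional encoder output.
action_id = digits_to_id(fsq_quantize([10.0, -10.0, 10.0, -10.0]))
```

With these assumed levels the ids run from 0 to 999, and recovering a continuous action from an id only requires unpacking the digits and passing the resulting latent to the decoder.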

StarCycle commented 4 months ago

Hi @kev-zhao,

I see. But quantizing the floating-point actions of so many joints to a single integer between 0 and 1000 still sounds crazy...

Just curious: is it possible to train a humanoid policy whose output is the latent action (i.e., the integer id), and then decode this latent action back into continuous actions? What's the success rate of such a policy?

A related piece of research is VQ-BeT. It also compresses the action, but additionally predicts continuous values directly.

Looking forward to your response!

Guitaricet commented 4 months ago

I believe we don't use all joints for these actions, but a (relatively large) subset that is enough for both manipulation and navigation tasks. Plus, not all of the possible combinations of values are 1) physically possible, 2) useful, or 3) present in the data, which makes the compression to 1000 values more realistic. We are looking at alternative quantizers, though.
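To put rough numbers on this argument, the sketch below compares the information in a single id with that of a naive per-joint grid. Only the codebook size of 1000 comes from the thread; the joint count and per-joint resolution are made-up values for illustration.

```python
import math

CODEBOOK_SIZE = 1000   # from the thread: one integer id per action
NUM_JOINTS = 20        # hypothetical number of controlled joints
BINS_PER_JOINT = 256   # hypothetical per-joint resolution (8 bits)

# Bits carried by a single id from the codebook.
bits_single_id = math.log2(CODEBOOK_SIZE)

# Bits needed if every joint were discretized independently.
bits_naive_grid = NUM_JOINTS * math.log2(BINS_PER_JOINT)

print(f"single id: {bits_single_id:.2f} bits, naive grid: {bits_naive_grid:.0f} bits")
```

Under these assumptions the id carries just under 10 bits against 160 bits for the naive grid, so the codebook can only work if the actions that actually occur occupy a tiny part of the full joint space.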

StarCycle commented 4 months ago

So some joint angles decoded from the action id may not be physically possible. That makes a lot of sense.

Would you consider compressing the joints to multiple integer ids instead of a single one, and also using such an action representation to train a robotics policy?

StarCycle commented 4 months ago

Hello @Guitaricet @kev-zhao @ericjang ,

Would you like to try this as a new action tokenizer?

Guitaricet commented 4 months ago

Thank you for the reference! I believe the biggest difference here is that they use VQ-VAE instead of FSQ? I can take a deeper look, but I think FSQ should be of similar quality to VQ-VAE while being much easier to train. At least, this is the promise of the FSQ paper.

StarCycle commented 4 months ago

Hi @Guitaricet,

I agree. I do like codebook-free algorithms like FSQ!

For SAQ, the core idea is to use a state-conditioned action encoder and decoder, which may compress the actions better. Looking forward to your results!

─=≡Σ((( つ•̀ω•́)つ

ericjang commented 4 months ago

Thanks @StarCycle - we are releasing raw action data in the v1.0 release, so you can experiment a bit with the tokenization yourself. We've removed our own action tokens for now because we're not confident they are of high enough quality (experiments on real robots suggest they are not great).