Closed: StarCycle closed this issue 4 months ago
Hello StarCycle,
Our current method of quantizing actions is simply an autoencoder with FSQ applied to the latent space. We are discretizing the actual humanoid actions, not learning latent actions like GENIE. We are considering releasing the raw actions and should have more information within 2 weeks.
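For readers unfamiliar with FSQ (Finite Scalar Quantization): each latent dimension is bounded and rounded to a small set of integer levels, and the per-dimension levels are flattened into one code id. A minimal NumPy sketch of the general technique follows; the 3 latent dimensions with 10 levels each (10×10×10 = 1000 codes, matching the id range discussed in this thread) are illustrative assumptions, not the repo's actual configuration.

```python
import numpy as np

# Assumed toy configuration: 3 latent dims, 10 levels each -> 1000 codes.
LEVELS = np.array([10, 10, 10])

def fsq_quantize(z):
    """Bound each latent dim with tanh, then round to the nearest level."""
    half = (LEVELS - 1) / 2.0
    bounded = np.tanh(z) * half      # squash into [-half, +half]
    return np.round(bounded)         # nearest integer level per dim

def code_to_id(code):
    """Flatten per-dim levels into a single id in [0, prod(LEVELS) - 1]."""
    digits = (code + (LEVELS - 1) / 2.0).astype(int)  # shift to [0, L-1]
    return int(np.ravel_multi_index(digits, LEVELS))

z = np.array([0.3, -1.2, 2.0])       # a continuous latent from some encoder
code = fsq_quantize(z)
action_id = code_to_id(code)         # single integer id for this action
```

During training the rounding is typically made differentiable with a straight-through estimator; that detail is omitted here for brevity.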
Hi @kev-zhao,
I see. But quantizing the float-valued actions of so many joints into a single integer between 0 and 1000 still sounds crazy...
Just curious: is it possible to train a humanoid policy whose output is the latent action (i.e., the integer id), and then recover that latent action into continuous actions? What is the success rate of such a policy?
A similar line of work is VQ-BeT. It also compresses the actions, but predicts continuous values directly.
Looking forward to your response!
I believe we don't use all joints for these actions, but a (relatively large) subset that is enough for both manipulation and navigation tasks. Plus, not all of the possible combinations of values are 1) physically possible 2) useful 3) present in the data, which makes the compression to 1000 values more realistic. We are looking at alternative quantizers though.
So some joint angles decoded from an action id may not be physically possible. That makes a lot of sense.
Would you consider compressing the joints into multiple integer ids instead of a single one, and also using such an action representation to train a robotics policy?
Hello @Guitaricet @kev-zhao @ericjang ,
Would you like to try this as a new action tokenizer?
Thank you for the reference! I believe the biggest difference here is that they use VQ-VAE instead of FSQ? I can take a deeper look, but I think FSQ should match VQ-VAE in quality while being much easier to train. At least, that is the promise of the FSQ paper.
Hi @Guitaricet,
I agree. I do like codebook-free algorithms like FSQ!
For SAQ, the core idea is to use a state-conditioned action encoder and decoder, which may compress the actions better. Looking forward to your results!
─=≡Σ((( つ•̀ω•́)つ
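To make the state-conditioning idea concrete: with a state-conditioned decoder, the same discrete action id can decode to different continuous joint commands depending on the current robot state. Below is a toy NumPy sketch of that interface; all dimensions, names, and the linear "decoder" are illustrative assumptions, not SAQ's actual implementation.

```python
import numpy as np

# Assumed toy sizes: 32-dim state, 23 actuated joints, 1000 codes.
rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, NUM_CODES, HIDDEN = 32, 23, 1000, 64

# Toy decoder parameters: one embedding per code, mixed with a state projection.
code_emb = rng.normal(size=(NUM_CODES, HIDDEN)) * 0.1
W_state = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1
W_out = rng.normal(size=(HIDDEN, ACTION_DIM)) * 0.1

def decode(action_id, state):
    """Recover a continuous joint-action vector from (action id, state)."""
    h = code_emb[action_id] + state @ W_state   # state-conditioned latent
    return np.tanh(h) @ W_out                   # continuous joint actions

state_a = rng.normal(size=STATE_DIM)
state_b = rng.normal(size=STATE_DIM)
a1 = decode(42, state_a)   # shape (ACTION_DIM,)
a2 = decode(42, state_b)   # same id, different state -> different actions
```

The point of the sketch is only the signature `decode(action_id, state)`: conditioning on state lets a small codebook cover context-dependent actions.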
Thanks @StarCycle - we are releasing raw action data in the v1.0 release, so you can experiment a bit with the tokenization yourself. We've removed our own action tokens for now because we're not confident they are high quality enough (experiments on real robots suggest they are not great).
Hello,
I checked the vector-quantized actions in the dataset. Surprisingly, each action is quantized to a single integer id in the range [0, 1000]. The compression ratio is quite high, since a humanoid has so many joints...
How do you quantize the actions? Do you train an action encoder-decoder like DeepMind's Genie (so the action in the dataset is only a latent action with no relation to the real actions of the humanoid joints)?
Best, StarCycle