Camera action tokenization

etched-ai / open-oasis

Inference script for Oasis 500M

MIT License

Camera action tokenization #9

Closed: jxiong21029 closed this 3 weeks ago

jxiong21029 commented 3 weeks ago

Hello,

In the provided data, the camera actions have already been pre-quantized to integer values from 0-79 inclusive. How do raw mouse movements get converted to these values?

Here in the one_hot_actions method, the comment references another repository for the camera quantization, but it seems like that repository is not publicly available.

Thanks.

jxiong21029 commented 3 weeks ago

Did you use VPT foveated quantization with camera_maxval=20, camera_binsize=0.5, and mu somewhere between 2.3 and 3.1?

julian-q commented 3 weeks ago

Thank you for the question, @jxiong21029!

Yes, we used the VPT-style quantization. I've copied the function we used to quantize the VPT mouse deltas here:

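import numpy as np

def compress_mouse(dx):
    max_val = 20    # raw mouse deltas are clipped to [-max_val, max_val]
    bin_size = 0.5  # bin width in the compressed space
    mu = 2.7        # mu-law compression strength

    # Cast to float so integer inputs also work, then clip and normalize to [-1, 1].
    dx = np.clip(np.asarray(dx, dtype=np.float64), -max_val, max_val)
    dx /= max_val
    # Mu-law encoding: small movements get finer resolution than large ones.
    v_encode = np.sign(dx) * (np.log(1.0 + mu * np.abs(dx)) / np.log(1.0 + mu))
    v_encode *= max_val
    dx = v_encode

    # Shift to a non-negative range and round to the nearest bin index.
    return np.round((dx + max_val) / bin_size).astype(np.int64)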
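
For anyone who needs to go the other way (bin index back to an approximate mouse delta), the mu-law step above can be inverted in closed form. The following is only a minimal sketch and is not code from the Oasis repository: `decompress_mouse` is a hypothetical name, and it assumes the same `max_val = 20`, `bin_size = 0.5`, and `mu = 2.7` as the function above. `compress_mouse` is restated in condensed form so the block runs on its own.

```python
import numpy as np

MAX_VAL = 20.0   # same clip range as compress_mouse above
BIN_SIZE = 0.5   # same bin width
MU = 2.7         # same mu-law strength


def compress_mouse(dx):
    """Condensed restatement of the quantizer above."""
    dx = np.clip(np.asarray(dx, dtype=np.float64), -MAX_VAL, MAX_VAL) / MAX_VAL
    v_encode = np.sign(dx) * (np.log(1.0 + MU * np.abs(dx)) / np.log(1.0 + MU))
    return np.round((v_encode * MAX_VAL + MAX_VAL) / BIN_SIZE).astype(np.int64)


def decompress_mouse(bins):
    """Hypothetical inverse: map bin indices to approximate mouse deltas."""
    # Undo the shift and binning, then normalize back to [-1, 1].
    v = (np.asarray(bins, dtype=np.float64) * BIN_SIZE - MAX_VAL) / MAX_VAL
    # Invert the mu-law curve: |x| = ((1 + mu) ** |v| - 1) / mu
    x = np.sign(v) * (np.expm1(np.abs(v) * np.log1p(MU)) / MU)
    return x * MAX_VAL


deltas = np.array([-20.0, -1.0, 0.0, 1.0, 20.0])
bins = compress_mouse(deltas)
recovered = decompress_mouse(bins)
print(bins)
print(recovered)
```

With these parameters, `compress_mouse` maps `dx = 0` to bin 40 and the clipped extremes to bins 0 and 80. Because rounding discards information, `decompress_mouse` returns only the center of each bin in the original delta space. The reconstruction error is smallest near zero, where the mu-law curve is steepest.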