apple / ml-4m

4M: Massively Multimodal Masked Modeling
https://4m.epfl.ch
Apache License 2.0

Depth tokenizer #21

Open Shar-01 opened 2 months ago

Shar-01 commented 2 months ago

Hi everyone, thanks for the nice work. I am considering using your pretrained depth tokenizer to extract precomputed tokens (features) for further training, and I have a few questions.

  1. I cloned ml-4m and installed the diffusers library, but I get this error: AttributeError: module diffusers.models has no attribute unet_2d_blocks. Could you please specify the requirements for using your repo and which diffusers version you used?

  2. Also, how many tokens does the pretrained checkpoint produce?

  3. Is the uploaded pretrained depth tokenizer an encoder-decoder model, or an encoder-only model that just outputs the required tokens?

  4. What normalization did you use for the depth data?
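
For item 1, a minimal sketch of how the environment could be checked before reporting versions. The helper name `describe_module` is hypothetical and not part of ml-4m or diffusers; the submodule path `models.unet_2d_blocks` is taken from the error message above, and whether it exists is assumed to depend on the installed diffusers release.

```python
import importlib
from importlib import metadata


def describe_module(package: str, submodule: str) -> dict:
    """Report the installed version of `package` and whether
    `package.submodule` can be imported.

    Hypothetical helper for illustration only.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # No installed distribution with this name was found.
        version = None

    try:
        importlib.import_module(f"{package}.{submodule}")
        importable = True
    except ImportError:
        # Covers both a missing package and a missing submodule.
        importable = False

    return {"package": package, "version": version, "importable": importable}


if __name__ == "__main__":
    # Module path copied from the AttributeError in item 1 (an assumption
    # about where the repo expects the module to live).
    print(describe_module("diffusers", "models.unet_2d_blocks"))
```

If `importable` comes back `False` while a version is reported, the installed release likely does not expose that module path, which would be consistent with the error in item 1.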

Thanks a lot!