etched-ai / open-oasis

Inference script for Oasis 500M
MIT License

how does the sample_data work? #3

Open cocktailpeanut opened 1 day ago

cocktailpeanut commented 1 day ago

Looks like you can only generate from the three videos in the sample_data folder as seeds, because it needs to parse the corresponding pt files.

Couldn't find any documentation about the pt files. Is there a way to generate the pt file given an mp4 file?

Also, is this model trained just for these minecraft videos, or can you do the same thing for any video?

julian-q commented 56 minutes ago

Great question @cocktailpeanut !

In fact, you can use any actions file with any prompt video, since the model allows for general controllability. Sorry the code doesn't make this very clear :P

While we've only included three sample videos so far, you can experiment with downloading your own mp4, resizing it to (360, 640) resolution, and changing just the mp4_path to point to that video.
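For anyone who wants to try this, here is a minimal sketch of the resize step. It is not part of the repo: the helper names `build_resize_command` and `resize_video` are made up for illustration, it assumes `ffmpeg` is installed and on your PATH, and it assumes (360, 640) means (height, width).

```python
import shutil
import subprocess
from pathlib import Path


def build_resize_command(src, dst, height=360, width=640):
    """Return an ffmpeg argument list that rescales src to width x height.

    Hypothetical helper, not part of open-oasis. Assumes the (360, 640)
    mentioned above is (height, width).
    """
    return [
        "ffmpeg",
        "-y",                              # overwrite dst if it already exists
        "-i", str(src),                    # input video
        "-vf", f"scale={width}:{height}",  # ffmpeg's scale filter takes width:height
        str(dst),
    ]


def resize_video(src, dst, height=360, width=640):
    """Run ffmpeg to write a resized copy of src to dst and return its path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_resize_command(src, dst, height, width), check=True)
    return Path(dst)
```

After something like `resize_video("my_clip.mp4", "my_clip_360p.mp4")`, you would point mp4_path at the output file. Note that a plain `scale` stretches the frame if the source is not already 16:9, so you may prefer to crop first.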

> Couldn't find any documentation about the pt files. Is there a way to generate the pt file given an mp4 file?

The videos and actions.pt files come from OpenAI's VPT dataset, after some preprocessing. You can get more action data from the data they collected, or you can even run their IDM model on a Minecraft gameplay mp4. I can potentially include a conversion script to get VPT actions in the format our model uses.

> Also, is this model trained just for these minecraft videos, or can you do the same thing for any video?

This model was trained on all of VPT, so it should work for a wide variety of Minecraft video prompts! (And you can even try using a single Minecraft image, since all you need is a single prompt frame.)