McGill-NLP / AURORA

Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
https://aurora-editing.github.io/
MIT License

prompt vs instruction #7

Closed: relh closed this issue 1 month ago

relh commented 1 month ago

It seems like the train.json files use 'instruction' and the valid.json files use 'prompt' as the key for the text describing the edit. Maybe you could update all the JSON files to use a single, unified key?
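A minimal sketch of a one-off script that would unify the key, assuming each split file is a JSON list of objects as in the excerpts in this thread. The helper name `unify_text_key` and the choice of 'instruction' as the shared key are illustrative assumptions, not part of the repo.

```python
import json
from pathlib import Path


def unify_text_key(path, old="prompt", new="instruction"):
    """Rename `old` to `new` in every record of a JSON split file, in place.

    Hypothetical helper: assumes the file holds a JSON list of objects, as in
    the train.json/valid.json excerpts in this thread. Choosing "instruction"
    as the shared key is an arbitrary assumption. Returns the rename count.
    """
    path = Path(path)
    records = json.loads(path.read_text())
    renamed = 0
    for rec in records:
        # Only rename when the target key is absent, so nothing is overwritten.
        if old in rec and new not in rec:
            rec[new] = rec.pop(old)
            renamed += 1
    path.write_text(json.dumps(records, indent=2) + "\n")
    return renamed
```

For example, `unify_text_key("data/kubric/valid.json")` would rewrite that file in place and return how many records were changed; running it a second time is a no-op.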

relh commented 1 month ago
(base)  relh@zephyr  ~/Code/AURORA/data/something   main ±✚  tail valid.json 
    "input": "data/something/frames_validation_2K/213259/first.jpg",
    "output": "data/something/frames_validation_2K/213259/last.jpg"
  },
  {
    "id": "213068",
    "prompt": "Add a hand that pushes the hat further to the right",
    "input": "data/something/frames_validation_2K/213068/first.jpg",
    "output": "data/something/frames_validation_2K/213068/last.jpg"
  }
]
(base)  relh@zephyr  ~/Code/AURORA/data/something   main ±✚  tail train.json
    "id": "117478",
    "instruction": "moving glass up",
    "template": "Moving [something] up",
    "placeholders": [
      "glass"
    ],
    "input": "../../change_descriptions/something-something/frames/117478/first.jpg",
    "output": "../../change_descriptions/something-something/frames/117478/last.jpg"
  }
]

Also, all of these paths are mucked up (including in train_.json). For example, train.json uses odd relative paths through the folders something-something and frames (../../change_descriptions/something-something/frames/...), while valid.json refers to just data/something and a folder that doesn't exist, frames_validation_2K (which I think was renamed to just frames/).
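To see which entries are affected, a quick check like the following could list every path that does not resolve to an existing file. This is a hypothetical sketch: `missing_paths` is not a repo function, and `root` is assumed to be the directory the paths were written relative to.

```python
import json
from pathlib import Path


def missing_paths(split_file, root="."):
    """Return (id, key, path) for every input/output path that is not a file.

    Hypothetical sanity check: relative paths are resolved against `root`,
    assumed to be the directory the paths were written relative to (for
    example the repository root for "data/something/..." paths).
    """
    root = Path(root)
    records = json.loads(Path(split_file).read_text())
    missing = []
    for rec in records:
        for key in ("input", "output"):
            rel = rec.get(key)
            if rel is not None and not (root / rel).is_file():
                # Records without an "id" (e.g. the kubric ones) report None.
                missing.append((rec.get("id"), key, rel))
    return missing
```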

relh commented 1 month ago
(base)  relh@zephyr  ~/Code/AURORA/data/kubric   main ±✚  tail train.json 
    "input": "data/kubric/images/closer/5431/image0.jpg",
    "output": "data/kubric/images/closer/5431/image1.jpg",
    "instruction": "Move the black keyboard and the red flashlight closer together."
  },
  {
    "input": "data/kubric/images/rotate/12521/image0.jpg",
    "output": "data/kubric/images/rotate/12521/image1.jpg",
    "instruction": "turn the brown leather shoe around"
  }
]
(base)  relh@zephyr  ~/Code/AURORA/data/kubric   main ±✚  tail valid.json
    "input": "data/kubric/images/attribute/9885/image0.jpg",
    "output": "data/kubric/images/attribute/9885/image1.jpg",
    "prompt": "convert the golden-black high heel into a red high heel"
  },
  {
    "input": "data/kubric/images/further_location/6435/image0.jpg",
    "output": "data/kubric/images/further_location/6435/image1.jpg",
    "prompt": "move the red scissors further right"
  }
]

Here's an example of the schema mismatch: train.json uses "instruction" while valid.json uses "prompt".
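A mismatch like this can be caught by comparing the keys used in each split. A minimal sketch, with `schema_diff` as a hypothetical helper that assumes both files are JSON lists of objects:

```python
import json
from pathlib import Path


def schema_diff(train_file, valid_file):
    """Report top-level keys that appear in only one of two JSON split files.

    Hypothetical check, assuming both files are JSON lists of objects.
    """
    def keys(path):
        records = json.loads(Path(path).read_text())
        # Union of keys over all records (an empty list yields an empty set).
        return set().union(*(rec.keys() for rec in records))

    train_keys, valid_keys = keys(train_file), keys(valid_file)
    return {
        "train_only": sorted(train_keys - valid_keys),
        "valid_only": sorted(valid_keys - train_keys),
    }
```

On records like the kubric excerpts above, this would report "instruction" as train-only and "prompt" as valid-only.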

relh commented 1 month ago

Also, it's worth pointing out that the CLEVR entries don't seem to have an associated output image:

    {
        "id": "025070",
        "prompt": "remove the tiny blue shiny object",
        "pos": "remove the tiny green shiny object",
        "input": "data/clevr/images/CLEVR_default_025070.png"
    }
]
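Code that expects an input/target pair would need to skip such entries. A minimal sketch, assuming the same JSON-list layout; `split_by_output` is a hypothetical helper, not part of the repo:

```python
import json
from pathlib import Path


def split_by_output(split_file):
    """Separate records with an "output" image from records without one.

    Hypothetical filter: records lacking "output" (like the CLEVR entry above)
    cannot serve as input/target pairs, so pair-based code can skip them.
    """
    records = json.loads(Path(split_file).read_text())
    paired = [rec for rec in records if "output" in rec]
    unpaired = [rec for rec in records if "output" not in rec]
    return paired, unpaired
```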
BennoKrojer commented 1 month ago

Hi,

Thanks for catching some of these smaller things I missed when releasing the repo! I tried to fix most of them (wording and paths), and I deleted unnecessary files.

As you may have noticed in the README, you unfortunately have to create the Something-Something frames yourself, since we wanted to be on the safe side legally. Let me know if there are any issues with that!

Re CLEVR: we ended up using it only for human eval (without ground truth) and for the DiscEdit metric, since the original change prompts and image pairs were not very diverse and were often quite ambiguous for the editing setup (e.g. "the cylinder changed its location"). I can elaborate if you are interested. If image pairs are what you're after, you can find the CLEVR_change pairs online.

Let me know if you need more help. Best, Benno