Closed hmmdxzz closed 6 months ago
We create instructions using templates based on the dataset information for training and quantitative evaluation: https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/train_sg.py#L230
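As a rough illustration of what template-based instruction construction looks like (the relation names, templates, and helper below are hypothetical, not the repo's actual code; see the linked `train_sg.py` for the real logic):

```python
# Hypothetical relation-to-phrase templates; the real ones live in train_sg.py.
RELATION_TEMPLATES = {
    "left of": "Place a {subj} to the left of the {obj}.",
    "right of": "Place a {subj} to the right of the {obj}.",
    "in front of": "Put a {subj} in front of the {obj}.",
}

def build_instruction(subj: str, relation: str, obj: str) -> str:
    """Fill a template for one (subject, relation, object) triple."""
    return RELATION_TEMPLATES[relation].format(subj=subj, obj=obj)

print(build_instruction("nightstand", "left of", "bed"))
# -> Place a nightstand to the left of the bed.
```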
However, you can provide your own instructions at inference time by replacing `texts`, which is a list of text instructions:
https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/generate_sg.py#L333
For example, in the stylization task, we construct different forms of instructions: https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/stylize_sg.py#L296
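As a hedged sketch of what that could look like (only the variable name `texts` comes from the linked `generate_sg.py`; the helper and phrasings here are illustrative), supplying your own instructions just means building that list yourself, optionally with several paraphrases of the same constraint as in the stylization script:

```python
def make_variants(subj: str, obj: str, side: str) -> list[str]:
    """Build several paraphrased instructions for one spatial constraint."""
    verbs = ["Place", "Put", "Position"]
    return [f"{v} a {subj} to the {side} of the {obj}." for v in verbs]

# `texts` is the list of instruction strings consumed at inference.
texts = make_variants("wardrobe", "bed", "right")
```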
This works because we extract features from the text instructions with the CLIP text encoder, which was pretrained on a large-scale image-text dataset and therefore generalizes to some degree.
If your instructions differ significantly from those used during training (e.g., "Put/Position a xxx to the left/right side of a yyy"), it would be more effective to retrain the model on your own text-scene dataset.
Can instructions only be generated from the datasets? And can I generate and visualize 3D scenes by inputting my own instruction? The code analyzes the furniture in a layout and generates instructions from their spatial relations; how do I input my own spatial-relation description to generate a scene?