Closed hmmdxzz closed 6 months ago
We create instructions using templates based on the dataset information for training and quantitative evaluation: https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/train_sg.py#L230
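As a rough illustration of what template-based instruction construction looks like (the relation names, templates, and helper below are hypothetical, not the repo's actual code; see the linked `train_sg.py` for the real logic):

```python
# Hypothetical relation-to-phrase templates; the real ones live in train_sg.py.
RELATION_TEMPLATES = {
    "left of": "Place a {subj} to the left of the {obj}.",
    "right of": "Place a {subj} to the right of the {obj}.",
    "in front of": "Put a {subj} in front of the {obj}.",
}

def build_instruction(subj: str, relation: str, obj: str) -> str:
    """Fill a template for one (subject, relation, object) triple."""
    return RELATION_TEMPLATES[relation].format(subj=subj, obj=obj)

print(build_instruction("nightstand", "left of", "bed"))
# -> Place a nightstand to the left of the bed.
```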
However, you can provide your own instructions at inference time by replacing `texts`, which is a list of text instructions:
https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/generate_sg.py#L333
For example, in the stylization task, we construct different forms of instructions: https://github.com/chenguolin/InstructScene/blob/d6950e929de77e26e07acbe5909269bc8252f827/src/stylize_sg.py#L296
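As a hedged sketch of what that could look like (only the variable name `texts` comes from the linked `generate_sg.py`; the helper and phrasings here are illustrative), supplying your own instructions just means building that list yourself, optionally with several paraphrases of the same constraint as in the stylization script:

```python
def make_variants(subj: str, obj: str, side: str) -> list[str]:
    """Build several paraphrased instructions for one spatial constraint."""
    verbs = ["Place", "Put", "Position"]
    return [f"{v} a {subj} to the {side} of the {obj}." for v in verbs]

# `texts` is the list of instruction strings consumed at inference.
texts = make_variants("wardrobe", "bed", "right")
```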
This works because we extract features from the text instructions with the CLIP text encoder, which was pretrained on a large-scale image-text dataset and therefore generalizes to some degree.
If your instructions differ significantly from those used during training (e.g., "Put/Position a xxx to the left/right side of a yyy"), it would be more effective to retrain the model on your own text-scene dataset.
Can instructions only be generated from the datasets? And can I generate and visualize 3D scenes by inputting my own instruction? The code analyzes the furniture in a layout and generates instructions from their spatial relations; how do I input my own spatial-relation description to generate a scene?