Chunmian-art closed this issue 7 months ago
The "stage 3" in the code was inherited from Chat-3D v1, where it was used for instruction tuning on multi-turn conversations. In Chat-3D v2 we did not run experiments on multi-turn conversations, so you can simply ignore the "stage 3" code.
For finetuning with LLaMA activated, we simply unfreeze the last few transformer layers for tuning; see this code snippet. You can compare the results with LLaMA activated versus deactivated. It may be better to finetune it with LoRA. In my experience, performance does not improve with LLaMA activated so far; I think this is because the amount of paired 3D-scene data is far from enough to achieve good alignment before tuning LLaMA.
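The "unfreeze the last few transformer layers" idea can be sketched as below. This is not the repository's actual snippet — it is a minimal illustration that assumes a Hugging Face-style LLaMA layout where the transformer blocks live in `model.model.layers`; the `ToyLlama` class here is a hypothetical stand-in so the sketch runs without downloading weights.

```python
import torch.nn as nn


class ToyLlama(nn.Module):
    """Hypothetical stand-in mimicking the HF LLaMA layout:
    transformer blocks live in `self.model.layers` (a ModuleList)."""

    def __init__(self, num_blocks: int = 4, dim: int = 8):
        super().__init__()
        inner = nn.Module()
        inner.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_blocks))
        self.model = inner
        self.lm_head = nn.Linear(dim, dim)


def activate_last_layers(model: nn.Module, num_layers: int = 2) -> None:
    """Freeze all parameters, then re-enable gradients only for the
    last `num_layers` transformer blocks (the 'LLaMA activated' setting)."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.model.layers[-num_layers:]:
        for p in block.parameters():
            p.requires_grad = True


model = ToyLlama(num_blocks=4)
activate_last_layers(model, num_layers=2)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With a real checkpoint you would pass the loaded `LlamaForCausalLM` instead of `ToyLlama`; only the parameters with `requires_grad=True` should then be handed to the optimizer.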
In the paper, the authors say the model is finetuned end to end in stage three.
But the released code only provides stage 2, and stage 2 is not finetuned with LLaMA activated.
Could you provide instructions on how to use the stage 3 code?