Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
https://ll3da.github.io/
MIT License

Dense captioning (densecap) task #24

Closed — xjj1999 closed this 3 months ago

xjj1999 commented 3 months ago

During training, did the authors observe that for different validation samples in the dense captioning task, the LLM outputs answers in a fixed format with nearly identical content?

ch3cook-fdu commented 3 months ago

I do not quite understand your question. Could you provide me with some examples?

xjj1999 commented 3 months ago

[screenshot: 20240725-185406]

xjj1999 commented 3 months ago

For example, after training for 10 epochs, the validation results are as shown in the screenshot: the LLM's outputs (response_pred) for different questions are essentially identical. Only part is shown here, but almost the entire validation set produces answers with this same format and content.

ch3cook-fdu commented 3 months ago

If you are training with ScanRefer data only, it might be normal.

xjj1999 commented 3 months ago

Thanks! I did indeed use only the ScanRefer dataset. I will try joint training on multiple datasets.

xjj1999 commented 3 months ago

May I ask if this phenomenon disappears naturally during joint training on multiple datasets?

ch3cook-fdu commented 3 months ago

If you feed in diverse data from more tasks, this might be alleviated.
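Joint training of this kind amounts to training on a pooled, shuffled mixture of task-specific datasets, so each batch draws from several tasks rather than only dense captioning. A minimal sketch of the idea, using hypothetical toy data rather than LL3DA's actual dataloader:

```python
import random

def mix_task_datasets(datasets, seed=0):
    """Pool samples from several task-specific datasets and shuffle them,
    so every training batch mixes tasks (dense captioning, QA, ...)."""
    pool = [(task, sample)
            for task, samples in datasets.items()
            for sample in samples]
    random.Random(seed).shuffle(pool)
    return pool

# Hypothetical toy datasets standing in for the real annotation files.
mixed = mix_task_datasets({
    "scanrefer_densecap": ["desc_0", "desc_1", "desc_2"],
    "scanqa":             ["qa_0", "qa_1"],
})
print(len(mixed))  # all 5 samples, shuffled across tasks
```

With only one task in the pool, every batch looks alike and the model can collapse onto a single output template; mixing tasks breaks that uniformity.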

xjj1999 commented 3 months ago

Could you share the curves of the validation metrics during training for the dense captioning task?

ch3cook-fdu commented 3 months ago

Here is the log for ScanRefer fine-tuning: scanrefer-opt-1.3b-logger.log

xjj1999 commented 3 months ago

How is unified_scanrefer implemented? From the log, batch = 16 and one epoch is 27504 iterations, which is far more than the number of samples in the ScanRefer dataset.

ch3cook-fdu commented 3 months ago

27504 is the total iterations for 12 epochs.
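As a quick sanity check (assuming the batch size of 16 from the log, and that ScanRefer's training split contains on the order of 36k descriptions), the numbers are self-consistent:

```python
# Sanity check on the logged numbers (assumed: batch size 16, 12 epochs).
total_iters = 27504
epochs = 12
batch_size = 16

iters_per_epoch = total_iters // epochs            # iterations per epoch
samples_per_epoch = iters_per_epoch * batch_size   # samples seen per epoch

print(iters_per_epoch, samples_per_epoch)  # 2292 36672
```

So 27504 is not the length of one epoch but the total iteration count, and each epoch covers roughly one pass over the ScanRefer training descriptions.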

xjj1999 commented 3 months ago

Thanks!

xjj1999 commented 3 months ago

One more question: from the log, the model already achieves decent dense captioning performance during stage-1 training. Could you share the validation metrics from stage-1 training? Or roughly how many iterations does the model need before it acquires this capability?

ch3cook-fdu commented 3 months ago

We report the corresponding metrics in Table 5 of our paper. Please follow the pre-training guide provided in the README.

xjj1999 commented 3 months ago

Thanks for the answers; my question is resolved. I would like to go further and reproduce the Open-Vocabulary experiments from the supplementary material. Could you share the fine-tuned ovdet weights?