Ucas-HaoranWei / Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Question: Vary-tiny training loss does not decrease #86

Closed 41passerby closed 3 months ago

41passerby commented 3 months ago

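Hello, and thank you for sharing the Vary approach.

Today I prepared some OCR data and trained Vary-tiny following the instructions, but the loss will not come down; it stays at around 2.7. Could you advise what might be causing this? Thanks in advance.

The base model is facebook/opt-125m. The data samples look normal to me, as in the screenshot below: the document image is on the left and the OCR content is on the right, with each line being one paragraph.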

image

Data format:

[
    {
        "image": "output/ocr/images/9912538v1_1.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nProvide the OCR results of this image."
            },
            {
                "from": "gpt",
                "value": "An alternative description is to impose energy-momentum conservation, which means that in the hadron (parton) basis the charm meson (quark) is off-shell. The charm baryon (remnant) is external to the hard scattering process and should therefore be on-shell. (The remnant can be slightly off-shell if one allows it to pick up some energy in the hadronisation process.) This provides two model alternatives with energy conservation in the hadron or parton basis (ECH and ECP).\nOur results here and in [12, 19, 59, 96] are therefore only a first step towards a realistic model for intrinsic charm based on our general idea for deriving parton distributions.\n5 Conclusions and outlook\nWe have shown that our new model for parton momentum distributions in hadrons is reasonable and can reproduce the measured proton structure function. They can therefore be used as an alternative to the conventional parameterizations in practical calculations of hard processes and in Monte Carlo generators. The advantage of the physical model over the parameterizations is that it provides insights on the non-perturbative dynamics embodied in the model.\nThe application of the model to other hadrons than the proton will be presented elsewhere [74, 20], but the results cannot be as directly and precisely tested as in deep inelastic scattering on the nucleon. However, data on the pion structure are available from Drell-Yan processes in pion beam experiments.\n"
            }
        ]
    }
]
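
A quick sanity check on records in the format above can help rule out data problems before looking at the model. The sketch below is only an illustration and is not part of the Vary codebase: the function name `check_samples` and the specific rules (a non-empty image path, a first turn from `human` containing `<image>`, a non-empty second turn from `gpt`) are assumptions inferred from the single sample shown.

```python
import json


def check_samples(samples):
    """Return (index, problem) pairs for records that look malformed.

    The rules are assumptions based on the one sample in this thread,
    not a specification taken from the Vary training code.
    """
    problems = []
    for i, sample in enumerate(samples):
        if not sample.get("image"):
            problems.append((i, "missing image path"))
        turns = sample.get("conversations", [])
        if len(turns) < 2:
            problems.append((i, "expected a human turn and a gpt turn"))
            continue
        human, gpt = turns[0], turns[1]
        if human.get("from") != "human" or "<image>" not in human.get("value", ""):
            problems.append((i, "first turn should be from 'human' and contain <image>"))
        if gpt.get("from") != "gpt" or not gpt.get("value", "").strip():
            problems.append((i, "second turn should be from 'gpt' with non-empty text"))
    return problems


def check_file(path):
    """Load a JSON annotation file (a list of records) and check it."""
    with open(path, encoding="utf-8") as f:
        return check_samples(json.load(f))


# A shortened copy of the record from this thread.
example = [
    {
        "image": "output/ocr/images/9912538v1_1.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nProvide the OCR results of this image."},
            {"from": "gpt", "value": "5 Conclusions and outlook\n"},
        ],
    }
]

print(check_samples(example))
```

An empty list means every record passed these checks. That only says the annotations are well formed; it does not say whether the images load or whether the text matches them.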

Screenshot of the loss not decreasing:

image
Ucas-HaoranWei commented 3 months ago

Is the SAM you are using the one already trained by Vary-tiny?

41passerby commented 3 months ago

Is the SAM you are using the one already trained by Vary-tiny?

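No, the SAM did not load any trained weights; it was initialized directly, because I want to train a SAM myself. The goal of train_opt is also to train a SAM vision module for OCR, right? I would also like to train the SAM vision module for some other domains, so I want to learn how this works. Thank you for the reply; I would appreciate some further guidance.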

lucasjinreal commented 3 months ago

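I cannot reproduce it either. My loss does go down to 1.8, but that is still not enough. When I test the output of intermediate checkpoints, the model has only learned the pattern of the image, not the actual text.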
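
To put the two loss values in this thread on a more intuitive scale, they can be converted to perplexity. This is a minimal sketch under one assumption the thread does not confirm: that the reported loss is the mean per-token cross-entropy in natural-log units, as is common for causal language model training. The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_ce_loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_ce_loss_nats)


# The two loss values reported in this thread.
for loss in (2.7, 1.8):
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, a loss of 2.7 corresponds to a perplexity of roughly 15 and a loss of 1.8 to roughly 6, meaning the model is still spreading its probability over several candidate tokens at each step.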

lucasjinreal commented 3 months ago

@41passerby Has your problem been solved?