About camera parameter convergence

I found that the cam params (scale, tx, ty) converge poorly, with or without the cam loss. I use weak perspective projection in my code, where `kp_2d = scale * kp_3d[:, :2] + txy`. I think the key reason is that the focal length in the FreiHAND dataset differs from image to image, ranging from 400 mm to 800 mm. As a result, the scale (focal / global_z) also differs from image to image, so maybe the network cannot regress the scale well? How can I make the cam params converge better? Are there any tricks in training? Thanks!
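To make the setup concrete, here is a minimal sketch of the weak perspective projection described above. The function names and the sample values (keypoints, `txy`, `global_z`) are made up for illustration and are not taken from the mobilehand code; only the formula and the scale = focal / global_z relation come from this post.

```python
import numpy as np


def weak_perspective_project(kp_3d, scale, txy):
    """Project (N, 3) keypoints to (N, 2) with a weak perspective camera."""
    # Depth is dropped: only x and y are scaled and translated.
    return scale * kp_3d[:, :2] + txy


def scale_from_intrinsics(focal, global_z):
    """Scale implied by a focal length and a global depth (focal / global_z)."""
    return focal / global_z


# Hypothetical values, for illustration only.
kp_3d = np.array([[0.00, 0.00, 0.00],
                  [0.05, -0.02, 0.01]])
txy = np.array([112.0, 112.0])

# The same 3D pose at the same depth gives a different target scale
# for each focal length, which is the variation the network has to regress.
for focal in (400.0, 800.0):
    scale = scale_from_intrinsics(focal, global_z=0.5)
    kp_2d = weak_perspective_project(kp_3d, scale, txy)
    print(f"focal={focal:.0f} scale={scale:.0f} kp_2d={kp_2d.tolist()}")
```

With identical keypoints and depth, doubling the focal length doubles the scale the network is expected to predict, even though the image content looks the same to it.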

gmntu / mobilehand

About camera parameter convergence #21