Parskatt / DKM

[CVPR 2023] DKM: Dense Kernelized Feature Matching for Geometry Estimation
https://parskatt.github.io/DKM/

Training Code #1

Closed felipecadar closed 1 year ago

felipecadar commented 2 years ago

Congratulations on the great work! Do you have plans to release the training code? I'm thinking of fine-tuning your weights for more specific tasks.

Parskatt commented 2 years ago

We do not plan to release the training code in the near future; however, if you have any questions regarding the training, I'm happy to answer them. The training code will probably be released sometime after summer.

I'll leave this issue open in case others have the same question.

cvbird commented 2 years ago

Hi, parskatt. Thank you for the excellent work. I'm trying to train it from scratch, but I still cannot understand the loss functions. Sec. 3.4 says: "The reference warps can come from projected depths like in (Sarlin et al., 2020; Sun et al., 2021) or from synthetic homographies, and the reference confidence p indicates, e.g., covisibility or consistent depth.", so

  1. Which choice is used in this paper, depth or synthetic homographies? It seems the strategy differs per dataset. If depth is used, which method is used to generate the depth maps?
  2. If the reference confidence p indicates covisibility, is it a scalar or a tensor with the same shape as the reference warps? How is p obtained?
  3. Would you please give a detailed explanation of the two loss functions L_swap and L_conf?

Thank you again.

Parskatt commented 2 years ago

@cvbird

  1. In the current version (a new preprint will be released soon) we train either on MegaDepth only (like LoFTR), or on a combination of MegaDepth and a synthetic dataset that we made ad hoc. For MegaDepth the reference warp is constructed as in LoFTR: we use the warp_kpts function https://github.com/zju3dv/LoFTR/blob/2122156015b61fbb650e28b58a958e4d632b1058/src/loftr/utils/geometry.py#L5 with the input created from a meshgrid so that the warp is dense. The depth is provided by MegaDepth, so no additional method is needed; please refer to LoFTR (or perhaps even D2Net) for how to download and process MegaDepth. The synthetic dataset is very similar to the one presented in PDCNet or PDCNet+, although we have our own implementation. There the flow is computed from a set of homographies, and the confidence comes from covisibility (which is easy to compute given that we generate the warps).

  2. See above. Note that we found that using the confidence loss exclusively from MegaDepth yields better estimation results, i.e. we only use the confidence from the synthetic dataset to zero out the regression loss in certain regions.

  3. I don't remember L_swap; if you mean L_warp, then it is simply computed by running warp_kpts on the dense grid as described above and comparing the result with the estimated dense warp. This gives an error for each pixel: we take the 2-norm in each pixel, multiply by the reference confidence (so as to remove the loss from non-matching pairs), and then take the mean. For the confidence loss we simply use binary cross-entropy in each pixel (see the rough sketch below).
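
For anyone else piecing this together from points 1 and 3, here is a rough PyTorch sketch. The function names, tensor shapes, and the exact warp_kpts signature are my assumptions based on the linked LoFTR code, not DKM's actual training code:

```python
import torch
import torch.nn.functional as F

# warp_kpts from LoFTR (linked above); signature assumed to be
# warp_kpts(kpts0, depth0, depth1, T_0to1, K0, K1) -> (valid_mask, warped_kpts0)
from src.loftr.utils.geometry import warp_kpts


def build_dense_reference_warp(depth0, depth1, T_0to1, K0, K1):
    """Project every pixel of image 0 into image 1 to get a dense reference warp.

    depth0, depth1: (B, H, W) MegaDepth depth maps
    T_0to1:         (B, 4, 4) relative pose; K0, K1: (B, 3, 3) intrinsics
    Returns ref_warp (B, H, W, 2) and ref_confidence (B, H, W), the latter
    taken from the valid/covisible mask returned by warp_kpts.
    """
    B, H, W = depth0.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth0.device),
        torch.arange(W, device=depth0.device),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=-1).float()      # (H, W, 2) pixel coordinates
    grid = grid.reshape(1, -1, 2).expand(B, -1, -1)   # (B, H*W, 2) dense "keypoints"
    valid, warped = warp_kpts(grid, depth0, depth1, T_0to1, K0, K1)
    return warped.reshape(B, H, W, 2), valid.float().reshape(B, H, W)


def warp_and_conf_losses(pred_warp, pred_conf_logits, ref_warp, ref_confidence):
    """L_warp: per-pixel 2-norm of the warp error, masked by the reference
    confidence and averaged. L_conf: per-pixel binary cross-entropy.
    Treating the predicted confidence as a logit is my assumption."""
    warp_error = torch.norm(pred_warp - ref_warp, p=2, dim=-1)  # (B, H, W)
    l_warp = (ref_confidence * warp_error).mean()
    l_conf = F.binary_cross_entropy_with_logits(pred_conf_logits, ref_confidence)
    return l_warp, l_conf
```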

cvbird commented 2 years ago

Thank you for your patient illustration. Looking forward to the new version. :)

KakueiTanaka commented 2 years ago

Hello, Parskatt. Could you tell me the size of the images used when training the model?

Parskatt commented 2 years ago

@KakueiTanaka Hi, we use height = 384, width = 512 images. We will release preliminary training code, loss functions, etc. in a few days :)
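
For anyone wiring up a dataloader against these weights, a minimal resize to that resolution might look like the following (a torchvision-based sketch, not the repo's actual preprocessing):

```python
import torch
import torchvision.transforms.functional as TF

def resize_pair(img0: torch.Tensor, img1: torch.Tensor):
    """Resize both images of a pair to the training resolution (H=384, W=512).
    Note: a real pipeline would also rescale the intrinsics (and depth maps)
    accordingly; this sketch only handles the images."""
    size = [384, 512]  # (height, width)
    return TF.resize(img0, size, antialias=True), TF.resize(img1, size, antialias=True)
```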

KakueiTanaka commented 2 years ago

Thanks a lot!

Parskatt commented 2 years ago

@KakueiTanaka I updated the codebase now, and there is some (hopefully working) training code. Note that the code provided here is adapted from our internal training framework, which is quite messy, so there might be some translation errors. Let me know if you run into any issues!

Parskatt commented 2 years ago

I realized that in our internal training we actually freeze the batchnorm statistics in the ResNet backbone during training, which is not done here. This might cause some small discrepancies when trying to reproduce results. I'll make some updates to this code when I get back from vacation :)
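
For anyone trying to match that setup before the code is updated: freezing the BatchNorm running statistics of a PyTorch ResNet backbone can be done along these lines (my sketch, not the internal training code):

```python
import torch.nn as nn

def freeze_bn_stats(backbone: nn.Module) -> None:
    """Put all BatchNorm layers of the backbone in eval mode so their running
    statistics are not updated during training. Re-apply this after every call
    to model.train(), since train() switches them back to training mode."""
    for m in backbone.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()
            # Whether the affine parameters are also frozen isn't stated above;
            # uncomment to freeze them as well:
            # for p in m.parameters():
            #     p.requires_grad_(False)
```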