Parskatt / DKM

[CVPR 2023] DKM: Dense Kernelized Feature Matching for Geometry Estimation
https://parskatt.github.io/DKM/

Is it fair to compare DKM directly with LoFTR when DKM uses more training data sampled from MegaDepth? #15

Closed · noone-code closed this issue 1 year ago

noone-code commented 1 year ago

LoFTR uses only 15,300 = 153*100 training image pairs from MegaDepth, while DKM samples 150,000 pairs per iteration, over 53 iterations, from 10,661,614 total training pairs.
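
For reference, taking the figures in this question at face value, the raw arithmetic looks like this (just a sanity check, not measured from either codebase):

loftr_pairs = 153 * 100        # 15,300 pairs, as claimed above for LoFTR
dkm_samples = 150_000 * 53     # 7,950,000 samples drawn by DKM over 53 iterations
dkm_pool = 10_661_614          # total candidate pairs DKM samples from
print(loftr_pairs, dkm_samples, dkm_pool)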

Parskatt commented 1 year ago

Hi!

I think you are mistaken regarding the number of training pairs in LoFTR. Could you tell me where you got the number 100 from? From my understanding, they are using the pairs here: https://drive.google.com/drive/folders/1SrIn9WJ1IuG08yh2nEvIsLftXHLrrIwh

And they load those npz files into this dataset loader: https://github.com/zju3dv/LoFTR/blob/master/src/datasets/megadepth.py

Running the following code:

import numpy as np

# Count the total number of training pairs listed in LoFTR's MegaDepth scene files.
sampled_pair_files = [f for f in open("trainvaltest_list/train_list.txt", "r").read().split("\n") if len(f) > 0]
num_pairs = 0
for scene_name in sampled_pair_files:
    scene = np.load(f"scene_info_0.1_0.7/{scene_name}.npz", allow_pickle=True)
    scene_pairs = len(scene['pair_infos'])
    num_pairs = num_pairs + scene_pairs
print(num_pairs)

yields 8862673. So they have around 9 million unique pairs. For ScanNet they use the same procedure as SuperGlue and end up with 240M pairs.

The main reasons we don't follow this exact procedure are:

  1. We believe our approach is more modular and easier to modify for someone looking to improve the sampling or put focus on certain overlaps.
  2. It is more transparent about exactly how the pairs are sampled.

However, we did not find that our sampling procedure produces better results on the benchmarks after training than the original version.

noone-code commented 1 year ago

Well, take it easy. I just want to explore the influence of the amount of training data on performance. I think you did not notice the n_samples_per_subset parameter in sampler.py; it is set to 100 for the MegaDepth dataset. So LoFTR uses exactly 15,300 image pairs.

Parskatt commented 1 year ago

Hi again! I hadn't seen that detail before :)

From reading their implementation: https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L5 it says the following:

Random sampler for ConcatDataset. At each epoch, n_samples_per_subset samples will be draw from each subset in the ConcatDataset.

I'm guessing they run more than 1 epoch? Hence the correct number should be 368 * 100 * num_epochs?

noone-code commented 1 year ago

In fact, they will not resample each epoch, because reload_dataloaders_every_epoch=False in train.py. If reload_dataloaders_every_epoch=True, the sampler would resample each epoch.
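
For context, reload_dataloaders_every_epoch is a PyTorch Lightning Trainer argument. A minimal sketch of what the two settings mean, assuming an older 1.x Lightning where this argument still exists (newer releases renamed it to reload_dataloaders_every_n_epochs):

import pytorch_lightning as pl

# False (what is described above for LoFTR's train.py): train_dataloader() is
# built once, so the same sampler object is reused for every epoch.
trainer = pl.Trainer(max_epochs=30, reload_dataloaders_every_epoch=False)

# True: Lightning rebuilds the dataloader (and hence the sampler) every epoch,
# so the pairs would be redrawn from a freshly constructed sampler.
# trainer = pl.Trainer(max_epochs=30, reload_dataloaders_every_epoch=True)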

Parskatt commented 1 year ago

Aha, got it. However, they also use 64 GPUs, which I guess means that each GPU gets its own sampler?

Parskatt commented 1 year ago

My general guess is that they found that the exact specifics of the sampling were not very important for the final performance?

noone-code commented 1 year ago

I don't know why they do not resample the training data each epoch; it would be better to ask the authors. As for the sampling, self.generator = torch.manual_seed(seed) fixes the generator, which fixes the sampled results, and the sampled indices are then assigned uniformly to each GPU. Even if each GPU samples by itself, since the generator is fixed, they still get the same sampled indices.
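
A minimal, self-contained sketch of that determinism argument (not LoFTR's actual sampler, just the seeding behaviour it relies on): two samplers constructed with the same seed draw identical indices.

import torch

def sample_indices(seed, n_pairs_in_scene=5000, n_samples=100):
    # torch.manual_seed seeds and returns the default generator,
    # mirroring self.generator = torch.manual_seed(seed) quoted above.
    generator = torch.manual_seed(seed)
    return torch.randint(0, n_pairs_in_scene, (n_samples,), generator=generator)

# Two processes seeding with the same (arbitrary) value draw the same indices.
print(torch.equal(sample_indices(66), sample_indices(66)))  # True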

Parskatt commented 1 year ago

I find it hard to believe that each GPU would sample the exact same indices, but I'm not completely familiar with their exact sampling. I'll run their code to get a better understanding.

I'll get back to you after I have done this so that we can have a more informed discussion.

noone-code commented 1 year ago

Yep. Anyway, I think DKM is a good method given its impressive results; I like it. One more question: can I use DKM on 1920*1080 images when I train it on another size, like 520*720?

Parskatt commented 1 year ago

Yes. We don't have a perfectly clean way of doing it, but there are two alternatives:

  1. Set the internal dimensions (we always resize to a fixed size, so you can change this resolution as you like; note, however, that the method may become quite slow for large images)
  2. Keep the internal dimensions at (540, 720) but upsample the prediction by rerunning the final layer (a rough sketch of the general idea is at the end of this comment)

In the model zoo: https://github.com/Parskatt/DKM/blob/5b28266cff1bd55f20dade6706afed65f2e158af/dkm/models/model_zoo/__init__.py#L18-L26

you can see some API for changing these variables. However, right now the "upsample_preds" resolution is hardcoded to (864, 1152); see:

https://github.com/Parskatt/DKM/blob/5b28266cff1bd55f20dade6706afed65f2e158af/dkm/models/dkm.py#L685

You can change this hardcoding so that it's settable. If you do change it, please submit a pull request so I can update the code. There is a lot of mess, so I'd appreciate it.
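
As a rough, hypothetical sketch of option 2's general idea: this is plain bilinear interpolation of a dense prediction, which is cruder than rerunning the final layer, and the tensor names and shapes below are made up for illustration, not DKM's actual output format.

import torch
import torch.nn.functional as F

# Hypothetical dense prediction at an internal resolution of (540, 720):
# a warp in normalized [-1, 1] coordinates and a per-pixel certainty map.
warp = torch.rand(1, 2, 540, 720) * 2 - 1
certainty = torch.rand(1, 1, 540, 720)

# Interpolate both up to the full 1920x1080 image. Normalized coordinates can be
# interpolated directly; pixel-unit coordinates would also need to be rescaled.
warp_full = F.interpolate(warp, size=(1080, 1920), mode="bilinear", align_corners=False)
certainty_full = F.interpolate(certainty, size=(1080, 1920), mode="bilinear", align_corners=False)
print(warp_full.shape, certainty_full.shape)  # (1, 2, 1080, 1920), (1, 1, 1080, 1920)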

noone-code commented 1 year ago

Wow, thank you so much. My questions are addressed. Thank you. 🌹

Parskatt commented 1 year ago

@noone-code Ok, I ran LoFTR's training code and here's how I think it works:

They use RandomConcatSampler : https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py

When using distributed training they split the scenes over the GPUs; each GPU gets 384 // world_size scenes. Each GPU's sampler is initialized with a seeded generator. Their __iter__ method, defined here:

https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L44

samples 100 pairs from each scene and shuffles them. This defines one epoch for each worker.

The next epoch this is done again. Note that the sampler is not reinitialized, hence the state of the generator is different from the first epoch. Therefore, the 100 pairs per scene drawn in the second epoch will be different from those in the first.
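
A toy sketch of that behaviour (not the real RandomConcatSampler, just the same pattern: one generator created at construction, a fixed number of draws per scene per epoch, and generator state carried across epochs):

import torch

class ToySceneSampler:
    def __init__(self, scene_sizes, n_samples_per_subset=100, seed=66):
        self.scene_sizes = scene_sizes
        self.n = n_samples_per_subset
        # Seeded once at construction and never re-seeded between epochs.
        self.generator = torch.manual_seed(seed)

    def __iter__(self):
        # Called once per epoch: draw n indices from every scene, then shuffle.
        # (Offsets into a ConcatDataset are omitted for brevity.)
        indices = torch.cat([
            torch.randint(0, size, (self.n,), generator=self.generator)
            for size in self.scene_sizes
        ])
        shuffled = indices[torch.randperm(len(indices), generator=self.generator)]
        return iter(shuffled.tolist())

sampler = ToySceneSampler(scene_sizes=[5000, 8000, 3000])
epoch_1 = list(sampler)
epoch_2 = list(sampler)
# The epochs differ because the generator state advanced, not because it was reset.
print(epoch_1 == epoch_2)  # False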

I think I got this right; I went through their code by debugging train.py on two GPUs using their standard outdoor training setting. However, I have not actually run a full epoch yet, so I might have misunderstood something. If you look at their comment: https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L15

This leads me to believe that they are aware of the potential issue with repeated samples, and therefore make sure not to reinitialize it.

Please let me know if I got anything wrong, it might be the case that I misunderstood something.

noone-code commented 1 year ago

I found the code local_npz_names = get_local_split(npz_names, self.world_size, self.rank, self.seed). I agree that LoFTR assigns different scenes to each GPU, and that each GPU samples 100 image pairs per scene each epoch. So, actually, LoFTR uses up to 384 (scenes) * 100 (samples per scene) * 30 (epochs) = 1,152,000 image pairs? Is that correct?

Parskatt commented 1 year ago

I think so! (but not sure)

There is some potential that our sampling may yield slightly better (or worse) results compared to theirs if used in DKM. Of course, our method has been developed using our sampling and theirs with theirs, so it might be the case that both would degrade using the other's sampling ;)

In conclusion, I would say that since they do sample quite a lot of pairs, they are comparable to us. However, it would of course be interesting to investigate a bit more deeply how to sample good pairs for training feature matchers.

noone-code commented 1 year ago

Yes, finally, thank you so much.