GrumpyZhou / patch2pix

Patch2Pix: Epipolar-Guided Pixel-Level Correspondences [CVPR2021]

Creating custom dataset #8

Closed omeryasar closed 2 years ago

omeryasar commented 2 years ago

I want to create my own dataset to train patch2pix. If I have the fundamental matrix between the source and destination images, would that be sufficient supervision to train patch2pix?

GrumpyZhou commented 2 years ago

Hi @omeryasar ,

Yes, if you have image pairs and the fundamental matrix for each pair, that is enough to compute the loss for training. The simplest way I see to train on a custom dataset is to follow dataset_megadepth.py and implement your own dataset class that gives the same type of outputs (see L120-127 there). The necessary data for training are:

data_dict = {
    'src_im': im_src, 
    'pos_im': im_pos, 
    'F': F
}
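
For concreteness, a minimal sketch of such a dataset class is below. It assumes pairs are stored as (src_path, pos_path, F) triplets with F a 3x3 numpy array; adapt the loading to your own storage format, and mirror whatever resizing/normalization dataset_megadepth.py applies:

# Minimal sketch of a custom pair dataset for patch2pix training.
# Assumption: pairs are given as (src_path, pos_path, F) triplets,
# with F a 3x3 numpy array relating each image pair.
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class CustomPairDataset(Dataset):
    def __init__(self, pair_list):
        self.pair_list = pair_list
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.pair_list)

    def __getitem__(self, idx):
        src_path, pos_path, F = self.pair_list[idx]
        im_src = self.to_tensor(Image.open(src_path).convert('RGB'))
        im_pos = self.to_tensor(Image.open(pos_path).convert('RGB'))
        # Same output keys as dataset_megadepth.py (L120-127 there)
        return {'src_im': im_src,
                'pos_im': im_pos,
                'F': torch.from_numpy(F).float()}
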
omeryasar commented 2 years ago

Hey @GrumpyZhou, thanks for the quick reply! I was able to create my dataset and am now trying to train patch2pix on it. Is the triplet loss crucial for performance? And what about S=16 (the patch area)? On my first try I could not set S=16 because of the memory restrictions of the GPU I used, and I also skipped the triplet loss because I don't think I understood how to choose the negative samples; this led to poor results. Any advice? Thanks in advance.

GrumpyZhou commented 2 years ago

Hi @omeryasar ,

  1. The triplet loss is not important: it was mainly there to jointly train NCNet, and we did not notice that it helps. This is also visible in the code, where we only load pairs (see https://github.com/GrumpyZhou/patch2pix/blob/ad26ef065568eabf9a0bb6dc09f53462e9aeef36/train_patch2pix.py#L97).

  2. The patch size does influence performance. I remember trying 8/10/16, and 16 turned out to be the best; the other settings reduce performance. Regarding your memory constraint, I would suggest keeping the patch size at 16 and reducing the training batch size instead. I can run locally on a 12GB GPU with the batch size set to 2; it still trains, just at the cost of a longer training time. A minimal loader along those lines is sketched below.
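
Continuing the dataset sketch from the earlier comment, a reduced-batch loader is just a standard PyTorch DataLoader; batch size 2 mirrors the 12GB setting mentioned above, and this assumes all images are resized to a common shape so they can be batched:

# Standard PyTorch DataLoader with a reduced batch size (sketch).
# Assumes CustomPairDataset from the earlier sketch and images
# resized to a common shape so they can be stacked into a batch.
from torch.utils.data import DataLoader

dataset = CustomPairDataset(pair_list)
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=4)

for data_dict in loader:
    im_src = data_dict['src_im']   # (2, 3, H, W)
    im_pos = data_dict['pos_im']
    F = data_dict['F']             # (2, 3, 3)
    # ...forward pass and loss as in train_patch2pix.py...
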

omeryasar commented 2 years ago

Thanks again @GrumpyZhou! One last question: I am using aerial images at a resolution of 5000x5000. Are there specific parameters I should tune because of the higher resolution? Something is going wrong in my case and I am trying to figure out what: of the ~2000 pairs in my dataset, almost 1400 pairs are skipped during training. I could use some advice here if you have any. And thanks again for making your work public and for all of your answers.

GrumpyZhou commented 2 years ago

Hi @omeryasar ,

Hm... 5000x5000 is indeed quite big. In my experiments, I remember everything was resized to H=320, W=480, which would be too extreme a downscale for aerial images. Is it possible to split the aerial image into 4 sub-regions and then merge the results back? Another issue is that 5000 is not divisible by 16: if this is your raw input to the network, the network will randomly floor or ceil the decimal part, which could potentially cause failures during training. The divisibility by 16 is required by NCNet and corresponds to how much the ResNet downscales your input.
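
To illustrate both points, here is one way to tile such an image into four 16-divisible sub-regions (a sketch using PIL; overlap between tiles and how to merge the per-tile matches are left to you):

# Sketch: crop a large aerial image into 4 quadrants whose side
# lengths are multiples of 16 (ResNet downscales the input by 16,
# which is what NCNet requires).
from PIL import Image

def split_into_quadrants(path, multiple=16):
    im = Image.open(path)
    w, h = im.size
    # Round each half-dimension down to a multiple of 16,
    # e.g. 5000 -> half 2500 -> 2496.
    w2 = (w // 2) // multiple * multiple
    h2 = (h // 2) // multiple * multiple
    tiles = []
    for row in range(2):
        for col in range(2):
            left, top = col * w2, row * h2
            tiles.append(im.crop((left, top, left + w2, top + h2)))
    return tiles  # 4 tiles, each w2 x h2, both divisible by 16
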

omeryasar commented 2 years ago

@GrumpyZhou Ah, I forgot to mention that I rescale them to 1024x1024. Do you think it is a good idea to train the ResNet too, since my data differ substantially from what the pretrained ImageNet weights were trained on?

GrumpyZhou commented 2 years ago

> Do you think it is a good idea to train the ResNet too, since my data differ substantially from what the pretrained ImageNet weights were trained on?

@omeryasar , I honestly don't know. Maybe you can first apply a pre-trained model to your data to see whether it fails completely, and then decide whether you need to retrain from scratch.
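
One way to run that sanity check is sketched below. Note that the names are placeholders, not the actual patch2pix API; see the repository README and example notebook for the real model-loading and matching code:

# Hypothetical sanity-check sketch. `matcher` is a placeholder for
# however you wrap the pre-trained model (see the repo's README and
# example notebook for the actual loading/matching code).
def check_pretrained(matcher, pairs):
    # matcher: callable mapping (im1_path, im2_path) -> Nx4 match array
    for im1, im2 in pairs:
        matches = matcher(im1, im2)
        print(f'{im1} <-> {im2}: {len(matches)} matches')
        # Consistently near-zero match counts on the aerial data would
        # suggest that fine-tuning or retraining is needed.
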