Lydorn / mapalignment

Aligning and Updating Cadaster Maps with Remote Sensing Images

disp_max_abs_value & model_disp_max_abs_value #4

Closed xuyingxiao closed 5 years ago

xuyingxiao commented 5 years ago

Hi Lydorn, I noticed that the parameter "disp_max_abs_value" = 32 in config.test.bradbury_buildings.json, while the model's "disp_max_abs_value" = 4. Does that mean training uses max displacements of up to 4px while testing uses max displacements of up to 32px? I can't understand the relationship between these two parameters, yet the model performs well on 32px displacements. Or is your pretrained model's max displacement 32px? Also, what does this sentence mean on page 9 (under "Random dropping of input polygons")?

> With displacements of only up to 4px, it could be easy for the network to keep a small error by outputting, as a segmentation, just a copy of the input polygon raster A2.

If I want to align larger displacement offsets, is it enough to increase the parameter "disp_max_abs_value" in config.json and retrain?

Lydorn commented 5 years ago

Hi xuyingxiao,

Those two parameters do not mean the same thing (although their sharing a name is admittedly confusing).

In config.test.bradbury_buildings.json, disp_max_abs_value = 32 means that artificial displacements of up to 32px in amplitude are generated on the original annotations in order to test the model.

In config.json, disp_max_abs_value = 4 means that each neural network in the multi-resolution model can output displacements of up to 4px in amplitude. Because the whole model is a chain of 4 neural networks that each perform 4px alignment at successively finer resolutions, the overall model can handle displacements of up to 32px: the coarsest resolution in the pipeline is the original resolution divided by 8, so a 4px displacement at that downsampled resolution corresponds to an 8 × 4px = 32px displacement at the original resolution.

I hope this explanation clears up the confusion. Are your other questions (about the sentence on page 9 and how to align larger displacement offsets) related to this confusion, or are they separate questions?
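To make the scaling concrete, here is a small sketch (the constant name is illustrative, not from the repo) of how the per-network 4px bound compounds across downsampling factors:

```python
# Each network in the chain outputs displacements of at most 4px at
# its own downsampled resolution; mapping that displacement back to
# the original resolution multiplies the bound by the downsampling
# factor.
MODEL_DISP_MAX = 4  # "disp_max_abs_value" in config.json

effective = {}
for ds_fac in [8, 4, 2, 1]:
    # A 4px displacement predicted at 1/ds_fac resolution covers
    # ds_fac * 4 px at the original resolution.
    effective[ds_fac] = ds_fac * MODEL_DISP_MAX

print(effective)  # {8: 32, 4: 16, 2: 8, 1: 4}
```

The coarsest network (ds_fac = 8) already covers 8 × 4 = 32px, which matches disp_max_abs_value = 32 in config.test.bradbury_buildings.json.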

xuyingxiao commented 5 years ago

Thanks, I thought you trained the model with max displacements of only up to 4px and tested with 32px. So disp_max_abs_value = 4 in config.json means model ds_fac_8 handles up to 4px displacements and model ds_fac_1 up to 32px displacements, is that right? My problem is aligning polygons with larger displacements. If I increase the parameter "disp_max_abs_value" in config.json and retrain on these datasets, will that solve the problem?

Lydorn commented 5 years ago

Actually, all networks ds_fac_8, ds_fac_4, ds_fac_2 and ds_fac_1 are trained with a max displacement of 4px. However, because the inputs are downscaled by ds_fac (equal to 8, 4, 2 and 1 respectively) before being fed to the networks, the network ds_fac_8 actually performs a 32px alignment once its output is scaled back to the original resolution (an imprecise alignment, which is why 3 successive networks follow it to refine the result). The whole model basically uses a pyramid representation to perform alignment at different resolutions.
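The chain of refinements can be sketched as a coarse-to-fine loop. This is a toy stand-in, not the repo's actual code: `predict_disp` is a hypothetical callable representing the per-resolution networks, each bounded to 4px at its own resolution.

```python
import numpy as np

def align_coarse_to_fine(predict_disp, polygons, image):
    """Hypothetical coarse-to-fine loop: each stage corrects the
    residual misalignment left by the previous, coarser stage."""
    total_disp = np.zeros_like(polygons, dtype=float)
    for ds_fac in (8, 4, 2, 1):
        # The network sees inputs downscaled by ds_fac and predicts a
        # displacement bounded by 4px *at that resolution*; mapping it
        # back to the original resolution multiplies it by ds_fac.
        disp_at_ds = predict_disp(image, polygons + total_disp, ds_fac)
        total_disp += ds_fac * disp_at_ds
    return polygons + total_disp

# Toy demo: an "oracle" predictor that moves toward a known target,
# clipped to 4px at its own resolution like the real networks.
target = np.array([30.0])
def oracle(image, current, ds_fac):
    return np.clip((target - current) / ds_fac, -4.0, 4.0)

aligned = align_coarse_to_fine(oracle, np.array([0.0]), image=None)
print(aligned)  # prints [30.]
```

Even with a perfect predictor at each stage, the reachable displacement is bounded by 8 × 4 = 32px, which is why retraining with a larger per-network bound (or adding a coarser stage) is needed for larger offsets.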

If you want to align with more displacement, there are 2 solutions:

xuyingxiao commented 5 years ago

Thank you for your reply! Your work helps me a lot. And I have two more questions.

  1. Typical U-Net networks have the same input and output size, while you output 100×100px from a 220×220px input. Does this work better than same-size output because the central part aligns better?
  2. You generate normalized 2D Gaussian random fields and add them together for each coordinate. What if the input displacement field map is 0 where there are no polygons? Will that make training worse or harder?
Lydorn commented 5 years ago
  1. Yes, typical U-Nets use padding for every conv layer so that the output size equals the input size. However, I decided not to use padding, to avoid boundary artifacts, which results in a smaller output. I account for this when applying the network to a whole image by extracting overlapping patches whose overlap matches the difference between input and output size. By not using padding, I make sure every pixel in the output has the same amount of context, regardless of whether it is near the boundary.
  2. Actually, with the default hyperparameters this does not change training at all, because the loss is not backpropagated through the background for the displacement output: implicitly, the loss coefficient for background pixels is 0. However, if the laplacian_penalty_coef hyperparameter in config.json is set to a non-zero value, background displacements will contribute to the loss. This hyperparameter controls the Laplacian regularization, which penalizes displacements that are not smooth. We ended up not using it.
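The overlapping-patch scheme from the first answer can be sketched as follows. The sizes are the illustrative 220px/100px figures from the discussion; the stride logic is an assumption and ignores image borders, which the repo may handle differently.

```python
# Unpadded ("valid") convolutions shrink the output: a 220px input
# yields a 100px output, losing 60px of context on each side.
PATCH_IN, PATCH_OUT = 220, 100
MARGIN = (PATCH_IN - PATCH_OUT) // 2  # 60px consumed per side

def patch_origins(image_size):
    """Top-left coordinates of input patches such that their central
    valid outputs tile the image side by side (borders ignored)."""
    # Striding by PATCH_OUT makes consecutive 100px valid regions abut,
    # while the 220px inputs overlap by PATCH_IN - PATCH_OUT = 120px.
    return list(range(0, image_size - PATCH_IN + 1, PATCH_OUT))

print(patch_origins(420))  # prints [0, 100, 200]
```

Each output pixel then sees the full 60px of surrounding context, whether or not it sits near a patch edge.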
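For the second answer, a masked loss with an implicit zero coefficient on background pixels might look like this minimal sketch (a hypothetical L2 formulation for illustration, not the repo's exact loss):

```python
import numpy as np

def masked_disp_loss(pred_disp, gt_disp, polygon_mask):
    """Squared error on displacement maps where background pixels
    (mask == 0) contribute nothing, so no gradient would flow
    through them in a training framework."""
    mask = polygon_mask.astype(float)[..., None]  # (H, W, 1), broadcasts over x/y
    sq_err = mask * (pred_disp - gt_disp) ** 2    # zeroed on background
    # Normalize by the number of foreground pixels, not the whole image.
    return sq_err.sum() / max(mask.sum(), 1.0)

# One foreground pixel with error 1 in each of the two channels:
pred = np.ones((2, 2, 2))
gt = np.zeros((2, 2, 2))
mask = np.array([[1, 0], [0, 0]])
print(masked_disp_loss(pred, gt, mask))  # prints 2.0
```

With this formulation, setting the displacement field to 0 over the background changes nothing, exactly as described; only a non-zero smoothness penalty would couple background pixels into the loss.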
xuyingxiao commented 5 years ago

Thank you again for your reply. It's very meaningful work. I'll read it carefully. :)

Lydorn commented 5 years ago


Well of course if you have any further questions I'd be happy to answer them as well!