Open JohnMBrandt opened 2 days ago
Hi Martin, thanks for your interest.
3 & 4. Good tips! We didn’t upsample/resize the inputs, but we’ll try that in the next version of the paper, coming soon.
I'll let @fajwel handle the code-related questions.
Hello John,
Thanks for your interest in our paper and your feedback.
Hello,
I've been following your work and I'm impressed by everything you've been able to create with the SPOT imagery! I'm an author of the Tolan et al. paper and had a few questions, as we're considering including the Open-Canopy dataset in our next model training.
1. Did you test only a 3-band fine-tuning of the ViT-L from our paper? We're a little unsure how adding an extra band to a pretrained model would help, as it will cause a large shift in the loss landscape.
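For reference, one common way to add a fourth input band to a pretrained 3-band ViT is to inflate the patch-embedding weights, e.g. initializing the new band's filters from the mean of the RGB filters. This is just an illustrative sketch of that heuristic, not necessarily what your code does:

```python
import numpy as np

def inflate_patch_embed(w3):
    """Extend a 3-band patch-embedding weight tensor (D, 3, k, k) to
    4 bands (D, 4, k, k) by initializing the new band with the mean of
    the existing RGB filters (a common heuristic, not a prescription)."""
    extra = w3.mean(axis=1, keepdims=True)  # (D, 1, k, k)
    return np.concatenate([w3, extra], axis=1)
```

The mean-initialization keeps the layer's output statistics roughly unchanged at the start of fine-tuning, which softens (but doesn't remove) the loss-landscape shift.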
2. Where are your config files for image normalization? I wasn't able to find them in your repository. Which means and standard deviations did you use for the Tolan et al. ViT-L?
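For concreteness, this is the kind of band-wise normalization config I mean; the mean/std values below are placeholders, not the actual statistics used for the Tolan et al. ViT-L:

```python
import numpy as np

# Placeholder per-band statistics -- NOT the real values, which is
# exactly what the question above is asking for.
MEAN = np.array([0.420, 0.411, 0.296], dtype=np.float32)
STD = np.array([0.213, 0.156, 0.143], dtype=np.float32)

def normalize(img):
    """Band-wise normalize an HxWx3 float image already scaled to [0, 1]."""
    return (img - MEAN) / STD
```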
3. Did you test upsampling the SPOT imagery to ~0.5 m before running the ViT-L, and then downsampling the predictions back to 1.5 m for comparison? The ViT is quite sensitive to the ground sampling distance of its input patch, and taking a model trained on 0.5 m imagery and applying it directly to 1.5 m imagery has not worked for us. For instance, with 1 m NAIP imagery we upsample to 0.5 m and get much better results.
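The round trip I'm describing, sketched with simple nearest-neighbour upsampling and block-mean downsampling (an actual pipeline would likely use a fancier resampler):

```python
import numpy as np

def upsample_nearest(img, factor=3):
    """Upsample HxW(xC) imagery by an integer factor.
    1.5 m -> 0.5 m is a factor of 3."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def downsample_mean(pred, factor=3):
    """Average factor x factor blocks of the prediction to return
    from 0.5 m back to the native 1.5 m grid."""
    h, w = pred.shape[:2]
    return pred.reshape(
        h // factor, factor, w // factor, factor, *pred.shape[2:]
    ).mean(axis=(1, 3))
```

Run inference between the two calls: `downsample_mean(model(upsample_nearest(img)))`, so the model only ever sees imagery at the resolution it was trained on.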
4. What patch size did you use when fine-tuning the Tolan et al. ViT-L? A quick glance at your config files suggests 224. If so, I'd recommend retraining with 256. The model we released was trained on 256x256 patches, and it is fine to interpolate the patches up by an integer multiple (e.g. to 512x512), but when we tested fine-tuning at 224x224 we got artifacts.
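By "integer multiple" I mean scaling the token grid by a whole factor, e.g. the 16x16 positional-embedding grid of a 256 px input (assuming a 16 px patch) doubling cleanly to 32x32 for 512 px inputs. A simplified nearest-neighbour sketch (real implementations typically use bicubic interpolation):

```python
import numpy as np

def scale_pos_embed(pos_embed, mult):
    """Upscale a (G, G, D) ViT positional-embedding grid by an integer
    multiple via nearest-neighbour repetition. Illustrative only --
    bicubic interpolation is the usual choice in practice."""
    return pos_embed.repeat(mult, axis=0).repeat(mult, axis=1)
```

At 224 px the grid would be 14x14, which is not an integer multiple of 16x16, so every embedding has to be interpolated to a misaligned position, which is where we suspect the artifacts come from.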
5. Was your learning rate 1e-3 for all the ViT experiments, with plain Adam as the optimizer? The Tolan et al. ViT-L was trained with AdamW and can't be fine-tuned well with Adam and no weight decay. I'd also suggest keeping the learning rate at 5e-4.
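To illustrate the difference, a minimal AdamW update in numpy: unlike plain Adam, the weight-decay term is decoupled and applied directly to the weights rather than folded into the gradient. Hyperparameters here are placeholders, not the values from either paper:

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=5e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.05):
    """One AdamW update for weights w with gradient g and moment
    buffers (m, v) at step t (1-indexed). wd is decoupled weight decay;
    setting wd=0 recovers plain Adam."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

Fine-tuning with Adam and wd=0 removes the `wd * w` shrinkage term entirely, so weights drift from the pretrained solution in a way the original training never regularized against.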