Open JohnMBrandt opened 2 days ago
Hi Martin, thanks for your interest.
3 & 4. Good tips! We didn’t upsample/resize the inputs, but we’ll try that in the next version of the paper, coming soon.
I'll let @fajwel handle the code-related questions.
Hello John,
Thanks for your interest in our paper and your feedback.
Hello,
I've been following your work and I'm impressed by everything you've been able to create with the SPOT imagery! I'm an author of the Tolan et al. paper and had a few questions, as we're considering including the Open-Canopy dataset in our next model training.
1. Did you test only a 3-band fine-tuning of the ViT-L from our paper? We're a little unsure how adding an extra band to a pretrained model would help, as it will cause a large shift in the loss landscape.
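For reference, one common way to add a fourth input band to a pretrained 3-band ViT is to inflate the patch-embedding weights, e.g. initializing the new band's filters from the mean of the RGB filters. This is just an illustrative sketch of that heuristic, not necessarily what your code does:

```python
import numpy as np

def inflate_patch_embed(w3):
    """Extend a 3-band patch-embedding weight tensor (D, 3, k, k) to
    4 bands (D, 4, k, k) by initializing the new band with the mean of
    the existing RGB filters (a common heuristic, not a prescription)."""
    extra = w3.mean(axis=1, keepdims=True)  # (D, 1, k, k)
    return np.concatenate([w3, extra], axis=1)
```

The mean-initialization keeps the layer's output statistics roughly unchanged at the start of fine-tuning, which softens (but doesn't remove) the loss-landscape shift.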
2. Where are your config files for image normalization? I wasn't able to find them in your repository. Which means and standard deviations did you use for the Tolan et al. ViT-L?
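For concreteness, this is the kind of band-wise normalization config I mean; the mean/std values below are placeholders, not the actual statistics used for the Tolan et al. ViT-L:

```python
import numpy as np

# Placeholder per-band statistics -- NOT the real values, which is
# exactly what the question above is asking for.
MEAN = np.array([0.420, 0.411, 0.296], dtype=np.float32)
STD = np.array([0.213, 0.156, 0.143], dtype=np.float32)

def normalize(img):
    """Band-wise normalize an HxWx3 float image already scaled to [0, 1]."""
    return (img - MEAN) / STD
```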
3. Did you test upsampling the SPOT imagery to ~0.5 m before running the ViT-L, and then downsampling the predictions back to 1.5 m for comparison? The ViT is quite sensitive to the ground sampling distance of its input patch, and taking a model trained on 0.5 m imagery and applying it directly to 1.5 m imagery has not worked for us. For instance, with 1 m NAIP imagery we upsample to 0.5 m and get much better results.
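The round trip I'm describing, sketched with simple nearest-neighbour upsampling and block-mean downsampling (an actual pipeline would likely use a fancier resampler):

```python
import numpy as np

def upsample_nearest(img, factor=3):
    """Upsample HxW(xC) imagery by an integer factor.
    1.5 m -> 0.5 m is a factor of 3."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def downsample_mean(pred, factor=3):
    """Average factor x factor blocks of the prediction to return
    from 0.5 m back to the native 1.5 m grid."""
    h, w = pred.shape[:2]
    return pred.reshape(
        h // factor, factor, w // factor, factor, *pred.shape[2:]
    ).mean(axis=(1, 3))
```

Run inference between the two calls: `downsample_mean(model(upsample_nearest(img)))`, so the model only ever sees imagery at the resolution it was trained on.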
4. What patch size did you use when fine-tuning the Tolan et al. ViT-L? A quick glance at your config files suggests 224. If so, I'd recommend retraining with 256. The model we released was trained on 256x256 patches, and it is fine to interpolate the patches up by an integer multiple (e.g. to 512x512), but when we tested fine-tuning at 224x224 we got artifacts.
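By "integer multiple" I mean scaling the token grid by a whole factor, e.g. the 16x16 positional-embedding grid of a 256 px input (assuming a 16 px patch) doubling cleanly to 32x32 for 512 px inputs. A simplified nearest-neighbour sketch (real implementations typically use bicubic interpolation):

```python
import numpy as np

def scale_pos_embed(pos_embed, mult):
    """Upscale a (G, G, D) ViT positional-embedding grid by an integer
    multiple via nearest-neighbour repetition. Illustrative only --
    bicubic interpolation is the usual choice in practice."""
    return pos_embed.repeat(mult, axis=0).repeat(mult, axis=1)
```

At 224 px the grid would be 14x14, which is not an integer multiple of 16x16, so every embedding has to be interpolated to a misaligned position, which is where we suspect the artifacts come from.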
5. Was your learning rate 1e-3 for all the ViT experiments, with plain Adam as the optimizer? The Tolan et al. ViT-L was trained with AdamW and can't be fine-tuned well with Adam and no weight decay. I'd also suggest keeping the learning rate at 5e-4.
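To illustrate the difference, a minimal AdamW update in numpy: unlike plain Adam, the weight-decay term is decoupled and applied directly to the weights rather than folded into the gradient. Hyperparameters here are placeholders, not the values from either paper:

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=5e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.05):
    """One AdamW update for weights w with gradient g and moment
    buffers (m, v) at step t (1-indexed). wd is decoupled weight decay;
    setting wd=0 recovers plain Adam."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

Fine-tuning with Adam and wd=0 removes the `wd * w` shrinkage term entirely, so weights drift from the pretrained solution in a way the original training never regularized against.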