Hi! In your paper, the size of the correlation map is HW x HW, but in semantic segmentation HW is on the order of 1e5 at pixel resolution, which means the map would incur a huge memory cost. I wonder how you solve this problem? Thanks!
Thanks for your interest. You're right that computing correlation maps adds memory cost. As described in our paper, we compute the correlation map on features extracted from the encoder, which downsamples the input by 8x, rather than on raw pixels; you can refer to our code for details. For a 321x321 crop, the feature map is 41x41, so HW = 1681 and the correlation map is 1681 x 1681 (about 2.8M entries per image) instead of the ~1e10 entries a pixel-level map would need. Under this setting, 2 NVIDIA RTX 3090 GPUs (24 GB each) are enough for training; for 513x513 and 800x800 crop sizes we use 4 A40 GPUs (48 GB each). If your compute is limited, try an encoder with 16x downsampling instead. A rough sketch of the computation is below.
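Here is a minimal PyTorch sketch of computing a correlation map on downsampled encoder features. The function name and the cosine (unit-norm) similarity are illustrative assumptions on my part; see the repo for the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def correlation_map(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W) encoder features, already 8x downsampled.
    # Returns a (B, H*W, H*W) pairwise similarity map.
    flat = feats.flatten(2)           # (B, C, H*W)
    flat = F.normalize(flat, dim=1)   # unit-norm channels -> cosine similarity
    return torch.einsum('bci,bcj->bij', flat, flat)

# For a 321x321 crop with an 8x-downsampling encoder, H = W = 41:
feats = torch.randn(2, 256, 41, 41)
corr = correlation_map(feats)
print(corr.shape)  # torch.Size([2, 1681, 1681]), ~2.8M entries per image
```

At feature resolution this fits comfortably in 24 GB of GPU memory; the same map at pixel resolution (HW ~ 1e5) would be infeasible, which is exactly why the correlation is computed after the encoder.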
Thanks!
Closed: Hugo-cell111 closed this issue 1 year ago.