I get a wrong clsmap for the herdnet

Alexandre-Delplanque / HerdNet

Code for paper "From Crowd to Herd Counting: How to Precisely Detect and Count African Mammals using Aerial Imagery and Deep Learning?"

MIT License


I get a wrong clsmap for the herdnet #5

Closed xiaotongtongxue closed 4 months ago

xiaotongtongxue commented 4 months ago

Excellent work! I have run into an issue, though: when I feed a tensor of shape (4, 3, 256, 256) into HerdNet, the heatmap shape is right, (4, 1, 128, 128), but the clsmap shape looks wrong: (4, 2, 8, 8). I expected (4, 1, 16, 16). What could I do to fix this?

Alexandre-Delplanque commented 4 months ago

Hi @xiaotongtongxue! Thanks! The shape of the clsmap looks correct. You got (4,2,8,8) because your image size is 256x256px rather than 512x512px as in the paper. Given an input of shape [B,C,H,W], the model should produce a heatmap of shape [B,1,H/2,W/2] from the localization head and a clsmap of shape [B,S,H/32,W/32] from the classification head, where S is the number of species. Hope this helps!
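
A minimal sketch of the shape arithmetic described above, in plain Python so it runs without the HerdNet code. The function name expected_shapes is hypothetical and not part of the repository; it only encodes the H/2 and H/32 ratios stated in this comment.

```python
def expected_shapes(batch, height, width, num_species):
    """Return the (heatmap, clsmap) shapes for an input of [B, C, H, W].

    Hypothetical helper: it only applies the ratios quoted in the thread
    (heatmap at 1/2 resolution, clsmap at 1/32 resolution).
    """
    heatmap = (batch, 1, height // 2, width // 2)
    clsmap = (batch, num_species, height // 32, width // 32)
    return heatmap, clsmap


# 256x256 patches, as in the question
heatmap_256, clsmap_256 = expected_shapes(4, 256, 256, 2)

# 512x512 patches, as in the paper
heatmap_512, clsmap_512 = expected_shapes(4, 512, 512, 2)

print("256px:", heatmap_256, clsmap_256)
print("512px:", heatmap_512, clsmap_512)
```

With 256px patches the clsmap is 8x8, and the 16x16 size only appears with 512px patches.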

xiaotongtongxue commented 4 months ago

Yes, as you say, my image size is 256x256px (batch size 4; 2 classes: background and xx), and the shape of the clsmap is (4, 2, 8, 8). However, the shape of target[1] is (4, 16, 16), which does not match the clsmap. The error message is: RuntimeError: input and target batch or spatial sizes don't match: target [4, 16, 16], input [4, 2, 8, 8]. The CSV file for training was generated in the standard format (see the attached csv screenshot), so why does this happen? Thank you once again for your help. Your support means a lot to me.

xiaotongtongxue commented 4 months ago

@Alexandre-Delplanque Good news: I got your code running by changing int(patch_size//16) to int(patch_size//8) in the PointsToMask function. However, even though my images are 256 × 256px, I still do not understand why I had to change 16 to 8. Any help you can offer would be invaluable. Thanks again!

Alexandre-Delplanque commented 4 months ago

Hi @xiaotongtongxue,

Thanks! The classification head always produces a clsmap 32 times smaller than your input, so down_ratio must equal 32: 512/16 = 256/8 = 32. With 256px patches, int(patch_size//16) gives 16 instead of 32, which is why you needed to change down_ratio = int(patch_size//16) to down_ratio = int(patch_size//8).

It would be better to set down_ratio=32 explicitly when you instantiate the PointsToMask class: PointsToMask(radius=2, num_classes=2, squeeze=True, down_ratio=32)
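
To illustrate why a fixed down_ratio=32 is safer, here is a small sketch in plain Python. It assumes the target mask side is patch_size // down_ratio, which is consistent with the error message above but is not taken from the PointsToMask source; the helper names are hypothetical.

```python
CLS_HEAD_STRIDE = 32  # clsmap is 32 times smaller than the input (from the thread)


def clsmap_side(patch_size):
    """Side length of the clsmap produced by the classification head."""
    return patch_size // CLS_HEAD_STRIDE


def target_side(patch_size, down_ratio):
    """Assumed side length of the target mask: patch_size // down_ratio."""
    return patch_size // down_ratio


def patch_dependent_ratio(patch_size):
    """The original expression, which only equals 32 for 512px patches."""
    return int(patch_size // 16)


results = {}
for patch_size in (512, 256):
    results[patch_size] = {
        "clsmap": clsmap_side(patch_size),
        "target_patch_dependent": target_side(
            patch_size, patch_dependent_ratio(patch_size)
        ),
        "target_fixed_32": target_side(patch_size, CLS_HEAD_STRIDE),
    }
    print(patch_size, results[patch_size])
```

Under that assumption, the patch-dependent ratio matches the clsmap only at 512px, while the fixed ratio of 32 matches at both patch sizes.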

I hope this clarifies things.

xiaotongtongxue commented 4 months ago

@Alexandre-Delplanque, thank you. Your detailed response made it clear. This is truly remarkable work!

Alexandre-Delplanque commented 4 months ago

My pleasure! I'm closing the issue.