Open runfengxu opened 3 years ago
Yes, FSNS is organized in such a way that one sample actually consists of 4 images (views).
The code snippet you refer to handles this case. If the flag uses_original_data is set to True, the incoming batch with a shape of (batch_size, 3, 150, 600) (height 150 pixels, width 600 pixels) is reorganized into a batch with the shape (4 * batch_size, 3, 150, 150). We basically convert each image into 4 images and handle them independently. Later, they are fused together again.
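To make the reshaping concrete, here is a minimal sketch in plain NumPy rather than the framework used in the repository. The function name split_views and the num_views parameter are made up for illustration; it only assumes that the 4 views sit side by side along the width axis, as described above.

```python
import numpy as np


def split_views(images, num_views=4):
    """Turn side-by-side views into extra batch entries (illustrative helper)."""
    batch_size, channels, height, width = images.shape
    view_width = width // num_views
    # (B, C, H, W) -> (B, C, H, V, W / V): cut the width axis into V chunks
    images = images.reshape(batch_size, channels, height, num_views, view_width)
    # (B, C, H, V, W / V) -> (B, V, C, H, W / V): move the view axis next to the batch axis
    images = images.transpose(0, 3, 1, 2, 4)
    # (B, V, C, H, W / V) -> (B * V, C, H, W / V): merge batch and view axes
    return images.reshape(batch_size * num_views, channels, height, view_width)


# Dummy batch of 2 FSNS-like samples, each 150 pixels high and 600 pixels wide
batch = np.arange(2 * 3 * 150 * 600, dtype=np.float32).reshape(2, 3, 150, 600)
views = split_views(batch)
print(views.shape)
```

With this layout, entry b * 4 + v of the result is view v of sample b, so the 4 views of one sample stay next to each other in the enlarged batch and can be grouped again later.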
When I converted the image data from tfrecord format to jpg format, I found that each jpg file is actually 4 square images concatenated together. The FileBasedDataset does nothing about that, and I don't see the FSNSLocalizationNet doing a separate localization for each of these 4 images. How should I understand this?
if self.uses_original_data:
    # handle each individual view as increase in batch size
Does this treat the 4 different images as an additional dimension for the localization?