Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

GNU General Public License v3.0


When I converted the image data from tfrecord format to jpg format, I found that each jpg file is actually 4 square images concatenated together. The FileBasedDataset does nothing to account for that, and I don't see FSNSLocalizationNet doing a separate localization for each of these 4 images. How should I understand this?

    import chainer.functions as F

    if self.uses_original_data:
        # handle each individual view as increase in batch size
        batch_size, num_channels, height, width = images.shape
        images = F.reshape(images, (batch_size, num_channels, height, 4, -1))
        images = F.transpose(images, (0, 3, 1, 2, 4))
        images = F.reshape(images, (batch_size * 4, num_channels, height, width // 4))

Does it treat the 4 different images as an additional dimension for the localization?
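One way to see what the excerpt does is to replay the same reshape/transpose/reshape sequence on a small array whose values reveal where each pixel ends up. The sketch below uses NumPy in place of chainer.functions (the shape semantics of reshape and transpose are the same for this purpose) and assumes, as described in the question, that the 4 views are concatenated side by side along the width axis. The helper name `split_views` is hypothetical and not part of the repository.

```python
import numpy as np


def split_views(images: np.ndarray, num_views: int = 4) -> np.ndarray:
    """Hypothetical helper mirroring the excerpt: fold side-by-side views into the batch.

    images: shape (batch, channels, height, width), where width = num_views * view_width.
    Returns shape (batch * num_views, channels, height, width // num_views).
    """
    batch_size, num_channels, height, width = images.shape
    if width % num_views != 0:
        raise ValueError("width must be divisible by num_views")
    # Split the width axis into (view index, column within view): (B, C, H, V, W/V)
    x = images.reshape(batch_size, num_channels, height, num_views, -1)
    # Move the view axis next to the batch axis: (B, V, C, H, W/V)
    x = x.transpose(0, 3, 1, 2, 4)
    # Merge batch and view axes, so row b * num_views + v is view v of image b
    return x.reshape(batch_size * num_views, num_channels, height, width // num_views)


# Two 3-channel images, height 5, width 8 (4 views, each 2 columns wide)
images = np.arange(2 * 3 * 5 * 8).reshape(2, 3, 5, 8)
views = split_views(images)
print(views.shape)  # (8, 3, 5, 2)
# Entry 1 * 4 + 2 is the third view (columns 4:6) of the second image
assert np.array_equal(views[1 * 4 + 2], images[1, :, :, 4:6])
```

Under these assumptions, the views are not kept as a separate dimension: each view becomes its own batch entry, so whatever network consumes the result processes all 4 views independently, just like 4 separate images in a larger batch.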


dataset for fsns experiment #98
