Model for GrayScale images

abhiray92 commented 4 years ago

Hi @HHTseng, I am trying to implement the CRNN model for grayscale images. Could you please guide me on the changes that need to be made in the Conv layers for grayscale images?

HHTseng commented 4 years ago

Hi Abhishek, the fastest way is probably to inflate the grayscale images (1 channel) into 3-channel RGB images by simply copying the grayscale channel 3 times, as shown here: https://discuss.pytorch.org/t/grayscale-to-rgb-transform/18315 (I assume you already have some familiarity with handling images in Python). You need to inject a snippet like:

x = torch.randn(28, 28)
x.unsqueeze_(0)
x = x.repeat(3, 1, 1)
x.shape
> torch.Size([3, 28, 28])

into the function in the dataloader: https://github.com/HHTseng/video-classification/blob/82d85e8c2a5dff3eea66e4deff1d927a7144fc00/CRNN/functions.py#L83

This is the minimal change: the remaining code does not need to be altered, although it is of course not efficient, since the same channel is stored and processed three times. Otherwise, you'll need to adjust your convolution layers for grayscale images. Hope this helps.
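A minimal sketch of the channel-copy approach, assuming each frame arrives as a tensor of shape (H, W) or (1, H, W). The helper name `gray_to_3ch` is hypothetical and not part of the repository; it would be called on each frame inside the dataloader function linked above.

```python
import torch


def gray_to_3ch(x: torch.Tensor) -> torch.Tensor:
    """Replicate a grayscale image into 3 identical channels.

    Accepts a tensor of shape (H, W) or (1, H, W) and returns (3, H, W).
    Hypothetical helper for illustration only.
    """
    if x.dim() == 2:
        # Add the missing channel dimension: (H, W) -> (1, H, W)
        x = x.unsqueeze(0)
    if x.dim() != 3 or x.size(0) != 1:
        raise ValueError(f"expected (H, W) or (1, H, W), got {tuple(x.shape)}")
    # Copy the single channel three times along dim 0
    return x.repeat(3, 1, 1)


img = torch.rand(28, 28)      # stand-in for one grayscale frame
rgb_like = gray_to_3ch(img)
print(rgb_like.shape)         # torch.Size([3, 28, 28])
```

All three output channels hold the same values, so the network sees no new information; the copy only makes the input shape match a 3-channel first layer.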

Best regards, HTseng


abhiray92 commented 4 years ago

Hi @HHTseng, for the CRNN model, I have changed the transform to normalize a single channel:

transform = transforms.Compose([transforms.Resize([img_x, img_y]),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.485], std=[0.229])])

and set in_channels of the first convolution layer to 1:

self.conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=self.ch1, kernel_size=self.k1, stride=self.s1, padding=self.pd1),
    nn.BatchNorm2d(self.ch1, momentum=0.01),
    nn.ReLU(inplace=True),
    # nn.MaxPool2d(kernel_size=2),
)
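A quick way to check a change like this is to push a dummy single-channel batch through the first block on its own. The sketch below is standalone; the values chosen for `ch1`, `k1`, `s1`, `pd1` and the 64x64 input size are illustrative assumptions, not settings confirmed in this thread.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumptions, not taken from this thread)
ch1, k1, s1, pd1 = 32, (5, 5), (2, 2), (0, 0)

# First conv block with in_channels=1 for grayscale input
conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=ch1, kernel_size=k1, stride=s1, padding=pd1),
    nn.BatchNorm2d(ch1, momentum=0.01),
    nn.ReLU(inplace=True),
)

# Dummy batch: 4 grayscale frames of 64x64, shape (N, C, H, W) with C == 1
x = torch.randn(4, 1, 64, 64)
out = conv1(x)
print(out.shape)  # torch.Size([4, 32, 30, 30])
```

With these values each spatial dimension becomes floor((64 - 5) / 2) + 1 = 30. Passing a 3-channel batch to this block raises a RuntimeError about the channel mismatch, which is a useful signal that some part of the data pipeline is still producing RGB tensors.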

abhiray92 commented 4 years ago

The issue has been closed.

HHTseng / video-classification

Model for GrayScale images #37