andreaschandra / product-matching

Shopee NSDC Product Matching 2020

Model & Evaluation #2

Open andreaschandra opened 3 years ago

andreaschandra commented 3 years ago

Image similarity

Multi-modal

andreaschandra commented 3 years ago

num_vocab = 8479
emb_size = 768
hid_size = 512
num_layers = 1

import torch.nn as nn

# embedding followed by a single-layer LSTM; note that nn.LSTM returns (output, (h_n, c_n)),
# so the Sequential's output is a tuple and the first element is used downstream
TextEncoder = nn.Sequential(
    nn.Embedding(num_vocab, emb_size),
    nn.LSTM(emb_size, hid_size, num_layers=num_layers, batch_first=True)
)

The output shape is [batch_size, seq_length, hid_size]; e.g. TextEncoder output: torch.Size([2, 14, 512])

This config has 9,135,104 trainable parameters.
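A quick way to reproduce the reported shape and parameter count (a scratch check, not from the repo; the dummy batch below is made up):

import torch

def count_trainable(model):
    # sum over parameters that require gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

tokens = torch.randint(0, num_vocab, (2, 14))   # dummy batch: 2 titles, 14 tokens each
output, (h_n, c_n) = TextEncoder(tokens)        # nn.LSTM returns (output, (h_n, c_n))
print(output.shape)                             # torch.Size([2, 14, 512])
print(count_trainable(TextEncoder))             # reported above as 9,135,104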

@alamhanz

alamhanz commented 3 years ago

Are we still using Keras?

The output of MobileNetV2 in Keras will be [None, 7, 7, 1280], where that is [batch_size, img_length, img_width, channel]. My suggestions:

  1. Since you have [None, 14, 512] as output, how about reshaping it into [None, 1, 14, 512]?
  2. I will add an extra Conv2D for the image so its output becomes [None, 4, 4, 256], then reshape it to [None, 1, 16, 256].
  3. However, since 14 != 16, can you pad the seq_length to 16? Then the total dim for multi_modal_network can be [None, 1, 16, 512+256] = [None, 1, 16, 768] (see the sketch below).

what do you think?

@andreaschandra
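A minimal sketch of this reshape-and-concatenate idea (PyTorch tensors for illustration even though the shapes above are Keras-style channels-last; the dummy tensors are made up):

import torch

batch = 2
text_feat = torch.rand(batch, 16, 512)               # text output with seq_length padded to 16
image_feat = torch.rand(batch, 4, 4, 256)             # extra Conv2D output, channels-last

text_feat = text_feat.reshape(batch, 1, 16, 512)      # [None, 1, 16, 512]
image_feat = image_feat.reshape(batch, 1, 16, 256)    # 4 x 4 = 16 -> [None, 1, 16, 256]

fused = torch.cat([text_feat, image_feat], dim=-1)    # [None, 1, 16, 768]
print(fused.shape)                                     # torch.Size([2, 1, 16, 768])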

alamhanz commented 3 years ago

Anyway, trainable params: 7,467,008

alamhanz commented 3 years ago

I actually have 2 options.

Option 1: (image attachment)

Option 2: (image attachment)

Or maybe, instead of [None, 1, 16, 256], you want to keep it as [None, 1, 16, 512]?

andreaschandra commented 3 years ago

Ah @alamhanz, I will rewrite the image model in PyTorch.

  1. Yes, it could be. But a [None, 1, 14, 512] shape would correspond, for an image, to [batch, height, width, channel]. I first thought the image model output would be [batch, channel, height, width], so that the text has 1 channel, the height equals the sequence length, and the hidden size is the width. (image attachment)
  2. It would be a sparse matrix if we set the sequence length to the longest title.
andreaschandra commented 3 years ago

@alamhanz oh no! Point 1 didn't work, unless you want to make the channel become 1.

If you permute the channel to axis 1, I can repeat the text tensor to [None, n_channel, sequence_length, hidden_size].
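A rough sketch of that permute-and-repeat step (shapes taken from the discussion; the variable names are made up):

import torch

text = torch.rand(2, 1, 14, 512)            # [batch, 1, seq_length, hidden_size]
image = torch.rand(2, 7, 7, 1280)            # channels-last image features

image = image.permute(0, 3, 1, 2)            # channel to axis 1 -> [2, 1280, 7, 7]
text = text.repeat(1, image.size(1), 1, 1)   # [2, 1280, 14, 512] = [None, n_channel, seq_length, hidden_size]
print(text.shape)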

andreaschandra commented 3 years ago

@alamhanz let's move to 256 to make the model smaller; once we see underfitting, we can increase the parameters.

alamhanz commented 3 years ago
  1. Regarding "the text has 1 channel, height equals the sequence length, and hidden size is the width": actually, in my opinion, you can do something like this, with sequence_length x hidden_size being a square number like 16 or 25. The other problem: if we create 1 channel with your setup like that, will the Conv2D in multi_modal_network make sense for text? Or do you plan to use Conv1D instead? Do you get what I mean?

The reason I suggest setting the sequence length as the channel and (1, hidden_size) as (length, width) is to keep the Conv2D meaningful after the concatenation with the reshaped images (in my opinion).

  2. Yeah, you can set the sequence length to 9 if you want. The reason I suggested making it 16 is to make the image easier to reshape.

What do you think?
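For the Conv2D-vs-Conv1D question, a small illustration of the two ways to convolve over the text features (layouts assumed from the shapes above, not from the repo):

import torch
import torch.nn as nn

text = torch.rand(2, 14, 512)                 # [batch, seq_length, hidden_size]

# Conv1D over the sequence: channels = hidden_size, length = seq_length
conv1d = nn.Conv1d(in_channels=512, out_channels=256, kernel_size=3)
out1 = conv1d(text.permute(0, 2, 1))          # [2, 256, 12]

# Conv2D over a 1-channel text "image": [batch, 1, seq_length, hidden_size]
conv2d = nn.Conv2d(in_channels=1, out_channels=256, kernel_size=3)
out2 = conv2d(text.unsqueeze(1))              # [2, 256, 12, 510]
print(out1.shape, out2.shape)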

andreaschandra commented 3 years ago

I get it... I need to calculate the max sequence length first, then.

@alamhanz let's set the fixed length to 25.
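A minimal sketch of padding/truncating token sequences to the fixed length of 25 (the pad index 0 and the helper name are assumptions):

def pad_or_truncate(token_ids, max_len=25, pad_idx=0):
    # clip long titles and right-pad short ones so every sequence has length 25
    token_ids = token_ids[:max_len]
    return token_ids + [pad_idx] * (max_len - len(token_ids))

print(pad_or_truncate([5, 12, 7]))   # length 25, right-padded with 0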

andreaschandra commented 3 years ago

With num_vocab = 8476, setting up TextEncoder(num_vocab, 512, 256, 1) gives 5,128,192 trainable parameters.

andreaschandra commented 3 years ago

TextEncoder output torch.Size([2, 1, 25, 256])
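The TextEncoder module itself isn't shown in the thread; a guess that reproduces the reported parameter count and output shape (the unsqueeze that adds the channel dimension is an assumption):

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, num_vocab, emb_size, hid_size, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(num_vocab, emb_size)
        self.lstm = nn.LSTM(emb_size, hid_size, num_layers=num_layers, batch_first=True)

    def forward(self, tokens):
        x = self.embedding(tokens)   # [batch, seq_length, emb_size]
        x, _ = self.lstm(x)          # [batch, seq_length, hid_size]
        return x.unsqueeze(1)        # [batch, 1, seq_length, hid_size]

encoder = TextEncoder(8476, 512, 256, 1)
out = encoder(torch.randint(0, 8476, (2, 25)))
print(out.shape)                                                        # torch.Size([2, 1, 25, 256])
print(sum(p.numel() for p in encoder.parameters() if p.requires_grad))  # 5,128,192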

andreaschandra commented 3 years ago

ImageEncoder would be like this

import torch
import torch.nn as nn
from torchvision import models

mobilenet = models.mobilenet_v2()

backbone = mobilenet.features   # keep the convolutional feature extractor, drop the classifier

model = nn.Sequential(
    backbone,
    nn.Conv2d(in_channels=1280, out_channels=256, kernel_size=(3, 3))
)

With input img = torch.rand(1, 3, 224, 224), the output size is torch.Size([1, 256, 5, 5]).

Trainable parameters: 5,173,248
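A quick check of that forward pass and parameter count, reusing model from the snippet above (a scratch check, not from the repo):

img = torch.rand(1, 3, 224, 224)
print(model(img).shape)                                                # torch.Size([1, 256, 5, 5])
print(sum(p.numel() for p in model.parameters() if p.requires_grad))   # 5,173,248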
andreaschandra commented 3 years ago

Scenario Concatenation:

  1. Text: [batch, 1, seq_length, hidden_size]; Image: [batch, 1, h x w, channel] (sketched below)
  2. Text: [batch, seq_length, hidden_size, 1]; Image: [batch, channel, h x w, 1]
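A minimal sketch of scenario 1, lining the image features up with the text tensor and concatenating along the last axis (one possible choice; the flatten/permute step is an assumption):

import torch

text_feat = torch.rand(2, 1, 25, 256)                               # TextEncoder output
image_feat = torch.rand(2, 256, 5, 5)                                # ImageEncoder output

# image -> [batch, 1, h x w, channel] so it matches the text layout
image_feat = image_feat.flatten(2).permute(0, 2, 1).unsqueeze(1)     # [2, 1, 25, 256]

fused = torch.cat([text_feat, image_feat], dim=-1)                   # [2, 1, 25, 512]
print(fused.shape)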
andreaschandra commented 3 years ago
loss_contrastive = torch.mean(batch_label_c * torch.pow(euclidean_distance, 2) +
                              (1 - batch_label_c) * torch.pow(torch.clamp(0.5 - euclidean_distance, min=0), 2))
andreaschandra commented 3 years ago

https://github.com/hadikazemi/Machine-Learning/blob/master/PyTorch/tutorial/simese_cnn.py#L137

euclidean_distance = F.pairwise_distance(features_1, features_2)
loss_contrastive = torch.mean((1 - batch_label_c) * torch.pow(euclidean_distance, 2) +
                              batch_label_c * torch.pow(torch.clamp(2 - euclidean_distance, min=0.0), 2))
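For reference, a self-contained version of this contrastive loss with a margin of 2.0 as in the linked snippet (the function name and the label convention, 0 for matching pairs and 1 for non-matching pairs, are read off the formula, so treat them as assumptions):

import torch
import torch.nn.functional as F

def contrastive_loss(features_1, features_2, batch_label_c, margin=2.0):
    # batch_label_c = 0 pulls a pair together, 1 pushes it apart up to the margin
    euclidean_distance = F.pairwise_distance(features_1, features_2)
    return torch.mean((1 - batch_label_c) * torch.pow(euclidean_distance, 2) +
                      batch_label_c * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

loss = contrastive_loss(torch.rand(4, 128), torch.rand(4, 128), torch.tensor([0., 1., 0., 1.]))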
andreaschandra commented 3 years ago

Trainable params: 11,582,542

andreaschandra commented 3 years ago

@alamhanz check C41_training.ipynb

andreaschandra commented 3 years ago

The architecture changed dramatically and works with batch size 64. The first batch is skipped; it is overfitting.

andreaschandra commented 3 years ago
epoch: 1 | time: 5.2s
    train loss: 0.80 | train accuracy: 59.38
    val loss: 1.29 | val accuracy: 53.12
epoch: 2 | time: 2.3s
    train loss: 0.77 | train accuracy: 71.88
    val loss: 16.75 | val accuracy: 65.62
epoch: 3 | time: 2.3s
    train loss: 1.87 | train accuracy: 62.50
    val loss: 2.35 | val accuracy: 59.38
epoch: 4 | time: 2.3s
    train loss: 3.87 | train accuracy: 68.75
    val loss: 1.01 | val accuracy: 59.38
epoch: 5 | time: 2.4s
    train loss: 0.45 | train accuracy: 71.88
    val loss: 1.13 | val accuracy: 62.50
epoch: 6 | time: 2.3s
    train loss: 0.56 | train accuracy: 68.75
    val loss: 1.56 | val accuracy: 65.62
epoch: 7 | time: 2.3s
    train loss: 0.25 | train accuracy: 81.25
    val loss: 2.08 | val accuracy: 62.50
epoch: 8 | time: 2.3s
    train loss: 0.14 | train accuracy: 84.38
    val loss: 1.51 | val accuracy: 59.38
epoch: 9 | time: 2.4s
    train loss: 0.11 | train accuracy: 84.38
    val loss: 1.66 | val accuracy: 59.38
epoch: 10 | time: 2.3s
    train loss: 0.09 | train accuracy: 90.62
    val loss: 2.03 | val accuracy: 59.38
andreaschandra commented 3 years ago
epoch: 40 | time: 395.8s
        train loss: 0.28 | train accuracy: 86.39
        val loss: 1.09 | val accuracy: 69.79
epoch: 41 | time: 393.4s
        train loss: 0.29 | train accuracy: 85.51
        val loss: 1.11 | val accuracy: 70.10
epoch: 42 | time: 396.4s
        train loss: 0.29 | train accuracy: 86.80
        val loss: 1.09 | val accuracy: 70.79
epoch: 43 | time: 396.5s
        train loss: 0.29 | train accuracy: 84.71
        val loss: 1.16 | val accuracy: 69.17
epoch: 44 | time: 397.7s
        train loss: 0.27 | train accuracy: 86.17
        val loss: 1.06 | val accuracy: 69.98
epoch: 45 | time: 392.0s
        train loss: 0.27 | train accuracy: 86.84
        val loss: 1.09 | val accuracy: 70.60
epoch: 46 | time: 393.7s
        train loss: 0.26 | train accuracy: 87.49
        val loss: 1.12 | val accuracy: 70.60
epoch: 47 | time: 393.4s
        train loss: 0.26 | train accuracy: 87.40
        val loss: 1.16 | val accuracy: 69.98
epoch: 48 | time: 389.3s
        train loss: 0.27 | train accuracy: 88.18
        val loss: 1.07 | val accuracy: 71.33
epoch: 49 | time: 398.5s
        train loss: 0.26 | train accuracy: 86.50
        val loss: 1.07 | val accuracy: 69.56
epoch: 50 | time: 395.3s
        train loss: 0.25 | train accuracy: 88.29
        val loss: 1.08 | val accuracy: 70.68