KevinMusgrave / pytorch-metric-learning

The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
https://kevinmusgrave.github.io/pytorch-metric-learning/
MIT License

Metric learning loss for Multi label learning #354

Closed: priyarana closed this issue 3 years ago

priyarana commented 3 years ago

Hi

Following up on the discussion in https://github.com/KevinMusgrave/pytorch-metric-learning/issues/178, are there any other ways to implement metric learning in a multi-label setting?

`from pytorch_metric_learning.trainers import MetricLossOnly` is not working for me. My dataset is multi-label and imbalanced.

Thanks

KevinMusgrave commented 3 years ago

No, there aren't any built-in methods for that yet, so you'll have to implement something like the MultiLabelTrainer from #178: https://github.com/KevinMusgrave/pytorch-metric-learning/issues/178#issuecomment-675611566

priyarana commented 3 years ago

Thank you so much for the prompt reply. I implemented the approach from #178, but for some reason it didn't help. My base model uses SGD as the optimizer and `BCEWithLogitsLoss()` as the loss function, which gives a macro F1 score of 0.50 on test images. In the same code, I replaced the loss function with `criterion = losses.TripletMarginLoss(margin=0.2, distance=distance, reducer=reducer)`, added `mining_func = miners.TripletMarginMiner(margin=0.2, distance=distance, type_of_triplets="hard")`, and used FaceAdam as the optimizer. The F1 score on the test images dropped to 0.06. Any suggestions? Thank you
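For reference, here is a plain-Python sketch of what a triplet margin loss with "hard" mining computes. It follows the library documentation's description of a "hard" triplet as one where the negative is closer to the anchor than the positive. This is an illustrative re-implementation, not the library's code. It assumes Euclidean distance (the snippet above uses a configurable `distance`) and one integer label per sample. `mine_hard_triplets` and `triplet_margin_loss` are hypothetical names.

```python
import math
from itertools import permutations


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets in which the
    negative is closer to the anchor than the positive is."""
    triplets = []
    n = len(embeddings)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue  # anchor and positive must share a label
        d_ap = dist(embeddings[a], embeddings[p])
        for neg in range(n):
            if labels[neg] == labels[a]:
                continue  # negative must have a different label
            if dist(embeddings[a], embeddings[neg]) < d_ap:
                triplets.append((a, p, neg))
    return triplets


def triplet_margin_loss(embeddings, triplets, margin=0.2):
    """Mean of max(0, d(a, p) - d(a, n) + margin) over the given triplets."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, neg in triplets:
        d_ap = dist(embeddings[a], embeddings[p])
        d_an = dist(embeddings[a], embeddings[neg])
        total += max(0.0, d_ap - d_an + margin)
    return total / len(triplets)
```

When the classes are already well separated, the miner finds no hard triplets and the loss is 0. Even then, the embedding coordinates still carry no per-class meaning that a classifier-style threshold could read.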

KevinMusgrave commented 3 years ago

> I replaced the loss function: `criterion = losses.TripletMarginLoss(margin=0.2, distance=distance, reducer=reducer)` and also used `mining_func = miners.TripletMarginMiner(margin=0.2, distance=distance, type_of_triplets="hard")`

I would use a simpler baseline, like criterion = losses.ContrastiveLoss() and no mining function.
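To show what this simpler baseline optimizes, here is a minimal sketch of a pair-based contrastive loss. It assumes Euclidean distance: positive pairs are penalized when they are farther apart than `pos_margin`, and negative pairs when they are closer than `neg_margin`. The margin values and the plain averaging over all pairs are simplifications for illustration. The library's own defaults and reduction may differ.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(embeddings, labels, pos_margin=0.0, neg_margin=1.0):
    """Average pair loss over every pair in the batch (simplified reduction)."""
    pair_losses = []
    for i, j in combinations(range(len(embeddings)), 2):
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            # positive pair: penalize distance above pos_margin
            pair_losses.append(max(0.0, d - pos_margin))
        else:
            # negative pair: penalize distance below neg_margin
            pair_losses.append(max(0.0, neg_margin - d))
    return sum(pair_losses) / len(pair_losses) if pair_losses else 0.0
```

Unlike the triplet version, no miner is involved: every pair in the batch contributes. That is part of what makes it an easier baseline to debug.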

Re: accuracy, how is your model outputting predictions when using the metric learning loss?

priyarana commented 3 years ago

Thanks, I'll try ContrastiveLoss().

During training with TripletMarginLoss, the loss was 0.12 in the first epoch and dropped to 0.09 by the 24th epoch, while the accuracy stayed the same (acc = 0.50) from the first epoch to the last. There is definitely a mistake somewhere, but I have no clue where.

KevinMusgrave commented 3 years ago

You can't apply the metric loss and then use argmax to get predictions. You have to do a k-nearest-neighbors search in the embedding space. If you need to predict multiple attributes per sample, then for each sample you'll need an embedding vector for each attribute. See this paper for an example.
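The idea of one embedding vector per attribute can be sketched as follows. This assumes, hypothetically, that the model outputs one flat vector that is split into equal-sized chunks, one per attribute. Each attribute is then predicted by a 1-nearest-neighbor search in its own chunk, instead of by argmax over the output. `split_per_attribute` and `predict_attributes` are hypothetical helpers, and brute-force search stands in for a real kNN index.

```python
def split_per_attribute(vector, num_attributes):
    """Split one flat output vector into equal-size per-attribute embeddings."""
    size = len(vector) // num_attributes
    return [vector[k * size:(k + 1) * size] for k in range(num_attributes)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_attributes(query, train_vectors, train_labels, num_attributes):
    """train_labels[n][k] is the integer class of attribute k for training sample n.
    Each attribute is predicted from its own embedding space with 1-nearest-neighbor."""
    q_parts = split_per_attribute(query, num_attributes)
    t_parts = [split_per_attribute(v, num_attributes) for v in train_vectors]
    predictions = []
    for k in range(num_attributes):
        nearest = min(range(len(train_vectors)),
                      key=lambda n: sq_dist(q_parts[k], t_parts[n][k]))
        predictions.append(train_labels[nearest][k])
    return predictions
```

Because each attribute has its own sub-space, the nearest neighbor for one attribute can be a different training sample than the nearest neighbor for another.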

priyarana commented 3 years ago

Thank you, I'll have a look at it. I'm glad I finally know why it isn't working. Thank you!

priyarana commented 3 years ago

Hi

I used the ContrastiveLoss function with no mining function:

    criterion = losses.ContrastiveLoss(pos_margin=1, neg_margin=0, distance=CosineSimilarity())

    loss = 0.0
    for ii in range(target.size(1)):
        current_labels = target[:, ii]
        loss += criterion(output, current_labels)
    loss /= target.size(1)

    losses.update(loss.item(), images.size(0))

The macro F1 score on test images is very low after the first epoch: 0.08, with loss = 0.496 (I implemented a nearest-neighbor approach to generate the multiple labels). With regular supervised learning it was 0.14 after the first epoch. Do you think anything is wrong here?

Also, I just want to confirm the #178 and #231 trick for the multi-label implementation. For the triplet and contrastive loss functions, should the current label be in binary (one-hot) form, or should it be the index of the positive label?
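Since the scores in this thread are macro F1, here is a sketch of how that metric is computed for multi-label predictions. It assumes, hypothetically, that each sample's true and predicted labels are stored as sets of class indices. Macro averaging weights every class equally, so rare classes in an imbalanced dataset count as much as frequent ones.

```python
def macro_f1(y_true, y_pred, num_classes):
    """y_true / y_pred: one set of class indices per sample, e.g. [{7, 4, 2}, {0, 3}]."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a class never present and never predicted scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes
```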

Thank you

KevinMusgrave commented 3 years ago

> The macro F1 score on test images is very low after the first epoch: 0.08, with loss = 0.496 (I implemented a nearest-neighbor approach to generate the multiple labels). With regular supervised learning it was 0.14 after the first epoch. Do you think anything is wrong here?

How are you doing the nearest neighbor search for multiple labels? Are you using a separate embedding space for each attribute (column of target)?

It's also worth asking if metric learning is the right approach. Is there a specific reason you want to use metric learning instead of regular supervised learning?

> Also, I just want to confirm the #178 and #231 trick for the multi-label implementation. For the triplet and contrastive loss functions, should the current label be in binary (one-hot) form, or should it be the index of the positive label?

Labels should be integers like 0, 5, 13 etc., not one-hot encoded vectors.
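A small sketch of converting one-hot label rows into the integer labels the losses expect. With a PyTorch tensor of one-hot rows, the equivalent is usually `target.argmax(dim=1)`. A row with several 1s (a multi-hot, multi-label target) has no single integer equivalent, so this hypothetical helper raises an error for it instead of silently picking one class.

```python
def one_hot_to_indices(rows):
    """Convert one-hot label rows, e.g. [0, 0, 1], into integer class labels, e.g. 2."""
    indices = []
    for row in rows:
        if sum(row) != 1 or any(v not in (0, 1) for v in row):
            raise ValueError(f"not a one-hot row: {row}")
        indices.append(row.index(1))
    return indices
```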

priyarana commented 3 years ago

Many thanks for the reply.

My data is imbalanced and multi-label, with small differences between classes. With regular supervised learning (BCEWithLogitsLoss + SGD + augmentation + WeightedRandomSampler), I can achieve a macro F1 score of 0.51. If I understand correctly, metric learning helps achieve large inter-class and small intra-class distances, so I am giving it a shot.

Your message helped me spot one of my mistakes: I have been using one-hot encoding, which I think is why my loss stayed nearly constant during training.

I am not using a separate embedding space for each attribute. Every 4 epochs, I obtain embeddings for the training and validation images from the trained model, fit a kNN on the training embeddings, and retrieve the 10 nearest neighbors for each test image. I then take the mean of those 10 nearest embeddings, apply a sigmoid, and threshold the result to get the final labels.

I am new to deep learning and not sure if this approach is valid. I'd be happy to hear your comments. Thank you

KevinMusgrave commented 3 years ago

> I am not using a separate embedding space for each attribute. Every 4 epochs, I obtain embeddings for the training and validation images from the trained model, fit a kNN on the training embeddings, and retrieve the 10 nearest neighbors for each test image. I then take the mean of those 10 nearest embeddings, apply a sigmoid, and threshold the result to get the final labels.

You can't obtain labels directly from the embeddings like that, because the embedding dimensions aren't trained to correspond to any particular class. The only thing you can do is infer the class based on nearest neighbors. For example, you could get the classes of the 10 nearest neighbors, and use the most common class as the final label.
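Following the suggestion above, here is a sketch that retrieves the k nearest training embeddings and returns their most common label. The embedding values themselves are never read as class scores. It assumes one integer label per training sample and Euclidean distance; for a model trained with cosine similarity you would rank by similarity instead. `knn_majority_label` is a hypothetical helper. In practice, a library kNN search would replace the brute-force sort.

```python
import math
from collections import Counter


def knn_majority_label(query, train_embeddings, train_labels, k=10):
    """Predict the most common label among the k nearest training embeddings."""
    ranked = sorted(range(len(train_embeddings)),
                    key=lambda i: math.dist(query, train_embeddings[i]))
    neighbor_labels = [train_labels[i] for i in ranked[:k]]
    return Counter(neighbor_labels).most_common(1)[0][0]
```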

priyarana commented 3 years ago

Thank you for your reply. I get it, but my embedding dimension is actually the same as the number of classes.

I am using a pretrained ResNet34 model:

    import torch.nn as nn
    import pretrainedmodels

    def get_resnet34(num_classes=8, **_):
        model_name = 'resnet34'
        model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
        # Replace the first convolution so the network accepts 4-channel input
        conv1 = model.conv1
        model.conv1 = nn.Conv2d(in_channels=4,
                                out_channels=conv1.out_channels,
                                kernel_size=conv1.kernel_size,
                                stride=conv1.stride,
                                padding=conv1.padding,
                                bias=conv1.bias is not None)  # bias expects a bool, not a tensor
        # Copy the pretrained RGB weights; reuse the first channel's weights for the 4th channel
        model.conv1.weight.data[:, :3, :, :] = conv1.weight.data
        model.conv1.weight.data[:, 3:, :, :] = conv1.weight.data[:, :1, :, :]
        model.avgpool = nn.AdaptiveAvgPool2d(1)
        # Final layer: one output per class (previously hard-coded to 8)
        in_features = model.last_linear.in_features
        model.last_linear = nn.Linear(in_features, num_classes)
        return model

and I compute the embeddings as follows:

    for i, (images, target) in enumerate(train_loader):
        images = images.cuda(non_blocking=True)
        embeddings = model(images)

So my embeddings have shape [N, num_classes].

Does this sound correct? Thanks

priyarana commented 3 years ago

> Labels should be integers like 0, 5, 13 etc., not one-hot encoded vectors.

Sorry for another question. I am trying to understand the implementation in #178

1. > class MultiLabelTrainer(MetricLossOnly):
2. >     def calculate_loss(self, curr_batch):
3. >         data, labels = curr_batch
4. >         embeddings = self.compute_embeddings(data)
5. >         for i in range(labels.size(1)):
6. >             curr_labels = labels[:, i]
7. >             indices_tuple = self.maybe_mine_embeddings(embeddings, curr_labels)
8. >             self.losses["metric_loss"] += self.maybe_get_metric_loss(embeddings, curr_labels, indices_tuple)
9. >         self.losses["metric_loss"] /= labels.size(1)

Suppose my image labels are as follows: Image1: [7, 4, 2], Image2: [0, 3], Image3: [0].

There are 8 classes in total in the dataset. Could you help me understand what `curr_labels` would be for these 3 images in each iteration of the loop on line 5, if I pass `MultiLabelTrainer(label_hierarchy_level="all")`?

Thank you.

KevinMusgrave commented 3 years ago

> Thank you for your reply. I get it, but my embedding dimension is actually the same as the number of classes.

> So my embeddings have shape [N, num_classes].

That doesn't matter. The contrastive loss doesn't care if your embedding has dimensionality 3, 8, or 128 etc. All the contrastive loss does is group together embeddings that belong to the same class. You could use embeddings of dimensionality 3 or 256, even though you have 8 classes. In contrast, classification losses like CrossEntropyLoss do assign meaning to each output dimension.

> Suppose my image labels are as follows: Image1: [7, 4, 2], Image2: [0, 3], Image3: [0].

> There are 8 classes in total in the dataset. Could you help me understand what `curr_labels` would be for these 3 images in each iteration of the loop on line 5, if I pass `MultiLabelTrainer(label_hierarchy_level="all")`?

That code assumes your labels are 2-dimensional, where each column is a different "attribute". In your case, it looks like all labels are coming from the same label set, and the number of labels per sample can vary. So I don't think it makes sense to use MultiLabelTrainer.
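To make the distinction concrete, here is a sketch of what `curr_labels = labels[:, i]` produces when labels do have the 2-D attribute layout the #178 code assumes, compared with the variable-length lists from the question. Plain lists stand in for tensors, and the attribute meanings (colour, shape) are hypothetical.

```python
def column(labels, i):
    """Equivalent of labels[:, i] for a list-of-rows label matrix."""
    return [row[i] for row in labels]


# Hypothetical 2-D attribute labels: every image has exactly one value per attribute,
# e.g. column 0 = colour (0-2), column 1 = shape (0-4).
attribute_labels = [
    [2, 0],
    [1, 4],
    [2, 3],
]
per_attribute = [column(attribute_labels, i) for i in range(len(attribute_labels[0]))]
# per_attribute[0] -> [2, 1, 2]; per_attribute[1] -> [0, 4, 3]

# The labels from the question are variable-length sets drawn from one label space,
# so they cannot be stacked into a rectangular matrix with a fixed meaning per column.
question_labels = [[7, 4, 2], [0, 3], [0]]
is_rectangular = len({len(row) for row in question_labels}) == 1
```

In the attribute layout, each loop iteration sees one complete, single-label problem over the batch. With the question's labels there is no such per-column problem to hand to the loss.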

I'm not sure how to apply metric learning in this case, and I highly doubt that it would outperform a classification-based approach. However, if you really want to use metric learning, I found this repo that might help: https://github.com/abarthakur/multilabel-deep-metric

For imbalanced data, you could take a look at https://github.com/richardaecn/class-balanced-loss.
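For reference, here is a sketch of the per-class re-weighting behind the class-balanced loss linked above. Each class is weighted by the inverse of its "effective number of samples", (1 - beta^n) / (1 - beta), and the weights are rescaled to sum to the number of classes. The `beta` value is an assumed example, and every class is assumed to have at least one sample (n = 0 would divide by zero). How to plug the weights into a loss is left as an assumption, for example as per-class multipliers on BCEWithLogitsLoss terms.

```python
def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta**n),
    rescaled so that they sum to the number of classes."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(samples_per_class) / sum(raw)
    return [w * scale for w in raw]
```

With `beta=0.0` every class gets the same weight. As `beta` approaches 1, the weights approach plain inverse class frequency, so rare classes are up-weighted more strongly.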

priyarana commented 3 years ago

Many thanks for all the inputs. That's a big help. I'll look into those.

> That doesn't matter. The contrastive loss doesn't care if your embedding has dimensionality 3, 8, or 128 etc. All the contrastive loss does is group together embeddings that belong to the same class. You could use embeddings of dimensionality 3 or 256, even though you have 8 classes. In contrast, classification losses like CrossEntropyLoss do assign meaning to each output dimension.

Thank you very much. One more question, please. Am I correct in understanding that an embedding is the feature vector of an image, which can be obtained from a pretrained model such as ResNet in my case, or from a class like `EmbeddingNet(nn.Module)` as in your library?

priyarana commented 3 years ago

Many thanks for the inputs, that's a great help!

KevinMusgrave commented 3 years ago

> Thank you very much. One more question, please. Am I correct in understanding that an embedding is the feature vector of an image, which can be obtained from a pretrained model such as ResNet in my case, or from a class like `EmbeddingNet(nn.Module)` as in your library?

Yes, embeddings are just vector representations of the input. The terms "embeddings", "features", and "representations" are interchangeable, though "embeddings" usually refers to vectors with shape (batch_size, embedding_size), whereas "features" and "representations" might refer to multi-dimensional tensors with shape (batch_size, C, X, Y ...). Here's a related discussion on reddit: https://old.reddit.com/r/MachineLearning/comments/ofivs2/d_difference_between_representation_vs_latent_vs/
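As a concrete example of going from (batch_size, C, X, Y) features to (batch_size, embedding_size) embeddings, here is a sketch of global average pooling. It performs the same reduction as `nn.AdaptiveAvgPool2d(1)` followed by flattening, the pooling used in the ResNet34 code earlier in the thread. Nested lists stand in for tensors, and `global_average_pool` is a hypothetical name.

```python
def global_average_pool(feature_maps):
    """Turn features of shape (batch, C, H, W) into embeddings of shape (batch, C)
    by averaging each channel over all of its spatial positions."""
    embeddings = []
    for sample in feature_maps:          # sample: C x H x W
        vector = []
        for channel in sample:           # channel: H x W
            values = [v for row in channel for v in row]
            vector.append(sum(values) / len(values))
        embeddings.append(vector)
    return embeddings
```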

priyarana commented 3 years ago

Many thanks!!