KevinMusgrave / pytorch-metric-learning

The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
https://kevinmusgrave.github.io/pytorch-metric-learning/
MIT License

How many samples per class are needed for training? #178

Closed weizhiting closed 4 years ago

weizhiting commented 4 years ago

Hi! First of all, thanks for this awesome library. I tried pytorch_metric_learning in my project, but the accuracy is low. There are thousands of classes, and most classes have no more than 5 samples. I noticed that when I keep only the classes that have more than 50 samples, the accuracy is satisfactory. So, in general, how many samples per class are needed for training? Are there any metric learning methods developed for few-shot settings? Thanks.

KevinMusgrave commented 4 years ago

In your dataloader, are you using a random sampler, or something like MPerClassSampler?
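
For context, the difference looks roughly like this (train_dataset and train_labels are placeholder names):

from torch.utils.data import DataLoader
from pytorch_metric_learning import samplers

# Plain random sampling: a batch can easily contain classes with a single sample,
# leaving the loss with few or no positive pairs.
loader = DataLoader(train_dataset, batch_size=256, shuffle=True)

# MPerClassSampler: each batch contains m samples per sampled class,
# so positive pairs are guaranteed.
sampler = samplers.MPerClassSampler(train_labels, m=4, length_before_new_iter=len(train_dataset))
loader = DataLoader(train_dataset, batch_size=256, sampler=sampler)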

weizhiting commented 4 years ago

Yes, I use MPerClassSampler. Below is my main code. After getting the embeddings, for each Xte_trans sample I use cosine similarity to find the Xtr_trans sample that is most similar to it (see the retrieval sketch after the code).

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from pytorch_metric_learning import losses, miners, samplers, testers, trainers
from pytorch_metric_learning.distances import CosineSimilarity
from pytorch_metric_learning.utils import logging_presets

# (Xtr, ytr, Xte, yte, Xtr_pca, args, logs, tensorboard, and model_folder are defined elsewhere.)

class Setting:
    """Parameters for training"""
    def __init__(self, nclass):
        self.epoch = 300
        self.lr = 1e-5 * 4
        self.doPCA = True
        self.out_sz = 100
        self.nPCA = 1000
        self.m = 4
        self.batch_size = self.m * 64
        self.emb_szs = [500, 200]
        self.ps = 0.25
        self.use_bn = True
        self.actn = nn.ReLU()

class EmbeddingNet(nn.Module):
    def __init__(self, in_sz, out_sz, emb_szs, ps, use_bn=True, actn=nn.ReLU()):
        super(EmbeddingNet, self).__init__()
        self.in_sz = in_sz
        self.out_sz = out_sz
        self.n_embs = len(emb_szs) - 1
        ps = np.repeat(ps, self.n_embs)
        # input layer
        layers = [nn.Linear(self.in_sz, emb_szs[0]),
              actn]
        for i in range(self.n_embs):
            layers += self.bn_drop_lin(n_in=emb_szs[i], n_out=emb_szs[i+1], bn=use_bn, p=ps[i], actn=actn)
        layers.append(nn.Linear(emb_szs[-1], self.out_sz))
        self.fc = nn.Sequential(*layers)

    def bn_drop_lin(self, n_in:int, n_out:int, bn:bool=True, p:float=0., actn:nn.Module=None):
        layers = [nn.BatchNorm1d(n_in)] if bn else []
        if p != 0: layers.append(nn.Dropout(p))
        layers.append(nn.Linear(n_in, n_out))
        if actn is not None: layers.append(actn)
        return layers

    def forward(self, x):
        output = self.fc(x)
        return output

class BasicDataset(Dataset):
    def __init__(self, data, labels):
        self.data = torch.from_numpy(data).float()
        self.labels = labels

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.data)

def main():
    train_dataset = BasicDataset(Xtr, ytr)
    test_dataset = BasicDataset(Xte, yte)
    model = EmbeddingNet(in_sz=Xtr_pca.shape[1], out_sz=args.out_sz, emb_szs=args.emb_szs,
                 ps=args.ps, use_bn=args.use_bn, actn=args.actn)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model_optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=0.0001)
    miner = miners.MultiSimilarityMiner(epsilon=0.1)
    loss = losses.TripletMarginLoss(margin=0.1, distance=CosineSimilarity())
    loss = losses.CrossBatchMemory(loss=loss, embedding_size=args.out_sz, miner=miner)
    sampler = samplers.MPerClassSampler(ytr, m=args.m, length_before_new_iter=len(train_dataset))
    models = {"trunk": model}
    optimizers = {"trunk_optimizer": model_optimizer}
    loss_funcs = {"metric_loss": loss}
    mining_funcs = {"tu    record_keeper, _, _ = logging_presets.get_record_keeper(logs, tensorboard)
    hooks = logging_presets.get_hook_container(record_keeper)
    dataset_dict = {"train": train_dataset, "val": test_dataset}
    tester = testers.GlobalEmbeddingSpaceTester(end_of_testing_hook = hooks.end_of_testing_hook, 
            dataloader_num_workers = 32, batch_size= args.batch_size, use_trunk_output=True,
            reference_set = 'compared_to_training_set', normalize_embeddings = False)
    end_of_epoch_hook = hooks.end_of_epoch_hook(tester, dataset_dict, model_folder, test_interval = 10,
                                        patience = 10)

    trainer = trainers.MetricLossOnly(models, optimizers, args.batch_size, loss_funcs, mining_funcs,
                            train_dataset, sampler=sampler, dataloader_num_workers=32,
                            end_of_iteration_hook=hooks.end_of_iteration_hook,
                            end_of_epoch_hook=end_of_epoch_hook)

    trainer.train(num_epochs=args.epoch)
    Xtr_trans, _ = tester.get_all_embeddings(train_dataset, model)
    Xte_trans, _ = tester.get_all_embeddings(test_dataset, model)
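
A minimal sketch of the retrieval step described above (hypothetical; assumes ytr is a numpy array and the embeddings are torch tensors):

import torch.nn.functional as F

# Normalize, then take the most cosine-similar training embedding for each test sample.
Xtr_norm = F.normalize(Xtr_trans, p=2, dim=1)
Xte_norm = F.normalize(Xte_trans, p=2, dim=1)
sims = Xte_norm @ Xtr_norm.t()   # (n_test, n_train) cosine similarities
nearest = sims.argmax(dim=1)     # index of the most similar training sample
y_pred = ytr[nearest.cpu().numpy()]
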
KevinMusgrave commented 4 years ago

Have you tried an experiment without CrossBatchMemory and MultiSimilarityMiner? As seen in this other issue, getting CrossBatchMemory to work well requires some tuning.

Also, in my experience, ContrastiveLoss works better than TripletMarginLoss. Just make sure to change the default margins if you're going to use CosineSimilarity().
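
For example, something like this (with a similarity measure, higher values mean more similar, so the margins are flipped relative to the distance-based defaults):

from pytorch_metric_learning import losses
from pytorch_metric_learning.distances import CosineSimilarity

# With CosineSimilarity, positives should score near 1 and negatives near 0,
# so pos_margin=1 and neg_margin=0 replace the distance defaults of 0 and 1.
loss_func = losses.ContrastiveLoss(pos_margin=1, neg_margin=0, distance=CosineSimilarity())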

weizhiting commented 4 years ago

Yeah, indeed, I have tried without CrossBatchMemory and MultiSimilarityMiner. Furthermore, I have tried other losses such as ContrastiveLoss, but the performance only changes a little bit.

KevinMusgrave commented 4 years ago

In tester.get_all_embeddings, are you using the best performing model?

Also, since you're dealing with an unbalanced dataset, it will probably help to use a different accuracy calculator:

from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator
accuracy_calculator = AccuracyCalculator(avg_of_avgs=True)
tester = testers.GlobalEmbeddingSpaceTester(accuracy_calculator=accuracy_calculator)

This will compute the average of per-class accuracies, so a class with 5 samples is just as important as a class with 50 samples. The way accuracy is computed is important because the end_of_epoch_hook uses accuracy to determine when training should end.
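
As a toy illustration with made-up numbers: if class A has 50 test queries at 90% accuracy and class B has 5 queries at 20%, the global average is (50 * 0.9 + 5 * 0.2) / 55 ≈ 0.84, while the average of per-class averages is (0.9 + 0.2) / 2 = 0.55, so the weak class is no longer hidden.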

weizhiting commented 4 years ago

As you suggested, I used the best performing model and AccuracyCalculator(avg_of_avgs=True), but the performance again only changed a little bit. So were there not enough samples per class for training in my project? How can I determine the actual reasons? Thanks for your patience.

KevinMusgrave commented 4 years ago

There isn't a clear rule about how many samples per class are necessary. Other factors affect the number of necessary training samples, like the similarity between the training and test sets, and the similarity between different classes. I suggest you look at some other aspects of training like:

  1. Could your model architecture be improved?
  2. Is it necessary to do PCA on the input to the model?
  3. Try different train/test splits. For example, try training and testing on just the small-sample classes, to see if the poor accuracy is due to confusion between the small and large sample classes (a rough filtering sketch is below).
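
A rough sketch of that third experiment, with ytr/yte as numpy label arrays and 50 as an illustrative cutoff:

import numpy as np

# Keep only the classes that have at most 50 training samples, then retrain on the subset.
classes, counts = np.unique(ytr, return_counts=True)
small_classes = classes[counts <= 50]
train_mask = np.isin(ytr, small_classes)
test_mask = np.isin(yte, small_classes)
Xtr_small, ytr_small = Xtr[train_mask], ytr[train_mask]
Xte_small, yte_small = Xte[test_mask], yte[test_mask]
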
weizhiting commented 4 years ago

I keep two samples per class in the test set and all the other samples in the training set. Some classes are indeed very similar in my project; maybe this is the reason why the performance is low. (1) As I am new to this field, I do not know how to change the architecture of my model. Add more layers? Add more neurons per layer? (2) Every sample has thousands of features; without reducing the dimension, the performance is extremely poor. (3) I have tried training and testing on just the small-sample classes, and the performance is very low. If I train and test on just the large-sample classes, the performance is satisfactory. Maybe I need to do more experiments on my datasets and find out the reasons.

KevinMusgrave commented 4 years ago

Yeah, if some classes are really similar, then having only a few samples will make it extra difficult. Perhaps you could apply more data augmentations during training.
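
Since the inputs here are feature vectors rather than images, one simple option (an illustrative sketch, not a library feature) is additive Gaussian noise in the dataset:

import torch

class AugmentedDataset(BasicDataset):
    """BasicDataset from the code above, plus Gaussian jitter at training time."""
    def __init__(self, data, labels, noise_std=0.01):
        super().__init__(data, labels)
        self.noise_std = noise_std

    def __getitem__(self, index):
        x, y = self.data[index], self.labels[index]
        # Add small random noise so each epoch sees slightly different inputs.
        return x + torch.randn_like(x) * self.noise_std, y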

As a final suggestion, you could read up on few-shot learning. Here are a couple of repos that might help:

Good luck!

weizhiting commented 4 years ago

Thanks for your good advice. One more question: can I implement multi-label metric learning with this awesome library, such as the method in https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkdd10mddm.pdf?

KevinMusgrave commented 4 years ago

That particular method hasn't been implemented. However, you can train and test on multi-label datasets. By multi-label, I mean that instead of an element's label being a single number, it is instead a list of numbers or a numpy array of numbers. So in other words, a batch of 32 elements with 2 labels each will have labels with shape (32, 2).
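
For instance (made-up values), labels for a 2-label dataset could be built like this, reusing the BasicDataset defined earlier:

import numpy as np

# Each row is one element's pair of labels; a batch of 32 rows has shape (32, 2).
labels = np.array([[3, 17],
                   [3, 42],
                   [8, 17]])
dataset = BasicDataset(data, labels)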

To train on multi-labels you need to write a custom trainer. Here's a simple one that applies metric_loss to each label level. You can use this trainer exactly like MetricLossOnly:

from pytorch_metric_learning.trainers import MetricLossOnly

class MultiLabelTrainer(MetricLossOnly):
    def calculate_loss(self, curr_batch):
        data, labels = curr_batch
        embeddings = self.compute_embeddings(data)
        # Compute the metric loss once per label column, then average.
        # (self.losses["metric_loss"] is reset to 0 at the start of each iteration.)
        for i in range(labels.size(1)):
            curr_labels = labels[:, i]
            indices_tuple = self.maybe_mine_embeddings(embeddings, curr_labels)
            self.losses["metric_loss"] += self.maybe_get_metric_loss(embeddings, curr_labels, indices_tuple)
        self.losses["metric_loss"] /= labels.size(1)

You also need to pass label_hierarchy_level to the trainer upon initialization. You can pass an integer i to use only the i-th label column (0 is the default), or "all" to use every label column.

For example, if you have 3 labels per element and you want to use all of them:

MultiLabelTrainer(label_hierarchy_level="all")

Next, to test multi-label datasets, simply use the GlobalEmbeddingSpaceTester, but set label_hierarchy_level to the desired mode.

from pytorch_metric_learning.testers import GlobalEmbeddingSpaceTester
tester = GlobalEmbeddingSpaceTester(label_hierarchy_level="all")

Note that I haven't tested the above code, and this is one of those features that hasn't been tested much, so there may be bugs!

If you're looking for more sophisticated multi-label training methods and would like them to be implemented, feel free to open a separate issue.