Nanne / pytorch-NetVlad

Pytorch implementation of NetVlad including training on Pittsburgh.

terminology clarification #50

Closed · mrgransky closed this 3 years ago

mrgransky commented 3 years ago

I wonder if you could kindly clarify differences between the following terms:

batchSize vs cacheBatchSize vs cacheRefreshRate

self.nNegSample vs self.nNeg

self.nontrivial_positives vs self.potential_positives

self.potential_negatives vs self.negCache

For Pittsburgh 30k, for instance, I can print this info for the two classes WholeDatasetFromStruct and QueryDatasetFromStruct:

----------------------------------------------------------------------------------------------------
                                 Loading pittsburgh in train mode
>> Defining whole_train_set...
>> whole_train_set [17416]: 
WholeDatasetFromStruct
        dataset: pitts30k mode: train
        IMGs (db: 10000 qu: 7416) onlyDB: False => |IMGs|: 17416
        positives: None
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Defining whole_training_data_loader given whole_train_set using torch.utils.data.DataLoader...
>> ok!
>> Defining train_set for queries with 0.1 margin...
>> train_set [7320]: 
QueryDatasetFromStruct
        Dataset: pitts30k mode: train margin: 0.1
        nontrivial (+) th: 10.0 m       potential (+) th: 25 m
        Negs: 10 Neg samples: 1000 potential Negs (> 25 m): 7416
        nontrivial pos: 7416 potential pos: 7416
        IMGs (db: 10000 qu: 7416)
        All queries without nontrivial positives: 7320
        negative Cache: 7416
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Defining whole_test_set...
>> whole_test_set [17608]: 
WholeDatasetFromStruct
        dataset: pitts30k mode: val
        IMGs (db: 10000 qu: 7608) onlyDB: False => |IMGs|: 17608
        positives: None
        Transforms (if any): Compose(
                          ToTensor()
                          Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
                      )

>> Evaluating on val set, query count: 7608

                                              Done
----------------------------------------------------------------------------------------------------

I have already read #33, #26, #4, and #9. I am trying to adapt NetVLAD to another dataset with GPS info, and I am confused about how to modify the code accordingly.

Nanne commented 3 years ago

These terms are copied from the original NetVLAD codebase and are mostly described in the paper as well (pay attention to Section 4 and Appendix A). In addition, for most of them there is some documentation in the code:

batchSize, cacheBatchSize, cacheRefreshRate: https://github.com/Nanne/pytorch-NetVlad/blob/master/main.py#L29-L33
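To make the interaction between these three options concrete, here is a minimal sketch of a training loop, assuming the roles main.py's argparse help assigns to them; the numbers and the loop skeleton below are illustrative stand-ins, not the repo's actual code:

```python
# Minimal sketch, assuming:
#   batchSize        - number of (query, positive, negatives) triplets per training step
#   cacheBatchSize   - plain batch size used when pre-computing the descriptor cache
#                      (and when running the test/val sets)
#   cacheRefreshRate - rebuild the descriptor cache after this many training queries
#                      (0 means: only once per epoch)
batchSize = 4
cacheBatchSize = 24
cacheRefreshRate = 1000

num_queries = 7320  # e.g. len(train_set) for pitts30k train, as printed above

subset_size = cacheRefreshRate if cacheRefreshRate > 0 else num_queries
for start in range(0, num_queries, subset_size):
    # 1) rebuild the cache: forward every image through the current model,
    #    cacheBatchSize images at a time, and store the descriptors (no gradients)
    # 2) train on queries[start:start + subset_size] with triplet batches of
    #    batchSize, using the cached descriptors for positive/negative mining
    pass
```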

nNegSample & nNeg: https://github.com/Nanne/pytorch-NetVlad/blob/master/pittsburgh.py#L179-L180

nontrivial_positives are the images within self.dbStruct.nonTrivPosDistSqThr**0.5 (10 meters) of the query position, and are used for the multiple-instance-learning selection of the positive image in the triplet.

potential_positives are the images within self.dbStruct.posDistThr (25 meters?) of the query. They are used for negative selection (i.e., these should definitely not be used as negatives), and during evaluation they are counted as correct retrievals.

potential_negatives: https://github.com/Nanne/pytorch-NetVlad/blob/master/pittsburgh.py#L198, i.e., the images that are more than 25 meters away from the query; these are the pool that negatives are mined from.
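To make the three sets concrete, here is a hedged sketch of the geometric split using sklearn's NearestNeighbors on GPS/UTM positions, in the spirit of what pittsburgh.py does; the coordinate arrays and variable names below are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
utm_db = rng.uniform(0, 200, size=(1000, 2))  # database positions (easting, northing), metres
utm_q = rng.uniform(0, 200, size=(50, 2))     # query positions

nontriv_pos_thr = 10.0  # ~ dbStruct.nonTrivPosDistSqThr ** 0.5
pos_dist_thr = 25.0     # ~ dbStruct.posDistThr

knn = NearestNeighbors(n_jobs=-1).fit(utm_db)

# images within 10 m: candidates for the triplet's positive (the MIL selection later
# picks the one closest to the query in descriptor space)
nontrivial_positives = knn.radius_neighbors(utm_q, radius=nontriv_pos_thr,
                                            return_distance=False)

# images within 25 m: "potential" positives -- never used as negatives, and counted
# as correct retrievals at evaluation time
potential_positives = knn.radius_neighbors(utm_q, radius=pos_dist_thr,
                                           return_distance=False)

# everything farther than 25 m from the query: the pool that negatives are mined from
all_db = np.arange(len(utm_db))
potential_negatives = [np.setdiff1d(all_db, pos, assume_unique=True)
                       for pos in potential_positives]
```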

negCache is the cache of negatives, i.e., the 10 images selected as negatives the last time this query was seen (the previous epoch). This implements the following step from the paper (p. 11):

The mining is done by keeping the 10 hardest negatives from a pool of 1000 randomly sampled negatives and 10 hardest negatives from the previous epoch. We find that remembering previous hard negatives adds stability to the training process.
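A hedged sketch of that mining step, in the spirit of QueryDatasetFromStruct but not the repo's exact code: nNegSample (1000) is the size of the random pool, nNeg (10) is the number of hardest negatives kept, and the descriptor arrays and function signature are stand-ins.

```python
import numpy as np

nNegSample, nNeg = 1000, 10

def mine_negatives(q_desc, potential_neg, neg_cache, db_descs, margin, d_pos):
    """Pick the nNeg hardest violating negatives for one query (illustrative only).

    q_desc        : descriptor of the query image (1-D array)
    potential_neg : indices of db images farther than 25 m from the query
    neg_cache     : the negatives chosen for this query in the previous epoch
    db_descs      : cached database descriptors (rows indexed by db image id)
    d_pos         : descriptor distance between the query and its chosen positive
    """
    # pool = nNegSample random potential negatives + last epoch's hard negatives
    sample = np.random.choice(potential_neg, nNegSample, replace=False)
    pool = np.unique(np.concatenate([sample, neg_cache])).astype(int)

    # distances in descriptor space, hardest (closest) first
    d_neg = np.linalg.norm(db_descs[pool] - q_desc, axis=1)
    order = np.argsort(d_neg)

    # keep only negatives that violate the margin, then take the nNeg hardest;
    # the result is also stored back as this query's negCache for the next epoch
    violating = order[d_neg[order] < d_pos + margin]
    return pool[violating[:nNeg]]
```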

If you study this codebase and the NetVLAD paper, you should be able to figure out how it works; from there you can see how to adapt it to your own dataset.