Skyy93 / Sample4Geo


Question about the memory #3

Closed griffith-wu closed 10 months ago

griffith-wu commented 10 months ago

Hi, thank you very much for your excellent work and code. Your code structure is intuitive and effective and has inspired me a lot. However, I do not have a high-performance machine for my reproduction experiments, so I reduced the batch size from 128 to 16 at the expense of slightly worse experimental results. During the evaluation phase on the VIGOR dataset, though, the program often gets killed due to lack of memory. I bought an extra memory stick to increase my computer's RAM to 32 GB, and it still gets killed. I would appreciate it if you could share the memory size of your experimental equipment for reference.

In addition, I noticed the following settings in the training scripts:

neighbour_select: int = 64     # max selection size from pool
neighbour_range: int = 128     # pool size for selection

Based on your description in Section 3.4 of the paper, since I reduced the batch size to 16, should I adjust neighbour_range to 16 and neighbour_select to 8?

Skyy93 commented 10 months ago

Hi, thank you for the kind words!

Yes, the memory can be an issue due to the large similarity matrix. One possibility to reduce the memory footprint is a rewrite of calculate_scores in the evaluation. We ran the code on our DGX, which has 1 TB of RAM, so this was never an issue on our side.

You could probably just use fp16 instead of fp32 to reduce the memory footprint (this results in slightly worse scores, but not by much), or alternatively create a memory-mapped file to write the results directly to disk, or reduce the step_size in the evaluation function.
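For the fp16 route, a minimal sketch of the idea (the feature matrices and sizes here are random stand-ins for the real embeddings, and the chunked loop mirrors the step_size pattern in the evaluation code):

```python
import torch

# hypothetical embeddings standing in for the real query/reference features
Q, R, D = 1000, 2000, 64
query_features = torch.randn(Q, D)
reference_features = torch.randn(R, D)

step_size = 250
chunks = []
for start in range(0, Q, step_size):
    # compute each chunk in fp32, but store it as fp16:
    # this halves the memory held by the accumulated similarity matrix
    sim_chunk = query_features[start:start + step_size] @ reference_features.T
    chunks.append(sim_chunk.half().cpu())

similarity = torch.cat(chunks, dim=0)
print(similarity.shape, similarity.dtype)  # torch.Size([1000, 2000]) torch.float16
```

Computing in fp32 and only storing in fp16 keeps the matmul numerically stable while still cutting the size of the stored matrix in half.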

Yes, based on the description, for your batch size you should reduce to neighbour_range=16 and neighbour_select=8.
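For reference, the defaults keep neighbour_select at half of neighbour_range (64 vs. 128), so scaling both with the batch size comes out to:

```python
batch_size = 16

# scale the neighbour pool with the batch size, keeping the
# default 1:2 select/range ratio (64:128) from the training script
neighbour_range = batch_size           # pool size for selection
neighbour_select = batch_size // 2     # max selection size from pool

print(neighbour_range, neighbour_select)  # 16 8
```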

griffith-wu commented 10 months ago

Many thanks for your advice! You mentioned np.memmap, so I read the documentation and tried a simple modification of the code; it turned out that RAM usage can be reduced from >32 GB to 21 GB. The computation speed is quite a bit lower, but that is completely acceptable.

My modification:

def calculate_scores(query_features, reference_features, query_labels, reference_labels, step_size=1000, ranks=[1, 5, 10]):
    ...
    for i, idx in enumerate(reference_labels_np):
        ref2index[idx] = i
    # similarity = []
    # disk-backed buffer instead of an in-RAM list of chunks
    similarity_np = np.memmap('_tmp.pkl', dtype="float32", mode="w+", shape=(int(Q), int(R)))
    for i in range(steps):
        start = step_size * i
        end = start + step_size
        sim_tmp = query_features[start:end] @ reference_features.T
        # similarity.append(sim_tmp.cpu())
        similarity_np[start:end] = sim_tmp.cpu().numpy()
    # similarity = torch.cat(similarity, dim=0)
    similarity = torch.tensor(similarity_np, dtype=torch.float32)
    del similarity_np
    topk.append(R // 100)
    results = np.zeros([len(topk)])
    ...

Hopefully this will help others experiencing similar problems. Note that the data flow in my modification is actually torch.Tensor -> np.ndarray -> torch.Tensor, so it may not be best practice; it is just for reference.
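On that last point, torch.from_numpy shares memory with the underlying array instead of copying it the way torch.tensor does, so wrapping the memmap directly should keep the full matrix backed by the file on disk. A minimal sketch (with small stand-in sizes and a hypothetical temp-file name):

```python
import numpy as np
import torch

Q, R = 100, 200  # small stand-in sizes for illustration

# write the similarity matrix to a disk-backed array
similarity_np = np.memmap('_tmp.dat', dtype='float32', mode='w+', shape=(Q, R))
similarity_np[:] = np.random.rand(Q, R).astype('float32')
similarity_np.flush()

# torch.tensor() copies the data into RAM; torch.from_numpy() shares
# the memmap's buffer, so the data stays backed by the file on disk
similarity = torch.from_numpy(similarity_np)
print(similarity.shape)  # torch.Size([100, 200])
```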

Thanks again for your detailed suggestions!

Skyy93 commented 10 months ago

Thank you for your implementation!

Skyy93 commented 10 months ago

@griffith-wu

If I have time to test it, I can integrate the code into mine, if that's OK with you?

griffith-wu commented 10 months ago

Sure! I think it's an extra way to save memory at the expense of running speed, so it might be better to make it an optional setting, e.g.:

if cfg.save_memory:
    ...
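A sketch of how such an option could be wired in (the Config class, the save_memory flag name, and the helper are all hypothetical, just to illustrate the switch between the two backing stores):

```python
from dataclasses import dataclass

import numpy as np
import torch


@dataclass
class Config:
    save_memory: bool = False  # hypothetical flag from the discussion above


def allocate_similarity(cfg, Q, R, path='_sim.dat'):
    """Pick the similarity-matrix backing store based on the config."""
    if cfg.save_memory:
        # disk-backed: slower writes, but only page-cache RAM usage
        return np.memmap(path, dtype='float32', mode='w+', shape=(Q, R))
    # in-RAM: fast, but needs Q * R * 4 bytes
    return torch.zeros(Q, R, dtype=torch.float32)


print(type(allocate_similarity(Config(save_memory=True), 10, 20)).__name__)   # memmap
print(type(allocate_similarity(Config(save_memory=False), 10, 20)).__name__)  # Tensor
```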