Closed griffith-wu closed 10 months ago
Hi, thank you for the kind words.
Yes, the memory can be an issue due to the large similarity matrix. One possibility to reduce the memory footprint is a rewrite of `calculate_scores` in the evaluation. We ran the code on our DGX, which has 1 TB of RAM, so this was never an issue on our side.
You could probably just use fp16 instead of fp32 to reduce the memory footprint (this results in slightly worse scores, but not by much), or alternatively create a memory-mapped file to write the results directly to disk, or reduce the `step_size` in the evaluation function.
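The fp16 saving is easy to quantify: halving the element size halves the similarity matrix's footprint. A minimal sketch with NumPy, where `Q` and `R` are placeholder sizes (not the actual VIGOR split sizes):

```python
import numpy as np

# Hypothetical query/reference counts, just for illustration.
Q, R = 2000, 3000

sim_fp32 = np.zeros((Q, R), dtype=np.float32)
sim_fp16 = np.zeros((Q, R), dtype=np.float16)

print(sim_fp32.nbytes / 1e6)  # 24.0 MB
print(sim_fp16.nbytes / 1e6)  # 12.0 MB, exactly half
```

On the real dataset the same ratio holds, so switching the similarity matrix to fp16 cuts its RAM usage in half at the cost of reduced precision.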
Yes, based on your description, for your batch size you should reduce to neighbour_range=16
and neighbour_select=8
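As a minimal sketch of applying this advice (the config object here is a stand-in; only `neighbour_range` and `neighbour_select` are names from the actual training scripts):

```python
from types import SimpleNamespace

# Hypothetical config object; the real training script may define these differently.
cfg = SimpleNamespace(
    batch_size=16,       # reduced from 128
    neighbour_range=16,  # scaled down to match the smaller batch
    neighbour_select=8,
)
```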
Much appreciated for your advice! I noticed that you mentioned np.memmap
, so I read the documentation and tried a simple modification of the code; it turned out that RAM usage could be reduced from >32 GB to 21 GB. The computation is quite a bit slower, but that is completely acceptable.
My modification:
```python
def calculate_scores(query_features, reference_features, query_labels, reference_labels, step_size=1000, ranks=[1, 5, 10]):
    ...
    for i, idx in enumerate(reference_labels_np):
        ref2index[idx] = i

    # similarity = []
    # write the similarity matrix to a disk-backed buffer instead of keeping it in RAM
    similarity_np = np.memmap('_tmp.pkl', dtype="float32", mode="w+", shape=(int(Q), int(R)))
    for i in range(steps):
        start = step_size * i
        end = start + step_size
        sim_tmp = query_features[start:end] @ reference_features.T
        # similarity.append(sim_tmp.cpu())
        similarity_np[start:end] = sim_tmp.cpu().numpy()

    # similarity = torch.cat(similarity, dim=0)
    # note: this reads the memmap back into a regular in-RAM tensor
    similarity = torch.tensor(similarity_np, dtype=torch.float32)
    del similarity_np

    topk.append(R // 100)
    results = np.zeros([len(topk)])
    ...
```
Hopefully this helps others who are experiencing similar problems. Note that the data flow in my modification is actually torch.Tensor -> np.ndarray -> torch.Tensor
, so it may not be best practice; it is just for reference.
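One way to avoid the extra copy on the np.ndarray -> torch.Tensor step would be `torch.from_numpy`, which shares the underlying buffer instead of copying. The NumPy side of the disk-backed pattern can be sketched as follows (the filename and sizes are illustrative, not the ones from the repository):

```python
import numpy as np

Q, R, step = 8, 4, 3  # tiny illustrative sizes
rng = np.random.default_rng(0)
query = rng.standard_normal((Q, 16)).astype(np.float32)
ref = rng.standard_normal((R, 16)).astype(np.float32)

# Disk-backed buffer: chunks land in the file as they are written.
sim = np.memmap("_tmp_sim.dat", dtype=np.float32, mode="w+", shape=(Q, R))
for start in range(0, Q, step):
    sim[start:start + step] = query[start:start + step] @ ref.T
sim.flush()

# Reading through the memmap only pages in what is actually accessed.
assert np.allclose(sim, query @ ref.T)
```

Wrapping `sim` with `torch.from_numpy(np.asarray(sim))` would then hand the same buffer to PyTorch without materialising a second full-size copy, whereas `torch.tensor(...)` always copies.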
Thanks again for your detailed suggestions!
Thank you for your implementation!
@griffith-wu
If I have time to test it, I can integrate the code into mine, if that's ok with you?
Sure, I think it's an extra way to save memory at the expense of running speed, so it might be better to make it an extra option in the settings, e.g.:

```python
if cfg.save_memory:
    ...
```
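Such a flag could simply switch between the two accumulation strategies. A minimal sketch, where `cfg`, `Q`, and `R` are placeholders and only the `save_memory` name comes from the suggestion above:

```python
import numpy as np
from types import SimpleNamespace

cfg = SimpleNamespace(save_memory=True)  # hypothetical config object
Q, R = 6, 5                              # tiny illustrative sizes

if cfg.save_memory:
    # disk-backed buffer: slower, but RAM usage stays bounded
    similarity = np.memmap("_tmp.dat", dtype=np.float32, mode="w+", shape=(Q, R))
else:
    # in-RAM buffer: fast, but needs O(Q * R) memory
    similarity = np.zeros((Q, R), dtype=np.float32)

# stand-in for the chunked matmul loop that fills the buffer
similarity[:] = 1.0
```

Either branch yields an array with the same interface, so the rest of `calculate_scores` would not need to change.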
Hi, thank you very much for your excellent work and code. Your code structure is intuitive and effective and has inspired me a lot. However, I do not have such a high-performance machine for my reproduction experiments, so I reduced the batch size from 128 to 16 at the expense of slightly worse results. During the evaluation phase on the VIGOR dataset, however, the program often gets killed due to lack of memory. I even bought an extra memory stick to increase my computer's RAM to 32 GB, and it still gets killed. So I would appreciate it if you could share the amount of memory of your experimental equipment for reference.
In addition, I noticed the following settings in the training scripts:
Based on your description in Section 3.4 of the paper, when reducing the batch size to 16, should I adjust
neighbour_range
to 16 and neighbour_select
to 8?