anosorae / IRRA

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
MIT License

fix device for ids in _compute_embedding in Evaluator #11

Closed AwePhD closed 1 year ago

AwePhD commented 1 year ago

Hello,

In module utils.metrics, there is the Evaluator class and its private method _compute_embedding, which computes the features and IDs for texts and images over the whole test dataset.

On lines L60 and L70, we must add .to(device) at the end of the concatenated tensor of IDs.

If we do not send those IDs to the GPU, we get an error from computing on tensors that are not on the same device. In the eval method, the helper function rank computes the metrics from the similarity matrix (on GPU) and the IDs (on CPU), which raises an error if the IDs are not moved to the GPU. Moreover, on the next line (L85) the metrics are sent back to the CPU, which implies they were expected to be on the GPU; thus the IDs should have been moved to the GPU as well.
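The fix described above can be sketched as follows (a hypothetical, simplified version of _compute_embedding; the function signature and loop structure are assumptions, not the repo's exact code):

```python
import torch

def compute_embedding(batches, device):
    """Hypothetical simplified sketch of Evaluator._compute_embedding.

    `batches` yields (features, ids) pairs; names are assumptions.
    """
    feats, ids = [], []
    for feat_batch, id_batch in batches:
        feats.append(feat_batch.to(device))
        ids.append(id_batch)
    # the fix proposed in the issue: move the concatenated IDs to the
    # same device as the features, so later indexing with the
    # similarity matrix does not mix devices
    return torch.cat(feats), torch.cat(ids).to(device)
```

With this, both returned tensors live on the same device, so downstream metric code can index freely.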

I tried to give some details even though I think this is a very tiny fix/error. I can open a PR with the fix mentioned above if you want. :)

Best, Mathias.

anosorae commented 1 year ago

I am sorry that our code confused you; the presence or absence of the .cpu() call at L85 is irrelevant. This is because qids and gids are always on the CPU, so the computed CMC and mAP are on the CPU as well.

Anyway, thank you for pointing this out; we have modified the potentially misleading code at L85.

AwePhD commented 1 year ago

First, without the modification I pointed out in my issue, the rank function crashes. I do not have access to my computer right now, so I cannot provide the error message; I will share it as soon as possible. If you do not see this error, it may be because I use a more recent torch version. In brief, if I do not send the IDs to the GPU, I cannot evaluate an epoch with the itc loss. I will also test the updated code on my laptop and keep you posted.

Second, your repo is clean code. I have read most of its ~3k lines, and I can say the code quality is good, in addition to the good research findings. Thanks again for your quick answers and for sharing your work.

anosorae commented 1 year ago

I found that the issue you describe occurs with PyTorch 1.13.1, so I guess it is caused by the PyTorch version difference (I use PyTorch 1.9.0); maybe recent versions of PyTorch have disallowed the indexing operation across different devices at L17 of metrics.py.
To fix this, I suggest adding a .cpu() call after the indices to transfer them to the CPU, as the minimal modification. Like this:

pred_labels = g_pids[indices.cpu()]

And I will add this adaptation in the next commit.
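In context, the surrounding rank helper might look roughly like this (a simplified sketch, not the repo's exact implementation; the CMC computation and the signature are assumptions, and only the indexing line reflects the fix discussed above):

```python
import torch

def rank(similarity, q_pids, g_pids, max_rank=10):
    # sort gallery entries by descending similarity for each query
    indices = torch.argsort(similarity, dim=1, descending=True)
    # suggested minimal fix: bring the indices back to the CPU before
    # indexing the CPU-resident gallery IDs, so both tensors are on
    # the same device regardless of the PyTorch version
    pred_labels = g_pids[indices.cpu()]
    matches = pred_labels.eq(q_pids.view(-1, 1)).float()
    # CMC: fraction of queries with a correct match within the top k
    cmc = matches[:, :max_rank].cumsum(1).clamp(max=1.0).mean(0)
    return cmc
```

The alternative (sending the IDs to the GPU, as in the original issue) is equivalent in result; moving the indices to the CPU just avoids keeping the ID tensors on the GPU.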