Simple-Robotics / cosypose

Code for "CosyPose: Consistent multi-view multi-object 6D pose estimation", ECCV 2020.
MIT License

Memory optimized dists_add_symmetric #18

KushnirDmytro closed 1 year ago

KushnirDmytro commented 1 year ago

I am proposing a PR that fixes the old issues mentioning the 'CUDA out of memory' error when running the evaluation script.

I figured out that this issue comes from a single function: cosypose.lib3d.distances.dists_add_symmetric

It allocates tensors of sizes NxNx3 and NxNx1, where N is the number of points.
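To put those shapes in numbers, here is a rough back-of-the-envelope estimate. It assumes float32 elements (4 bytes each) and hypothetical values of N; the helper name `pairwise_bytes` is made up for illustration and is not part of the repo.

```python
def pairwise_bytes(n_points: int, batch: int = 1, bytes_per_elem: int = 4) -> dict:
    """Bytes needed for the dense pairwise tensors (assumes float32 by default)."""
    # N x N x 3 tensor of coordinate differences
    diff = batch * n_points * n_points * 3 * bytes_per_elem
    # N x N x 1 tensor of squared norms
    sq_norm = batch * n_points * n_points * 1 * bytes_per_elem
    return {"diff": diff, "sq_norm": sq_norm, "total": diff + sq_norm}


# Hypothetical point counts, only to show the quadratic growth.
for n in (1_000, 10_000, 20_000):
    total_gib = pairwise_bytes(n)["total"] / 2**30
    print(f"N={n:>6}: {total_gib:.2f} GiB per sample")
```

Under these assumptions, N = 10,000 already needs about 1.5 GiB per sample, and doubling N quadruples that figure.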

Yet the same result can be achieved with a small rewrite of the code that avoids these allocations.

Alternative solutions are also possible and work (tested).

_Also, some distance functions in the lib3d.symmetric_distances.py file could be optimized, as they compute similar distances._
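One way to illustrate the general idea is to process the points in chunks, so that only a slice of the pairwise tensor exists at any time. The sketch below is not the code from this PR: the function name `nearest_point_residuals`, its arguments, and the `chunk_size` default are assumptions chosen for illustration. For each point in `points_a` it finds the closest point in `points_b` and returns the difference vector.

```python
import torch


def nearest_point_residuals(points_a, points_b, chunk_size=1024):
    """For every point in points_a, return (closest point in points_b) - (that point).

    points_a, points_b: tensors of shape (B, N, 3).
    Returns a tensor of shape (B, N, 3).
    The largest temporary is (B, chunk_size, N, 3) instead of (B, N, N, 3).
    """
    assert points_a.shape == points_b.shape
    out = torch.empty_like(points_a)
    for start in range(0, points_a.shape[1], chunk_size):
        a_chunk = points_a[:, start:start + chunk_size]        # (B, C, 3)
        # Pairwise differences for this chunk only: (B, C, N, 3)
        diff = points_b.unsqueeze(1) - a_chunk.unsqueeze(2)
        # Index of the nearest point in points_b for each chunk point: (B, C)
        idx = (diff ** 2).sum(dim=-1).argmin(dim=2)
        # Pick the nearest points along the point dimension: (B, C, 3)
        nearest = torch.gather(points_b, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
        out[:, start:start + chunk_size] = nearest - a_chunk
    return out


# Small usage example on random data (CPU is enough to try it out).
torch.manual_seed(0)
a = torch.randn(2, 100, 3)
b = torch.randn(2, 100, 3)
residuals = nearest_point_residuals(a, b, chunk_size=16)
print(residuals.shape)
```

The chunk size trades speed for peak memory: a smaller chunk means more loop iterations but a smaller temporary tensor.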

This solution uses less than 0.25 of the original version's memory. The experiment was performed on the run_cosy_pose_eval.py pipeline, evaluating 30 objects from the tless.bop version of the dataset. (Figure: MemExperiments)

The experiment on the RTX-2080 (8 GB) was not clean because the GPU was also used for the system GUI. The run on the TITAN-X (12 GB) was much cleaner, since it was performed on a headless server. The old version of the code fails on both setups, while the new one works on both.

The low threshold on use cases could be explained by the memory used for context data and by fragmentation. The error is triggered by the need to allocate one very large contiguous tensor. This PR fixes the

nim65s commented 1 year ago

Thanks @KushnirDmytro for this work!

Maybe @ElliotMaitre you could test this to double-check?

ElliotMaitre commented 1 year ago

I tested it and it works for me. This change makes it possible to run the evaluation on the tless dataset on an RTX-3060 (12 GB). However, on the ycbv dataset I still hit the CUDA out_of_memory error. All in all, the improvement is still very noticeable!

Thank you for your contribution.

KushnirDmytro commented 1 year ago

@ElliotMaitre Thank you for both the review and the appreciation)

After your comment, I was puzzled by the reported YCBV dataset problem. I downloaded the dataset (with the cosypose download script, the exact proposed version) and then successfully ran the evaluation (as proposed on the landing page of this repo). On an RTX 2080 with 8 GB it works fine, and memory consumption is modest.

Then checked the data:

It feels like you had the CUDA out_of_memory issue for a different reason. Several times I have observed that when eval (or another script) is terminated during ongoing computations, the process often hangs in the background and keeps occupying GPU memory. My hypothesis is that you launched the eval on YCBV while a zombie TLESS eval process was still running in the background. This is a reproducible scenario; I checked that.
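A quick way to check for such leftover processes before launching a new run is to list the compute processes that currently hold GPU memory. The sketch below is only an illustration: the helper names are hypothetical, it assumes `nvidia-smi` is on the PATH, and the sample text is made-up output rather than a real log.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse rows of `pid, process_name, used_memory` (CSV, no header, no units)."""
    procs = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip empty or malformed rows
        pid, mem = parts[0], parts[-1]
        name = ",".join(parts[1:-1])
        # used_memory is reported in MiB; it may be unavailable on some setups
        used_mib = int(mem) if mem.isdigit() else None
        procs.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return procs


def list_gpu_processes():
    """Query nvidia-smi for processes holding GPU memory (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)


# Made-up sample output, only to show the parsed structure.
sample = "12345, python, 6543\n"
print(parse_compute_apps(sample))
```

If a stale evaluation process shows up in that list, terminating it by PID should release its GPU memory before the next run.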