facebookresearch / NSVF

Open source code for the paper of Neural Sparse Voxel Fields.
MIT License

Reduce GPU memory use with one line of code. #34

Open yumi-cn opened 3 years ago

yumi-cn commented 3 years ago

I tried running this project's code on 4× RTX 2080 Ti (11 GB).

With the original args, "--view-per-batch 4 --pixel-per-view 2048", the CUDA devices hit an OOM error after just 2 iterations,

so I reduced the batch size to "--view-per-batch 4 --pixel-per-view 128", which works well for the first 5000 iterations,

and "--view-per-batch 2 --pixel-per-view 128" works well for the first 25000 iterations.

Both eventually hit the OOM error at the voxel split step (just a guess). So I checked the memory-management parts of the code and did not find anything that releases PyTorch's unused cache, i.e. no call like:

torch.cuda.empty_cache()

so I added this line to `NSVFModel.clean_caches` in `fairnr/models/nsvf.py`:

    def clean_caches(self, reset=False):
        self.encoder.clean_runtime_caches()
        if reset:
            self.encoder.reset_runtime_caches()
        torch.cuda.empty_cache() # release cached memory once the model has finished this step

This really does let me get through more split steps (though it still OOMs eventually, e.g. after 75000 iterations).

Before adding this line:

Memory use of the CUDA device: 4000 MB -> (voxel split) 8000 MB -> (voxel split) OOM error

After adding this line:

Memory use of the CUDA device: 4000 MB -> (voxel split) 6800 MB -> (voxel split) 9900 MB -> (voxel split) OOM error
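To reproduce measurements like the ones above, a small helper that reads PyTorch's own CUDA memory counters can be printed around the split step. This is a minimal sketch; `log_cuda_mem` is a hypothetical helper, not part of the NSVF repo:

```python
import torch

def log_cuda_mem(tag: str) -> str:
    """Format the allocated / reserved CUDA memory for the current device."""
    if not torch.cuda.is_available():
        return f"[{tag}] no CUDA device"
    alloc = torch.cuda.memory_allocated() / 2**20     # tensors currently alive
    reserved = torch.cuda.memory_reserved() / 2**20   # cache held by PyTorch
    return f"[{tag}] allocated={alloc:.0f}MB reserved={reserved:.0f}MB"

# Hypothetical usage around the voxel split:
# print(log_cuda_mem("before split"))
# ... voxel split ...
# torch.cuda.empty_cache()
# print(log_cuda_mem("after empty_cache"))
```

Note that `empty_cache()` only returns *unused* cached blocks to the driver, so `allocated` stays the same and only `reserved` drops; that matches the observation that it delays but does not prevent the OOM.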

I haven't noticed any negative effect on the results so far.

I also tried other ways to work around the OOM, such as adding "--fp16" to enable fp16 mode via the apex module (which is supposed to reduce memory use by computing in float16), but that just raises the error I reported in issue #33.
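For anyone hitting the same apex error, PyTorch's built-in automatic mixed precision (`torch.cuda.amp`) is a possible alternative to apex fp16. The sketch below is generic and not NSVF's actual trainer (that lives in fairseq); the tiny model and data are stand-ins, and the `enabled=` guards make it a no-op on CPU:

```python
import torch

# Tiny stand-ins for the real model / optimizer / batch.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model.to(device)

scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # loss scaling for fp16
x = torch.randn(4, 8, device=device)
y = torch.randn(4, 1, device=device)

with torch.cuda.amp.autocast(enabled=use_cuda):  # run forward in fp16 on CUDA
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
scaler.step(opt)
scaler.update()
```

Whether this actually helps NSVF's CUDA extensions is untested here; it only avoids the apex dependency.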

If you want to run this code on other CUDA devices (especially ones without as much GPU memory as a 32 GB V100), this one-line change and the bug report above may be useful.

Thanks for replying.

ghasemikasra39 commented 3 years ago

On which dataset and on which object of that dataset are you training?

yumi-cn commented 3 years ago

> On which dataset and on which object of that dataset are you training?

I tested on the Synthetic-NSVF dataset, e.g. the Bike and Palace scenes.

yyeboah commented 3 years ago

@yumi-cn Thanks for sharing your insights. For those that cannot make use of half precision, and haven’t got 32 GB of GPU memory, is there any other way to get past sub-division at 75K ?

yumi-cn commented 3 years ago

> @yumi-cn Thanks for sharing your insights. For those that cannot make use of half precision, and haven’t got 32 GB of GPU memory, is there any other way to get past sub-division at 75K ?

Actually I do have some ideas about this, but I can't share them yet (they may become a paper). I've also found that sub-division at 25K plus 40K total iterations is usable for most scenes at ordinary quality; if you don't need very high fidelity, you don't need the sub-division at 75K.

MultiPath commented 3 years ago

Also, the initial voxel size may be too small. You could try making it larger.