dcharatan / flowmap

Code for "FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent" by Cameron Smith*, David Charatan*, Ayush Tewari, and Vincent Sitzmann
https://cameronosmith.github.io/flowmap/
MIT License

OOM issue #13

Closed · pdxmusic closed this issue 2 months ago

pdxmusic commented 2 months ago

First, thanks for your work!

I was wondering how much VRAM this code needs.

I tried running it on a dataset of 300 1920x1080 photos, but even though I'm using a 24 GB RTX 4090, it goes OOM. I also tried reducing the dataset to 200 photos, but the same problem occurred.

dcharatan commented 2 months ago

Using the settings we used for the paper, a 150-image sequence requires about 40 GB of memory. However, it would probably be possible to substantially lower memory usage by doing the following:

  • Freezing the first part of the depth estimator and only fine-tuning later layers
  • Running FlowMap at a lower input resolution

If you're building on top of FlowMap, these things should be fairly easy to implement. If you're only interested in using FlowMap as an off-the-shelf tool, stay tuned for updates, since we'll hopefully be updating the repo with some of these improvements. See #4 for a few more details/advice on how to run FlowMap at a lower resolution.
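
For the lower-resolution route, here's a minimal sketch of pre-downsampling frames before they enter the pipeline; the helper name and the 0.5 scale factor are my own illustration, not code from the repo:

```python
import torch
import torch.nn.functional as F

def downsample_frames(frames: torch.Tensor, scale: float = 0.5) -> torch.Tensor:
    """Downsample a (B, C, H, W) batch of frames to cut peak memory.

    Per-pixel buffers (optical flow, depth, correspondence losses) scale
    roughly linearly with pixel count, so halving each dimension cuts
    those buffers by about 4x.
    """
    return F.interpolate(
        frames, scale_factor=scale, mode="bilinear", align_corners=False
    )
```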

leo-frank commented 2 months ago

Hello, have you tested the performance difference when using this configuration?

Personally, I'm not sure what effect fine-tuning only specific layers of the network would have. Do you have any insights on this?

kk6398 commented 2 months ago

> Hello, have you tested the performance difference when using this configuration?
>
>   • Freezing the first part of the depth estimator and only fine-tuning later layers
>
> Personally, I'm not sure what effect fine-tuning only specific layers of the network would have. Do you have any insights on this?

I tried using `xxx --low_memory` to get the output, then used the output file as the input to 3DGS. However, the performance is much lower than what the paper reports.

dcharatan commented 2 months ago

@leo-frank, I haven't tested this configuration yet. The thought is that earlier layers of the network mainly do feature extraction, while the later layers do depth-related computation. Since feature extraction should be universal, it might make sense to freeze these layers to save ~50% of the computation while hopefully not impacting the network's expressiveness too much.
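
To make that idea concrete, here's a minimal PyTorch sketch of freezing the early layers of a pretrained depth network; `load_pretrained_depth_model` and the `encoder`/`decoder` split are hypothetical stand-ins, not FlowMap's actual module layout:

```python
import torch

# Hypothetical depth estimator split into an encoder (feature
# extraction) and a decoder (depth prediction).
model = load_pretrained_depth_model()  # hypothetical loader

# Freeze the feature-extraction layers so they receive no gradients.
for param in model.encoder.parameters():
    param.requires_grad = False
model.encoder.eval()  # also fix batch-norm statistics, if any

# Hand only the trainable (decoder) parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

def forward(images: torch.Tensor) -> torch.Tensor:
    # Running the frozen encoder under no_grad avoids storing its
    # activations for the backward pass, which is where most of the
    # memory (not just compute) savings come from.
    with torch.no_grad():
        features = model.encoder(images)
    return model.decoder(features)
```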

@kk6398 There are a few things to note: