Closed: pdxmusic closed this issue 2 months ago
Using the settings we used for the paper, a 150-image sequence requires about 40 GB of memory. However, it would probably be possible to substantially lower memory usage by doing the following:
- Running FlowMap at a lower resolution
- Freezing the first part of the depth estimator and only fine-tuning later layers
If you're building on top of FlowMap, these things should be fairly easy to implement. If you're only interested in using FlowMap as an off-the-shelf tool, stay tuned for updates, since we'll hopefully be updating the repo with some of these improvements. See #4 for a few more details/advice on how to run FlowMap at a lower resolution.
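For the lower-resolution route, one option is simply to downscale frames before they ever reach FlowMap. A minimal sketch of the scaling arithmetic, assuming a cap on the longer side (the helper name and cap value are mine, not part of FlowMap):

```python
# Hypothetical helper (not part of FlowMap): compute a reduced frame size
# that preserves aspect ratio, capped at max_dim pixels on the long side.
def scaled_size(width: int, height: int, max_dim: int) -> tuple[int, int]:
    scale = max_dim / max(width, height)
    if scale >= 1.0:
        return width, height  # already small enough; leave unchanged
    return round(width * scale), round(height * scale)

print(scaled_size(1920, 1080, 720))  # -> (720, 405)
```

Any image tool (e.g. ffmpeg or Pillow) can then do the actual resizing to the computed size before the frames are handed to FlowMap.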
Hello author, have you tested the performance differences when using this configuration?
> - Freezing the first part of the depth estimator and only fine-tuning later layers
Personally, I'm not quite sure what performance differences adjusting specific layers of the network might bring. Do you have any insights on this?
I tried using "xxx --low_memory" to get the output, then used the output file as the input to 3DGS. However, the performance is much lower than the paper indicates.
@leo-frank, I haven't tested this configuration yet. The thought is that earlier layers of the network mainly do feature extraction, while the later layers do depth-related computation. Since feature extraction should be universal, it might make sense to freeze these layers to save ~50% of the computation while hopefully not impacting the network's expressiveness too much.
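In PyTorch terms, the freezing idea looks something like the sketch below. The three-layer network is a hypothetical stand-in (FlowMap's actual depth estimator is different); only the freezing pattern itself is the point:

```python
import torch.nn as nn

# Hypothetical stand-in for a depth estimator: two early
# feature-extraction layers followed by one later depth-related layer.
depth_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),   # early feature-extraction layer
    nn.Conv2d(16, 16, 3, padding=1),  # early feature-extraction layer
    nn.Conv2d(16, 1, 3, padding=1),   # later depth-related layer
)

# Freeze the early layers: they receive no gradients and store no
# optimizer state, which is where the memory/compute savings come from.
for layer in list(depth_net.children())[:2]:
    for p in layer.parameters():
        p.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in depth_net.parameters() if p.requires_grad]
```

Because frozen layers need no gradient buffers or optimizer statistics, the savings apply to memory as well as computation.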
@kk6398 There are a few things to note. All of the results in the paper were generated without the `+experiment=low_memory` flag. The initialization checkpoint was pre-trained at the original resolution, so it likely doesn't work as well at low resolution. You could try running `+experiment=[low_memory,ablation_random_initialization_long]` to get a sense of what a well-initialized low-resolution network can do. We found that a randomly initialized network that runs for a long time (20k steps) often performs almost as well as the full method with the initialization.
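For reference, these overrides are Hydra-style flags passed on the command line. A sketch of an invocation (the entry point here is an assumption and may differ from your checkout; only the `+experiment` override comes from the comment above):

```shell
# Hypothetical invocation: entry point and any other flags are
# assumptions; only the +experiment override is from this thread.
python3 -m flowmap.overfit \
    +experiment=[low_memory,ablation_random_initialization_long]
```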
First, thanks for your work!
I was wondering if it was possible to know how much VRAM this code needs.
I tried running it on a dataset of 300 1920x1080 photos, but even though I'm using a 24 GB 4090, it goes OOM. I also tried reducing the dataset to 200 photos, but the same problem occurred.