PRBonn / SHINE_mapping

🌟 SHINE-Mapping: Large-Scale 3D Mapping Using Sparse Hierarchical Implicit Neural Representations (ICRA 2023)

Tuning shine #19

Closed · johannes-graeter closed this issue 1 year ago

johannes-graeter commented 1 year ago

Hi there,

first of all, kudos for this great piece of work! While trying to use it on custom data, we are struggling a little with tuning the algorithm.

A. From the issues, I have the impression that the configs supplied in the repo are not the same ones used for the paper. Is that correct? Would it be possible to supply them? This would give great hints about which buttons to push.
B. Which are the top 3 parameters to tune for better reconstruction quality (quality = level of detail of the mesh while smoothness of surfaces is preserved)?
C. Which are the top 3 parameters to tune for scalability? (With 8 GB of GPU memory, I regularly crash by running out of memory.)
D. What compute hardware did you use, for example, for the KITTI experiments?

Thanks in advance!

Best, Johannes

YuePanEdward commented 1 year ago

Thanks a lot for your interest in our work and your commitment to the project.

A. I will upload the config files for reproducing the results reported in the paper after the vacation.
B. The leaf node voxel size (the lower, the better), the marching cubes voxel size (the lower, the better) and the sigma_sigmoid_m value (depending on the noise level of the data) are the most important parameters for the reconstruction quality. The batch mode and the incremental mode with the sliding-window replay strategy generally perform better than the incremental mode with the regularisation strategy.
C. You may decrease the number of sample points, increase the voxel downsampling size and increase the leaf node voxel size of the octree for better scalability.
D. We are using an Nvidia A4000 GPU with 16 GB of memory for our experiments.
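For quick reference, here is a rough sketch of these knobs grouped by effect, written as a plain Python dict. The key names and values are illustrative assumptions and only loosely mirror the repo's YAML configs, so please check the actual files shipped with the repository.

```python
# Illustrative sketch only -- the key names and values below are assumptions
# and may not match the actual YAML keys in the SHINE_mapping config files.
tuning_sketch = {
    # Reconstruction quality (lower values -> finer detail, more memory/compute)
    "leaf_vox_size_m": 0.2,    # leaf node voxel size of the feature octree
    "mc_vox_size_m": 0.1,      # marching cubes voxel size used for meshing
    "sigma_sigmoid_m": 0.05,   # pick according to the noise level of the sensor
    # Scalability (raise the sizes / lower the sample count to save GPU memory)
    "vox_down_size_m": 0.05,   # voxel downsampling size of the input point cloud
    "samples_per_ray": 6,      # number of sample points drawn along each ray
}
print(tuning_sketch)
```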

johannes-graeter commented 1 year ago

Great, thanks for the fast feedback.

I have another question related to that: I found the idea very appealing that we can train the decoder on one part of the data and then run inference on all the rest. In particular, incrementally learning it for each dataset seems very appealing.

I tried learning the decoder with shine_batch (call it run A), saving the weights, then loading them and inferring in replay mode (call it run B). I expected runs A and B to give similar results, but this was not the case. Do I have a misunderstanding here, or is my configuration wrong, so that I could only infer the results (also on the whole dataset)?
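(For concreteness, a minimal PyTorch sketch of the intended run A / run B workflow is given below. `GeoDecoder` is a hypothetical stand-in for the geometric decoder MLP; the real class name, checkpoint format and config switches in SHINE_mapping may differ.)

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for SHINE's geometric decoder MLP -- the real module,
# checkpoint keys and loading mechanism in the repo may differ.
class GeoDecoder(nn.Module):
    def __init__(self, feat_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # maps an interpolated feature to an SDF value
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

decoder = GeoDecoder()

# Run A: after batch training, save only the decoder weights.
torch.save(decoder.state_dict(), "geo_decoder_batch.pth")

# Run B: before the incremental (replay) run, load the weights and freeze them,
# so that only the map features would be optimized.
decoder.load_state_dict(torch.load("geo_decoder_batch.pth"))
for p in decoder.parameters():
    p.requires_grad = False
```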

ljjTYJR commented 1 year ago

@johannes-graeter Did you use the saved model to run on the same dataset? I think this approach may not be suitable for your requirement, because for the feature extraction module here, the input is the 3D position, not extracted embeddings. To use a frozen decoder, I think this work may meet your requirements: https://github.com/ethz-asl/neuralblox

StarryN commented 1 year ago


Hello Johannes, sorry for the late reply; I didn't fully get your question. Could you explain more about "incrementally learning for each dataset"?
Regarding the decoder, I think it is not so important for mapping, because the capacity of our map representation is large enough and the feature field can always converge to some local minima with a fixed MLP.
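As a side note, the fixed-MLP setting can be illustrated with a self-contained PyTorch toy (this is not the repo's actual training code; all names, shapes and the dummy supervision are made up for illustration):

```python
import torch

# Toy illustration of the fixed-MLP setting: only the grid features are optimized.
feat_dim = 8
grid_feats = torch.nn.Parameter(0.01 * torch.randn(1000, feat_dim))  # stand-in for octree corner features
mlp = torch.nn.Sequential(torch.nn.Linear(feat_dim, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))                    # stand-in for the shared decoder

optimize_mlp_jointly = False          # False reproduces the "fixed MLP" setting
if not optimize_mlp_jointly:
    for p in mlp.parameters():
        p.requires_grad = False
params = [grid_feats] + (list(mlp.parameters()) if optimize_mlp_jointly else [])
optimizer = torch.optim.Adam(params, lr=1e-2)

target_sdf = torch.randn(1000, 1)     # dummy supervision, just to make the loop run
for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.l1_loss(mlp(grid_feats), target_sdf)
    loss.backward()
    optimizer.step()
```

Flipping `optimize_mlp_jointly` to `True` gives the joint setting discussed in the experiment below.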

Here is a tiny experiment: I used the batch mode with frames 0-50 from the MaiCity dataset. The result below is from the normal setting, where we train the map and the MLP jointly, after 1000 iterations:

We can see that the reconstruction is quite good already. If instead I fix the MLP right after its initialization, meaning we do not optimize the MLP during training, then after 1000 iterations:

It is worse than the one above but still works. If we continue training this map to 3000 iterations:

<img src="https://github.com/PRBonn/SHINE_mapping/assets/8968747/333c2f66-80d3-4cb3-9fef-cffb9e65f7e9" width="600" height="400" align="bottom" />

It can also achieve a pretty good result.

So, from my perspective, an optimizable MLP can accelerate convergence, as it introduces more flexibility, but it won't influence the final result too much.

YuePanEdward commented 1 year ago


The config files and the reconstruction results have been uploaded.

johannes-graeter commented 1 year ago


Thanks for the insight and the great explanation! I was wondering if we could profit from learning the decoder (but perhaps also the feature map) on different scenery. Taking KITTI for example: how about training the feature encoder and decoder on sequence 00, which is mostly an urban environment, then taking the weights, loading them and continuing training on, e.g., the Geiegerberg sequence (which has a lot more vegetation) or the highway sequence? The encoder would then see different point distributions on surfaces (e.g. the typical lidar point distribution on vegetation is different than on a facade). Could this increase reconstruction accuracy?

StarryN commented 1 year ago

Hi @johannes-graeter, learning a prior from different datasets would be a very cool idea, but actually SHINE-Mapping has no encoder part. We randomly initialize the map features on the grid corners and optimize them online by backpropagation. Like NeRF, what we do is just use a neural representation to overfit the input data, so the map has no generalization. The recent work NKSR may be more suitable for your requirement. And also, as @ljjTYJR mentioned, neuralblox can achieve incremental mapping with a pre-trained encoder and a feature fusion network.
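(To make the "no encoder" point concrete, here is a conceptual PyTorch sketch of how such a map answers a query: features stored on an optimizable grid, with a dense volume standing in for the sparse octree, are interpolated at the query point and decoded by the MLP. All names and shapes are assumptions for illustration, not SHINE's actual implementation.)

```python
import torch
import torch.nn.functional as F

# Conceptual sketch only: a dense feature volume stands in for the sparse octree.
feat_dim, res = 8, 32
feature_volume = torch.nn.Parameter(0.01 * torch.randn(1, feat_dim, res, res, res))  # per-scene, randomly initialized
decoder = torch.nn.Sequential(torch.nn.Linear(feat_dim, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 1))

def query_sdf(points: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) in [-1, 1]^3 -> predicted SDF values (N, 1)."""
    grid = points.view(1, -1, 1, 1, 3)                               # layout expected by grid_sample
    feats = F.grid_sample(feature_volume, grid, align_corners=True)  # trilinear interpolation, (1, C, N, 1, 1)
    feats = feats.squeeze(-1).squeeze(-1).squeeze(0).t()             # -> (N, C)
    return decoder(feats)

sdf = query_sdf(torch.rand(4, 3) * 2.0 - 1.0)
print(sdf.shape)  # torch.Size([4, 1])
```

Both the feature volume and the decoder are optimized online against the current scan data, which is why only the decoder could be reused across scenes, as discussed above, while the features have to be re-optimized for every new scene.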