yangjob closed this issue 3 years ago
The naive way, which is what we did, is simply to query every grid point at a certain resolution. For example, if we set the resolution to 64^3, we take all 64^3 grid points in the voxel grid and feed them to the decoder.
Of course, this causes many unnecessary queries, so a more efficient scheme is proposed in OccNet. And I believe there are now many other works that try to improve the query efficiency of implicit networks.
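The naive dense query above can be sketched as follows. This is only an illustration, not the repository's code: `decoder` is a hypothetical stand-in for the implicit decoder (latent code + points in, occupancy out), and the toy sphere decoder exists only so the sketch runs end to end.

```python
import numpy as np

def dense_grid_query(decoder, latent_code, resolution=64):
    """Naive occupancy extraction: query the decoder at every grid point.

    `decoder` is a hypothetical stand-in for the implicit decoder:
    it maps (latent_code, points) -> occupancy values.
    """
    # Build all resolution^3 cell-center coordinates, normalized to [0, 1].
    axis = (np.arange(resolution) + 0.5) / resolution
    pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    pts = pts.reshape(-1, 3)                       # (resolution^3, 3)
    occ = decoder(latent_code, pts)                # (resolution^3,)
    return occ.reshape(resolution, resolution, resolution)

# Toy decoder (assumption, for demonstration only): a sphere of
# radius 0.4 centered in the unit cube.
toy_decoder = lambda z, p: (np.linalg.norm(p - 0.5, axis=-1) < 0.4).astype(float)
vox = dense_grid_query(toy_decoder, None, resolution=64)
```

Every one of the 64^3 points goes through the decoder, which is exactly the waste that the follow-up efficiency tricks try to avoid.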
OK, I got it.
In other words, after the bounding box is generated, it is gridded, and then the coordinates of each voxel are computed separately and sent to the decoder, right? Does the final result need to use the same number of points as during training? Thank you for your patient reply.
In other words, after the bounding box is generated, it is gridded, and then the coordinates of each voxel are calculated separately and then sent to the decoder, right?
Partly right. Each part is generated separately in its own local space by the decoder, queried at 64^3 points, so every part lives within its own 64^3 grid. Then we use the predicted bounding box to transform each part to its correct position in the global space.
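The local-to-global step can be sketched like this. It is a minimal illustration under the assumption that the predicted bounding box is given as two corner points; the paper's actual box parameterization may differ.

```python
import numpy as np

def part_to_global(local_pts, bbox_min, bbox_max):
    """Place a part generated in its local [0, 1]^3 space into global space.

    `bbox_min` / `bbox_max` are assumed corner points of the predicted
    bounding box (an illustrative parameterization, not the paper's exact one).
    """
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    # Scale the unit-cube coordinates to the box size, then translate.
    return local_pts * (bbox_max - bbox_min) + bbox_min

# Corners and center of the local unit cube mapped into a hypothetical box.
local = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
global_pts = part_to_global(local, bbox_min=[0.2, 0.0, 0.2], bbox_max=[0.8, 0.5, 0.4])
```

Applying this transform per part assembles all parts into one consistent global shape.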
Does the final result need to be consistent with the number of points during training?
Not necessarily. The part autoencoder (implicit decoder) is trained at 64^3 resolution, but you can use any resolution at test time, because point coordinates are normalized into [0, 1] before being fed into the network. This is essentially a good property of implicit neural representations; you can refer to DeepSDF, IM-NET, OccNet, etc.
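This resolution independence can be demonstrated with a sketch: because the query coordinates are always normalized into [0, 1], the same decoder can be sampled on a coarse or a fine grid. As before, `decoder` here is a hypothetical stand-in, and the toy sphere decoder is an assumption used only to make the example runnable.

```python
import numpy as np

def query_at_resolution(decoder, latent_code, resolution):
    """Query a trained decoder at an arbitrary test-time resolution.

    This works regardless of the training resolution because the cell-center
    coordinates always lie in [0, 1]. `decoder` is a hypothetical stand-in.
    """
    axis = (np.arange(resolution) + 0.5) / resolution   # always in [0, 1]
    pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    occ = decoder(latent_code, pts.reshape(-1, 3))
    return occ.reshape((resolution,) * 3)

# Toy decoder (assumption): a sphere of radius 0.4 in the unit cube.
toy = lambda z, p: (np.linalg.norm(p - 0.5, axis=-1) < 0.4).astype(float)
low = query_at_resolution(toy, None, 32)     # coarser than training
high = query_at_resolution(toy, None, 128)   # finer than training
```

Both grids sample the same underlying shape; the finer grid simply recovers its surface more accurately.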
OK, thank you very much.
Hello, I am a newbie in this direction. Your article is very well written. I have a question about how the results are generated: since the implicit decoder is just a classifier, how are the coordinates of the actual points generated? Thanks.