Also because plugging in RGB on top would be more or less straightforward.
On Mon, Mar 21, 2022, 19:50, Daniel Coelho wrote:
I have not found any architecture that converts depth images to 6DoF poses, so I think we should give it a try.
Hi @miguelriemoliveira,
I believe you have everything you need to start this. I suggest you run the whole pipeline from the beginning. You could use the big dataset, but you wouldn't be able to run it on your computer... The readme is updated, so you can check the pipeline there. In any case, I'm free to help with anything.
Hi @miguelriemoliveira and @pmdjdias,
I implemented a depth net following this link. I trained it using our big dataset. I haven't performed any hyperparameter tuning; I just ran one version using the dynamic loss and another with the beta loss. The results, once again, show that the dynamic loss is the most suitable one. In all the networks I have implemented, the ones with the dynamic loss have yielded better results.
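For reference, if the dynamic loss here is the learnable-weighting (homoscedastic uncertainty) formulation commonly used in pose regression, a minimal PyTorch sketch of the two losses could look like the one below. The class names, the default beta, and the initial values of s_x/s_q are illustrative assumptions, not the actual depthnet code.

```python
import torch
import torch.nn as nn

class BetaLoss(nn.Module):
    """Fixed weighting: L = L_pos + beta * L_rot."""
    def __init__(self, beta=500.0):
        super().__init__()
        self.beta = beta

    def forward(self, pred_t, gt_t, pred_q, gt_q):
        l_pos = torch.norm(pred_t - gt_t, dim=1).mean()
        l_rot = torch.norm(pred_q - gt_q, dim=1).mean()
        return l_pos + self.beta * l_rot

class DynamicLoss(nn.Module):
    """Learnable weighting: L = L_pos * exp(-s_x) + s_x + L_rot * exp(-s_q) + s_q,
    where s_x and s_q are trained jointly with the network."""
    def __init__(self, s_x=0.0, s_q=-3.0):
        super().__init__()
        self.s_x = nn.Parameter(torch.tensor(s_x))
        self.s_q = nn.Parameter(torch.tensor(s_q))

    def forward(self, pred_t, gt_t, pred_q, gt_q):
        l_pos = torch.norm(pred_t - gt_t, dim=1).mean()
        l_rot = torch.norm(pred_q - gt_q, dim=1).mean()
        return (l_pos * torch.exp(-self.s_x) + self.s_x
                + l_rot * torch.exp(-self.s_q) + self.s_q)
```

The appeal of the dynamic version is that the balance between position and rotation terms is learned rather than hand-tuned, which matches the observation that it consistently outperforms the fixed-beta variant.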
First, check the loss graph:
This is a clear case of overfitting. Now we know we have to apply some regularization techniques. The architecture from the paper had none, so this is the expected outcome.
Test set: position error: 0.45 m, rotation error: 0.64 rad
Even with overfitting, this is the best result we have achieved so far.
Train set: position error: 0.16 m, rotation error: 0.05 rad
In the train set, we can clearly see that the model was able to learn to infer the pose from the depth image! Now it is just a matter of reducing the overfitting.
Check the visual results using the training set:
I think that, based on this image, we can conclude that our pipeline is validated! It looks like we don't have errors in any stage of the pipeline. Bear in mind that the 0.16 m is inflated by some frames that must be eliminated. The position errors of the first 5 frames are: 0.02, 0.04, 0.05, 0.07, 0.10 m.
Wow!! Congratulations! Looks awesome.
We can discuss it later. Now I am heading to the code editing workshop...
Hi @miguelriemoliveira and @pmdjdias,
I've finished the process of optimizing the depthnet and I think we achieved nice results.
I had to use several regularization techniques to reduce the overfitting, such as dropout (0.8), batch normalization, and an L2 regularizer.
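This is not the actual depthnet code, just a minimal sketch of how those three techniques could be combined in a pose-regression head, assuming a PyTorch implementation; the layer sizes and names are placeholders, and the L2 regularizer is expressed through the optimizer's weight decay.

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Illustrative regression head: batch norm + heavy dropout.
    The feature and hidden sizes are arbitrary placeholder choices."""
    def __init__(self, in_features=2048, p_drop=0.8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_features, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=p_drop),  # dropout against overfitting
        )
        self.fc_t = nn.Linear(1024, 3)  # translation (x, y, z)
        self.fc_q = nn.Linear(1024, 4)  # rotation as a quaternion

    def forward(self, features):
        h = self.fc(features)
        return self.fc_t(h), self.fc_q(h)

# L2 regularization applied through the optimizer's weight decay.
model = PoseHead()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
```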
The loss function is the following:
[image: losses] https://user-images.githubusercontent.com/73063947/164893862-412cccd9-c219-4dc1-97ed-30b88e7759bb.png
It could be better, but this is the best I could get; I tested more than 20 versions.
Regarding the results, we have:
median position error: 0.18453 m, median rotation error: 0.20363 rad
So from PointNet to the depth net, we moved from 0.50 m to 0.18 m!! These results are in line with the ones reported in the state of the art.
I'm using median and not mean because the authors in the papers always use mean for outdoor environments and median for indoor environments.
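For clarity, here is a sketch of how such per-frame errors and their medians could be computed, assuming poses are stored as a translation vector plus a unit quaternion; the function names are mine, not from the repository.

```python
import numpy as np

def position_errors(pred_t, gt_t):
    """Euclidean distance per frame, in metres; inputs are (N, 3) arrays."""
    return np.linalg.norm(pred_t - gt_t, axis=1)

def rotation_errors(pred_q, gt_q):
    """Angular distance per frame, in radians, from (N, 4) unit quaternions."""
    pred_q = pred_q / np.linalg.norm(pred_q, axis=1, keepdims=True)
    gt_q = gt_q / np.linalg.norm(gt_q, axis=1, keepdims=True)
    dot = np.clip(np.abs(np.sum(pred_q * gt_q, axis=1)), 0.0, 1.0)
    return 2.0 * np.arccos(dot)

# Median (not mean) over the test set, as is usual for indoor scenes:
# med_pos = np.median(position_errors(pred_t, gt_t))
# med_rot = np.median(rotation_errors(pred_q, gt_q))
```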
Check the visual results in the test set:
[image: results29] https://user-images.githubusercontent.com/73063947/164893998-5a18d176-6022-4f8d-a52d-cdc58670ff88.png
In my opinion, the poses look really good!
I'm confident now that, by adding the RGB information, we can improve our results even more.
Very nice work. I think we're on our way to another nice paper.