Keep in mind that the 3D representation comes from the ground truth rather than from SfM!
[x] Try out OnePose on given data
[ ] Check the code
[ ] What does the file feature_match_object_detector.py do?
[ ] Save the keypoints extracted in OnePose via SuperPoint and display them (see the inspection sketch after this list)
[x] Understand the training and testing dataset differences
[ ] Try it with custom dataset (mustard bottle)
[ ] Run the 2D-3D correspondence training network
[ ] Reason on the network
[ ] Generate 3D point estimation from dataset and sequences and check how accurate it is
[ ] Sort the entries in "pairs-covis.txt" to see if it fixes the "glitch"
[ ] Check the .h5 files, especially how many points are extracted, their dimension, and which features are considered (see the inspection sketch after this list)
[ ] Check the .bin files (also covered in the sketch below)
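As a starting point for the .h5 and .bin items above, here is a minimal Python sketch for inspecting the feature files and the COLMAP model. It assumes an hloc-style HDF5 layout (one group per image with a `keypoints` dataset) and that the .bin files form a standard COLMAP binary model; all paths are placeholders, not the pipeline's actual output locations.

```python
import h5py
import numpy as np
import cv2
import pycolmap  # pip install pycolmap, for the COLMAP .bin model

FEATS_PATH = "outputs/feats.h5"  # placeholder path to the extracted features
IMAGE_DIR = "data/color"         # placeholder folder containing the frames

with h5py.File(FEATS_PATH, "r") as f:
    # Print every dataset's shape/dtype: this reveals how many keypoints
    # were extracted per image and the dimensionality of the descriptors.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape} dtype={obj.dtype}")
    f.visititems(show)

    # Overlay the SuperPoint keypoints of the first image, assuming an
    # hloc-style layout: a "keypoints" dataset holding (x, y) pixel
    # coordinates in its first two columns.
    name = next(iter(f.keys()))
    kpts = np.asarray(f[name]["keypoints"])

img = cv2.imread(f"{IMAGE_DIR}/{name}")
for x, y in kpts[:, :2].astype(int):
    cv2.circle(img, (int(x), int(y)), 2, (0, 255, 0), -1)
cv2.imwrite("keypoints_vis.png", img)

# The .bin files should be a COLMAP binary model (cameras.bin, images.bin,
# points3D.bin); pycolmap loads and summarizes them directly.
rec = pycolmap.Reconstruction("outputs/sfm_model")  # placeholder model folder
print(rec.summary())
```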
In order to then proceed with 2D-3D matching:
[ ] Check if it is possible to match (2D-2D) the same object between the videos generated from the virtual environment and the ones from the real camera (a baseline matching sketch follows this list)
[x] Create a Dockerfile to move onto the Alienware/workstation
[ ] Test with a GPU (Alienware/workstation) to see the true performance of the first part of the network
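For the 2D-2D matching item above, a quick classical baseline (ORB + ratio test in OpenCV) can show whether the virtual and real frames share enough appearance to be matched at all, before moving to SuperPoint/SuperGlue. The file names are hypothetical.

```python
import cv2

# Placeholder file names: one frame rendered in the virtual environment and
# one frame captured by the real camera.
img_virtual = cv2.imread("virtual_frame.png", cv2.IMREAD_GRAYSCALE)
img_real = cv2.imread("real_frame.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB features in both frames.
orb = cv2.ORB_create(nfeatures=2000)
kp_v, des_v = orb.detectAndCompute(img_virtual, None)
kp_r, des_r = orb.detectAndCompute(img_real, None)

# kNN matching with Lowe's ratio test to keep only distinctive matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = bf.knnMatch(des_v, des_r, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative virtual-to-real matches")

# Side-by-side visualization of the surviving matches.
vis = cv2.drawMatches(img_virtual, kp_v, img_real, kp_r, good, None)
cv2.imwrite("virtual_vs_real_matches.png", vis)
```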
OnePose tests
Follow the installation instructions from the official GitHub page.
Since I cannot use CUDA, I am trying to run this architecture on the CPU; there are still a couple of problems, so I will update this issue when they are solved.
Results from the paper on an NVIDIA TITAN RTX GPU: 58.31 ms per frame, i.e. about 17 Hz.
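To get numbers comparable with the paper's, a minimal timing wrapper along these lines could be used on both CPU and GPU; `model` and `batch` are placeholders, not OnePose's actual API.

```python
import time
import torch

@torch.no_grad()
def time_inference(model, batch, n_warmup=5, n_runs=50):
    """Average per-frame latency for `model(batch)`; both arguments are
    placeholders for whatever the OnePose inference loop actually calls."""
    for _ in range(n_warmup):          # warm up caches / lazy initialization
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # make GPU timing honest
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / n_runs
    print(f"{dt * 1e3:.2f} ms per frame ({1.0 / dt:.1f} Hz)")
```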
COLMAP SfM reconstruction
Examples of COLMAP's input videos (there are 4 like these, from which the SfM model is created):
https://github.com/user-attachments/assets/d6298808-97cb-46d0-a11d-235cdf3520d6
https://github.com/user-attachments/assets/e06cf32b-1c09-4fb5-a7b9-1edd2fef79c3
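Since COLMAP works on folders of images rather than raw video, the sequences have to be split into frames first; a minimal OpenCV sketch (paths and stride are hypothetical):

```python
import os
import cv2

def extract_frames(video_path, out_dir, step=1):
    """Dump every `step`-th frame as a zero-padded PNG so that alphabetical
    order matches temporal order."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    read, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if read % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"{saved:06d}.png"), frame)
            saved += 1
        read += 1
    cap.release()
    print(f"saved {saved} frames from {video_path}")

extract_frames("input_video_1.mp4", "frames/video_1", step=2)
```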
Resulting sparse SfM models:
Download the .ply models to see the full visualization: models_SfM.zip
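To quickly inspect one of the downloaded .ply models without the COLMAP GUI, a minimal Open3D sketch (the file name is a placeholder):

```python
import open3d as o3d

# Load one of the sparse models from models_SfM.zip (placeholder file name).
pcd = o3d.io.read_point_cloud("models_SfM/sparse_model.ply")
print(pcd)  # e.g. "PointCloud with N points"
o3d.visualization.draw_geometries([pcd])
```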
Results of 6DoF Pose Estimation
Pretrained GAT network on 0419-cookies2-others
To replicate, see this link (inference).
Execution on an Intel Core i7 CPU.
Original video sequence
https://github.com/user-attachments/assets/49615ef6-a3a8-4cdb-a494-73562286b592
SfM reconstruction of the model: about 40 minutes
It takes 3 video sequences and uses them to generate the model; the sequences consist of 518, 166, and 515 frames.
Feature-matching-based 2D object detection (locating the scanned object in the query images): about 4 hours and 30 minutes
Number of reference views set to 15. This step analyzes the 561 frames of the testing sequence and detects the object's features in 2D. (To verify: does this step operate on the video or on individual images?)
Pose estimation with visualization: about 11 minutes; it uses the "color_det" and "intrin_det" folders
6DoF detection!
https://github.com/user-attachments/assets/71129843-b188-41a6-aac6-d107ce2b4f03
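For reference, a box overlay like the one in the video can be reproduced by projecting the 8 corners of the object's 3D bounding box with the estimated pose; a minimal sketch with cv2.projectPoints, where the coordinate conventions are assumed to match the dataset's pose annotations:

```python
import cv2
import numpy as np

def draw_3d_box(img, corners_3d, R, t, K):
    """Draw a 6DoF bounding box: project the 8 box corners (object/world
    coordinates, shape (8, 3)) into the image with rotation R (3x3),
    translation t (3,), and intrinsics K (3x3), then connect the 12 edges."""
    rvec, _ = cv2.Rodrigues(R)
    pts2d, _ = cv2.projectPoints(corners_3d.astype(np.float32), rvec,
                                 t.astype(np.float32), K, None)
    pts2d = pts2d.reshape(-1, 2).astype(int)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # bottom face
             (4, 5), (5, 6), (6, 7), (7, 4),   # top face
             (0, 4), (1, 5), (2, 6), (3, 7)]   # vertical edges
    for i, j in edges:
        cv2.line(img, tuple(pts2d[i]), tuple(pts2d[j]), (0, 255, 0), 2)
    return img
```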
Results in terms of accuracy:
Pose estimation with GT_box visualization: about 10 minutes; it uses the "color" and "intrin_ba" folders
6DoF detection!
https://github.com/user-attachments/assets/c0649b18-6dd7-4152-8c4e-d4c46b0f34a5
Results in terms of accuracy:
Note: I get the same results every time, since the GAT model is identical across repetitions; I will check again when I run it with custom data.
Note: the twitching in the video is due to incorrect image indices, but I don't yet understand where the problem lies.
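One likely culprit for the incorrect indices is plain alphabetical sorting of frame names, which puts e.g. frame_10 before frame_2; a natural-sort key fixes the ordering and could also be applied to the entries of "pairs-covis.txt" (see the checklist item above):

```python
import re

def natural_key(name):
    """Sort key that compares embedded numbers numerically, so 'frame_2.png'
    comes before 'frame_10.png' (a plain string sort reverses them)."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]

frames = ["frame_10.png", "frame_2.png", "frame_1.png"]
print(sorted(frames))                   # ['frame_1.png', 'frame_10.png', 'frame_2.png']
print(sorted(frames, key=natural_key))  # ['frame_1.png', 'frame_2.png', 'frame_10.png']
```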