Keep in mind that the 3D representation comes from the ground truth rather than from SfM!
[x] Try out OnePose on given data
[ ] Check the code
[ ] What does the file feature_match_object_detector.py do?
[ ] Save the keypoints extracted in OnePose via SuperPoint and display them (see the inspection sketch after this list)
[x] Understand the training and testing dataset differences
[ ] Try it with custom dataset (mustard bottle)
[ ] Run the 2D-3D correspondence training network
[ ] Reason on the network
[ ] Generate 3D point estimation from dataset and sequences and check how accurate it is
[ ] Sort the entries in "pairs-covis.txt" to see if it fixes the "glitch"
[ ] Check the .h5 files, especially how many points are extracted, their dimension, and which features are considered (see the inspection sketch after this list)
[ ] Check the .bin files (also covered in the sketch below)
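As a starting point for the .h5 and .bin items above, here is a minimal Python sketch for inspecting the feature files and the COLMAP model. It assumes an hloc-style HDF5 layout (one group per image with a `keypoints` dataset) and that the .bin files form a standard COLMAP binary model; all paths are placeholders, not the pipeline's actual output locations.

```python
import h5py
import numpy as np
import cv2
import pycolmap  # pip install pycolmap, for the COLMAP .bin model

FEATS_PATH = "outputs/feats.h5"  # placeholder path to the extracted features
IMAGE_DIR = "data/color"         # placeholder folder containing the frames

with h5py.File(FEATS_PATH, "r") as f:
    # Print every dataset's shape/dtype: this reveals how many keypoints
    # were extracted per image and the dimensionality of the descriptors.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape} dtype={obj.dtype}")
    f.visititems(show)

    # Overlay the SuperPoint keypoints of the first image, assuming an
    # hloc-style layout: a "keypoints" dataset holding (x, y) pixel
    # coordinates in its first two columns.
    name = next(iter(f.keys()))
    kpts = np.asarray(f[name]["keypoints"])

img = cv2.imread(f"{IMAGE_DIR}/{name}")
for x, y in kpts[:, :2].astype(int):
    cv2.circle(img, (int(x), int(y)), 2, (0, 255, 0), -1)
cv2.imwrite("keypoints_vis.png", img)

# The .bin files should be a COLMAP binary model (cameras.bin, images.bin,
# points3D.bin); pycolmap loads and summarizes them directly.
rec = pycolmap.Reconstruction("outputs/sfm_model")  # placeholder model folder
print(rec.summary())
```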
In order to then proceed with 2D-3D matching:
[ ] Check if it is possible to match (2D-2D) the same object between the videos generated from the virtual environment and the ones from the real camera (a baseline matching sketch follows this list)
[x] Create a Dockerfile to move onto the Alienware/workstation
[ ] Test with a GPU (Alienware/workstation) to see the true performance of the first part of the network
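For the 2D-2D matching item above, a quick classical baseline (ORB + ratio test in OpenCV) can show whether the virtual and real frames share enough appearance to be matched at all, before moving to SuperPoint/SuperGlue. The file names are hypothetical.

```python
import cv2

# Placeholder file names: one frame rendered in the virtual environment and
# one frame captured by the real camera.
img_virtual = cv2.imread("virtual_frame.png", cv2.IMREAD_GRAYSCALE)
img_real = cv2.imread("real_frame.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB features in both frames.
orb = cv2.ORB_create(nfeatures=2000)
kp_v, des_v = orb.detectAndCompute(img_virtual, None)
kp_r, des_r = orb.detectAndCompute(img_real, None)

# kNN matching with Lowe's ratio test to keep only distinctive matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = bf.knnMatch(des_v, des_r, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative virtual-to-real matches")

# Side-by-side visualization of the surviving matches.
vis = cv2.drawMatches(img_virtual, kp_v, img_real, kp_r, good, None)
cv2.imwrite("virtual_vs_real_matches.png", vis)
```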
OnePose tests
Follow the installation instructions from the official GitHub page.
Since I cannot use CUDA, I am trying to run this architecture on the CPU; there are still a couple of problems, so I will update this issue when they are solved.
Results from the paper on an NVIDIA TITAN RTX GPU: 58.31 ms per frame, i.e. about 17 Hz.
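To get numbers comparable with the paper's, a minimal timing wrapper along these lines could be used on both CPU and GPU; `model` and `batch` are placeholders, not OnePose's actual API.

```python
import time
import torch

@torch.no_grad()
def time_inference(model, batch, n_warmup=5, n_runs=50):
    """Average per-frame latency for `model(batch)`; both arguments are
    placeholders for whatever the OnePose inference loop actually calls."""
    for _ in range(n_warmup):          # warm up caches / lazy initialization
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # make GPU timing honest
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / n_runs
    print(f"{dt * 1e3:.2f} ms per frame ({1.0 / dt:.1f} Hz)")
```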
COLMAP SfM reconstruction
Examples of COLMAP's input videos (there are 4 like these, from which the SfM model is created):
https://github.com/user-attachments/assets/d6298808-97cb-46d0-a11d-235cdf3520d6
https://github.com/user-attachments/assets/e06cf32b-1c09-4fb5-a7b9-1edd2fef79c3
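Since COLMAP works on folders of images rather than raw video, the sequences have to be split into frames first; a minimal OpenCV sketch (paths and stride are hypothetical):

```python
import os
import cv2

def extract_frames(video_path, out_dir, step=1):
    """Dump every `step`-th frame as a zero-padded PNG so that alphabetical
    order matches temporal order."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    read, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if read % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"{saved:06d}.png"), frame)
            saved += 1
        read += 1
    cap.release()
    print(f"saved {saved} frames from {video_path}")

extract_frames("input_video_1.mp4", "frames/video_1", step=2)
```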
Resulting sparse SfM models:
Download the .ply models to see the full visualization: models_SfM.zip
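To quickly inspect one of the downloaded .ply models without the COLMAP GUI, a minimal Open3D sketch (the file name is a placeholder):

```python
import open3d as o3d

# Load one of the sparse models from models_SfM.zip (placeholder file name).
pcd = o3d.io.read_point_cloud("models_SfM/sparse_model.ply")
print(pcd)  # e.g. "PointCloud with N points"
o3d.visualization.draw_geometries([pcd])
```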
Results of 6DoF Pose Estimation
Pretrained GAT network on 0419-cookies2-others
To replicate, see this link (inference).
Execution on an Intel Core i7 CPU.
Original video sequence
https://github.com/user-attachments/assets/49615ef6-a3a8-4cdb-a494-73562286b592
SfM reconstruction of the model: about 40 minutes
It takes 3 video sequences and uses them to generate the model; the sequences consist of 518, 166, and 515 frames.
Feature-matching-based 2D object detection (locating the scanned object in the query images): about 4 hours and 30 minutes
Number of reference views set to 15. This step analyzes the 561 frames of the testing sequence and detects the object's features in 2D. (To verify: does this step operate on the video or on individual images?)
Pose estimation with visualization: about 11 minutes; it uses the "color_det" and "intrin_det" folders
6DoF detection!
https://github.com/user-attachments/assets/71129843-b188-41a6-aac6-d107ce2b4f03
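For reference, a box overlay like the one in the video can be reproduced by projecting the 8 corners of the object's 3D bounding box with the estimated pose; a minimal sketch with cv2.projectPoints, where the coordinate conventions are assumed to match the dataset's pose annotations:

```python
import cv2
import numpy as np

def draw_3d_box(img, corners_3d, R, t, K):
    """Draw a 6DoF bounding box: project the 8 box corners (object/world
    coordinates, shape (8, 3)) into the image with rotation R (3x3),
    translation t (3,), and intrinsics K (3x3), then connect the 12 edges."""
    rvec, _ = cv2.Rodrigues(R)
    pts2d, _ = cv2.projectPoints(corners_3d.astype(np.float32), rvec,
                                 t.astype(np.float32), K, None)
    pts2d = pts2d.reshape(-1, 2).astype(int)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # bottom face
             (4, 5), (5, 6), (6, 7), (7, 4),   # top face
             (0, 4), (1, 5), (2, 6), (3, 7)]   # vertical edges
    for i, j in edges:
        cv2.line(img, tuple(pts2d[i]), tuple(pts2d[j]), (0, 255, 0), 2)
    return img
```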
Results in terms of accuracy:
Pose estimation with GT_box visualization: about 10 minutes; it uses the "color" and "intrin_ba" folders
6DoF detection!
https://github.com/user-attachments/assets/c0649b18-6dd7-4152-8c4e-d4c46b0f34a5
Results in terms of accuracy:
Note: I get the same results every time, since the GAT model is identical across repetitions; I will check again when I run it with custom data.
Note: the twitching in the video is due to incorrect image indices, but I don't yet understand where the problem lies.
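One likely culprit for the incorrect indices is plain alphabetical sorting of frame names, which puts e.g. frame_10 before frame_2; a natural-sort key fixes the ordering and could also be applied to the entries of "pairs-covis.txt" (see the checklist item above):

```python
import re

def natural_key(name):
    """Sort key that compares embedded numbers numerically, so 'frame_2.png'
    comes before 'frame_10.png' (a plain string sort reverses them)."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]

frames = ["frame_10.png", "frame_2.png", "frame_1.png"]
print(sorted(frames))                   # ['frame_1.png', 'frame_10.png', 'frame_2.png']
print(sorted(frames, key=natural_key))  # ['frame_1.png', 'frame_2.png', 'frame_10.png']
```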