facebookresearch / OrienterNet

Source code for the paper "OrienterNet: Visual Localization in 2D Public Maps with Neural Matching"

Refactoring #49

Closed AlanSavio25 closed 7 months ago

AlanSavio25 commented 9 months ago

Hi @sarlinpe,

As a first step towards refactoring OrienterNet, I made some changes to the model outputs and adapted the visualization code to work with the new indexing.

I couldn't get the circle inset to work with the new indexing, so I've temporarily disabled it in the refactored plotting function.

Would love to hear any suggestions you might have on any of the changes!

facebook-github-bot commented 9 months ago

Hi @AlanSavio25!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

facebook-github-bot commented 9 months ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

AlanSavio25 commented 8 months ago

I have now merged the refactored plotting function with the original one. The plotting function now takes a flag indicating whether the pred and batch inputs use the refactored format.


AlanSavio25 commented 8 months ago

Hi, I made some changes since my last comment. Here are a few details:

  1. Dataloader

    1. Extracted gravity rotation matrix and yaw angle directly from R_c2w instead of using Euler angles. This also fixes an issue where the yaw angle was incorrect because of how R_c2w was decomposed into Euler angles.
    2. rectify_image now uses the gravity rotation matrix gcam_R_cam instead of roll and pitch Euler angles.
    3. cfg.rectify_pitch is ignored now; do we still need this flag?
    4. Added world_T_cam, tile_T_cam, and map_T_cam. We could completely remove map_T_cam later and use tile_T_cam instead (i.e. metric instead of pixel space).
    5. Replaced uv, roll_pitch_yaw, uv_init, uv_gps keys from the data dict with map_T_cam, map_T_init, map_T_gps.
    6. The map augmentations are a work in progress, so I disabled them temporarily. We need to be careful when updating the ground-truth values.
  2. Transforms (wrappers.py)

    1. Transform2D now stores the scalar angle instead of a 2×2 rotation matrix.
    2. Transform2D.from_Transform3D assumes the 3D rotation matrix is a camera pose with the optical axis along Z, so it extracts the yaw angle as the angle between the world x axis and the camera z axis.
  3. Metrics

    1. Used (pred["tile_T_cam_max"].T @ data["tile_T_cam"]).magnitude() as in SNAP to simplify the metrics calculation.
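To make the Transform2D changes and the SNAP-style metric concrete, here is a minimal NumPy sketch. The class name matches the PR, but the method names (`inv`, `__matmul__`), the degree convention, and the exact `magnitude` return value are my assumptions; the PR composes the error as `pred.T @ gt` rather than via an explicit `inv()`.

```python
import numpy as np


class Transform2D:
    """2D rigid transform storing a scalar angle (degrees) and a translation."""

    def __init__(self, angle_deg: float, t) -> None:
        self.angle = float(angle_deg)
        self.t = np.asarray(t, dtype=float)

    @staticmethod
    def _rot(angle_deg: float) -> np.ndarray:
        a = np.radians(angle_deg)
        return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

    @classmethod
    def from_Transform3D(cls, R_c2w: np.ndarray, t_w: np.ndarray) -> "Transform2D":
        # Assumes a camera pose with the optical axis as +Z: the yaw is the
        # angle between the world x axis and the camera z axis, projected
        # onto the ground plane.
        z_cam_in_world = R_c2w[:, 2]
        yaw = np.degrees(np.arctan2(z_cam_in_world[1], z_cam_in_world[0]))
        return cls(yaw, t_w[:2])

    def inv(self) -> "Transform2D":
        # Inverse of (R, t) is (R^T, -R^T t).
        return Transform2D(-self.angle, -self._rot(self.angle).T @ self.t)

    def __matmul__(self, other: "Transform2D") -> "Transform2D":
        # Composition: angles add, translations chain through the rotation.
        return Transform2D(self.angle + other.angle,
                           self._rot(self.angle) @ other.t + self.t)

    def magnitude(self):
        # (rotation error in degrees wrapped to [0, 180], translation norm)
        dr = abs((self.angle + 180.0) % 360.0 - 180.0)
        dt = float(np.linalg.norm(self.t))
        return dr, dt


# SNAP-style metric: the error transform pred^-1 * gt, then its magnitude.
pred = Transform2D(30.0, [1.0, 2.0])
gt = Transform2D(35.0, [1.0, 2.0])
dr, dt = (pred.inv() @ gt).magnitude()
```

With identical translations and a 5° yaw offset, the error transform has zero translation magnitude and a 5° rotation magnitude, which is exactly the pair of numbers the recall thresholds are applied to.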
AlanSavio25 commented 8 months ago

The eval results of the pre-trained model on this PR are exactly identical to those on the main branch (including GPS fusion errors). Here are the evaluation results on a recently downloaded MGL dataset:

{'xy_expectation_error': tensor(9.6098), 'xy_max_error': tensor(10.4407), 'xy_recall_2m': tensor(0.3564), 'xy_recall_5m': tensor(0.5960), 'yaw_max_error': tensor(22.3918), 'yaw_recall_2°': tensor(0.4066), 'yaw_recall_5°': tensor(0.7038), 'directional_error': tensor([6.5439, 6.5661]), 'xy_gps_error': tensor(4.5306), 'xy_fused_error': tensor(7.1373), 'yaw_fused_error': tensor(18.7499)}
[2024-04-04 20:43:37 maploc INFO] Recall xy_max_error: [14.02, 47.57, 59.6] at (1, 3, 5) m/°
[2024-04-04 20:43:37 maploc INFO] Recall xy_gps_error: [24.8, 56.46, 66.14] at (1, 3, 5) m/°
[2024-04-04 20:43:37 maploc INFO] Recall yaw_max_error: [21.82, 55.52, 70.38] at (1, 3, 5) m/°

They are also similar to the results on README.md, which are:

Recall xy_max_error: [14.37, 48.69, 61.7] at (1, 3, 5) m/°
Recall yaw_max_error: [20.95, 54.96, 70.17] at (1, 3, 5) m/°

Minor note: when evaluating with has_gps=True, the GPS fusion metrics are identical between main and refactoring when gaussian is set to True in fuse_gps(), but there is a 0.12 m difference in the xy_fused_error metric when gaussian=False. I'm not sure why exactly.
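For context, the gaussian flag can be pictured with a rough sketch of Gaussian GPS fusion: add a log-Gaussian prior centred on the GPS fix to the localization log-probability map, versus hard-masking to a disk around the fix. This is an illustration only; the actual fuse_gps signature in the repo and the sigma parameter here are assumptions.

```python
import numpy as np


def fuse_gps(log_prob: np.ndarray, uv_gps: np.ndarray, sigma: float,
             gaussian: bool = True) -> np.ndarray:
    """Fuse a (H, W) log-probability map with a GPS prior at pixel uv_gps."""
    h, w = log_prob.shape
    v, u = np.mgrid[:h, :w]  # v: row index, u: column index
    d2 = (u - uv_gps[0]) ** 2 + (v - uv_gps[1]) ** 2
    if gaussian:
        # Log of an isotropic Gaussian prior (constant offset dropped).
        prior = -d2 / (2.0 * sigma**2)
    else:
        # Hard disk mask: uniform inside radius sigma, zero probability outside.
        prior = np.where(d2 <= sigma**2, 0.0, -np.inf)
    fused = log_prob + prior
    # Renormalize so the fused map is a proper distribution again.
    fused -= np.logaddexp.reduce(fused, axis=None)
    return fused
```

A small difference between the two branches is expected to show up only in the non-Gaussian case, since the hard mask changes which cells survive at the disk boundary, while the Gaussian prior reweights all cells smoothly.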

AlanSavio25 commented 8 months ago

I trained main and refactoring for 24h each (~328k steps, batch size=1) on an RTX 2080, and the loss curves are similar. The main difference between the two is that in refactoring, the flip-augmented training images are correctly rectified.

The eval results after training are shown below:

main

All results: {'xy_expectation_error': tensor(18.0240), 'xy_max_error': tensor(21.0109), 'xy_recall_2m': tensor(0.1136), 'xy_recall_5m': tensor(0.2700), 'yaw_max_error': tensor(55.6853), 'yaw_recall_2°': tensor(0.1816), 'yaw_recall_5°': tensor(0.3569), 'directional_error': tensor([12.5002, 14.1251])}
[2024-04-05 13:51:24 maploc INFO] Recall xy_max_error: [4.03, 18.16, 27.0] at (1, 3, 5) m/°
[2024-04-05 13:51:24 maploc INFO] Recall yaw_max_error: [9.58, 25.69, 35.69] at (1, 3, 5) m/°

refactoring

All results: {'xy_expectation_error': tensor(17.3318), 'xy_max_error': tensor(20.2708), 'xy_recall_2m': tensor(0.1151), 'xy_recall_5m': tensor(0.2889), 'yaw_max_error': tensor(47.4610), 'yaw_recall_2°': tensor(0.1805), 'yaw_recall_5°': tensor(0.3972), 'directional_error': tensor([12.3601, 13.4812])}
[2024-04-05 13:29:57 maploc INFO] Recall xy_max_error: [3.4, 18.94, 28.89] at (1, 3, 5) m/°
[2024-04-05 13:29:57 maploc INFO] Recall yaw_max_error: [9.84, 26.11, 39.72] at (1, 3, 5) m/°