NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

interchanged x and y coords in train2's createBeliefMap() #350

Closed wetoo-cando closed 5 months ago

wetoo-cando commented 6 months ago

@mintar Why are the x and y coordinates of the key points interchanged in this line?

        p = [point[numb_point][1],point[numb_point][0]]

https://github.com/NVlabs/Deep_Object_Pose/blob/1655459de50cfcbf01f7d24775f834cab400aa25/scripts/train2/utils_dope.py#L593

TontonTremblay commented 6 months ago

Something to do with row vs. column index order in OpenCV image arrays (row first, i.e. y before x).

TontonTremblay commented 6 months ago

In OpenCV, and more generally in many image processing and computer vision libraries, images are treated as matrices or 2D arrays. The convention here is that the first index refers to the row (vertical position) and the second index refers to the column (horizontal position). This is why you often see im[j, i] instead of im[i, j].

Let's break it down:

Matrix Representation: In the context of matrices, it's standard to refer to positions with row and column indices, where the row index comes first. This is the convention used in mathematics for matrices and also adopted in many programming languages for 2D arrays. In the case of images, each pixel's location is thus identified first by its row (which corresponds to the y coordinate in a Cartesian system) and then by its column (x coordinate).

Rows and Columns vs. X and Y: In a Cartesian coordinate system, we're used to x (horizontal axis) and then y (vertical axis). However, in matrix notation, which follows the row and column approach, it flips to row, column (or y, x in Cartesian terms). This is because when dealing with matrices, the emphasis is on moving down through rows first, then across columns, which aligns with how images are processed and stored in memory (row by row).

Practical Example: If you want to access the pixel at the Cartesian coordinate (x=10, y=20) in an image using OpenCV, you would access it using image[20, 10], since the 20 (y value) corresponds to the 21st row (considering 0 indexing), and the 10 (x value) corresponds to the 11th column.
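The practical example can be checked with a few lines of NumPy (OpenCV's Python bindings represent images as NumPy arrays). This is only a minimal sketch: the image size and the variable names are made up for illustration, and it relies on NumPy's default row-major (C-order) layout.

```python
import numpy as np

# Hypothetical image size, chosen only for illustration.
height, width = 40, 30
image = np.zeros((height, width), dtype=np.uint8)  # shape is (rows, cols)

# Cartesian coordinate of the pixel we want to mark.
x, y = 10, 20

# Array indexing is [row, col], i.e. [y, x].
image[y, x] = 255

# Recover the position of the brightest pixel: it comes back as (row, col).
row, col = np.unravel_index(np.argmax(image), image.shape)
print("row (y):", row, "col (x):", col)

# Row-by-row storage: in the flattened array the pixel sits at row * width + col.
flat_index = y * width + x
print("flat value:", image.ravel()[flat_index])
```

Note that `image[x, y]` would address a different pixel (row 10, column 20), and would raise an `IndexError` whenever `x` exceeds the image height.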

That is ChatGPT-4's answer. I think it is better than I could have produced.
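To connect this back to the line in question, here is a rough sketch of why a keypoint given as (x, y) gets swapped before it is used to build a map stored as a [row, col] array. This is not the code from `utils_dope.py`: the function `gaussian_belief_map`, its parameters, and the Gaussian form are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_belief_map(height, width, keypoint_xy, sigma=2.0):
    """Hypothetical helper: a 2D Gaussian peaked at a keypoint given as (x, y)."""
    x, y = keypoint_xy
    # Same kind of swap as in the issue: reorder (x, y) into (row, col).
    p = [y, x]
    # rows[i, j] == i and cols[i, j] == j for every pixel of the map.
    rows, cols = np.mgrid[0:height, 0:width]
    squared_dist = (rows - p[0]) ** 2 + (cols - p[1]) ** 2
    return np.exp(-squared_dist / (2.0 * sigma ** 2))

# A 50-row by 80-column map with a keypoint at x=60, y=15.
belief = gaussian_belief_map(50, 80, (60, 15))
peak_row, peak_col = np.unravel_index(np.argmax(belief), belief.shape)
print("peak at row", peak_row, "col", peak_col)
```

Here x=60 is a valid column but not a valid row in a 50-row map, so using the keypoint unswapped as a [row, col] index would point outside the array.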

wetoo-cando commented 5 months ago

OK, thanks @TontonTremblay. I'll close the issue for now.