Closed GeonSeon closed 3 months ago
Sorry, it was just a question, but thank you for reading and answering. I would like to know whether there is a function for this conversion, and if not, whether one could be added.
Converting between 3D world coordinates and 2D image coordinates, and vice versa, is a common task in computer graphics, computer vision, and robotics. This process involves several steps and mathematical transformations. Here’s a simplified overview:
World Coordinates to Camera Coordinates: Transform the 3D world coordinates to camera coordinates using a view matrix (also known as a camera matrix). This step involves translating and rotating the world so that the camera is at the origin looking in the desired direction.
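To make this first step concrete, here is a minimal NumPy sketch of a look-at view matrix. The function name `look_at` and the right-handed, -Z-forward OpenGL convention are my assumptions; most 3D math libraries ship an equivalent.

```python
import numpy as np

def look_at(eye, target, up):
    # Build a view matrix that translates/rotates the world so the
    # camera sits at the origin looking down -Z (OpenGL convention).
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f /= np.linalg.norm(f)              # forward axis
    s = np.cross(f, up)
    s /= np.linalg.norm(s)              # right axis
    u = np.cross(s, f)                  # recomputed (orthogonal) up axis
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye   # move the eye to the origin
    return view
```

For example, a camera at (0, 0, 5) looking at the origin maps the world origin to (0, 0, -5) in camera space.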
Camera Coordinates to Clip Coordinates: Project the camera coordinates to clip coordinates using a projection matrix. This matrix encodes the camera's field of view, aspect ratio, and the near and far clipping planes. Perspective projection (which makes objects appear smaller as they get farther away) and orthographic projection (which preserves sizes regardless of depth) are the two common choices.
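A perspective projection matrix in the OpenGL convention (visible z mapped into [-w, w] in clip space) can be sketched as follows; the function `perspective` and its parameter names are illustrative, not a specific library's API.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    # Perspective projection: camera space -> clip space.
    # After the later divide by w, visible z lands in [-1, 1].
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = 2.0 * far * near / (near - far)
    m[3, 2] = -1.0                      # w_clip = -z_camera
    return m
```

A point on the near plane ends up at NDC depth -1 after the perspective divide, and a point on the far plane at +1.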
Clip Coordinates to Normalized Device Coordinates (NDC): Convert clip coordinates to NDC by performing a perspective divide. For a point (x, y, z, w) in clip coordinates, its NDC are (x/w, y/w, z/w). NDC lie in a normalized cube where each axis ranges from -1 to 1.
Normalized Device Coordinates to Viewport or Screen Coordinates: Finally, map the NDC to viewport or screen coordinates. This step involves scaling and translating the NDC so they fit into the screen's pixel grid. The viewport transformation typically also flips the y-axis because screen coordinates usually have the origin at the top left, while NDC have the origin at the center with the positive y-axis pointing upwards.
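These last two steps (perspective divide, then viewport mapping) might look like this in NumPy; the top-left pixel origin and the function name `clip_to_screen` are assumptions consistent with the conventions described above.

```python
import numpy as np

def clip_to_screen(clip, width, height):
    # Perspective divide: clip -> NDC.
    ndc = clip[:3] / clip[3]
    # Viewport transform: NDC [-1, 1] -> pixels, with the y axis
    # flipped because screen coordinates grow downward from top-left.
    x = (ndc[0] + 1.0) * 0.5 * width
    y = (1.0 - ndc[1]) * 0.5 * height
    return np.array([x, y]), ndc[2]     # pixel position + NDC depth
```

For example, the clip-space point (0, 0, 0, 1) lands at the center of the viewport.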
Viewport or Screen Coordinates to Normalized Device Coordinates: Invert the process used to map NDC to screen coordinates. Convert the 2D pixel coordinates back to NDC by undoing the scaling and translation applied during the viewport transformation.
Normalized Device Coordinates to Camera Coordinates: If you’re going backward from 2D to 3D, you usually aim to project a 2D point into 3D space along a ray. You can compute this ray in camera space by applying the inverse of the projection matrix to a point in NDC. However, since a 2D pixel carries no depth information (z-coordinate), you only recover a direction, not a unique point. To get specific coordinates in camera space, additional information or assumptions are needed (e.g., intersecting the ray with a known plane or object in the scene).
Camera Coordinates to World Coordinates: Apply the inverse of the view matrix transformation to the camera coordinates (or ray, if you're projecting into 3D space) to obtain the corresponding coordinates (or ray) in world space. This step transforms the point or ray from the camera's local space back into the global coordinate system of the 3D scene.
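The whole backward path (pixel -> NDC -> camera -> world) can be sketched as an unprojection that returns a ray; the function name `screen_to_world_ray` and the choice of near/far NDC depths (-1 and +1, OpenGL-style) are assumptions, not a specific library's API.

```python
import numpy as np

def screen_to_world_ray(px, py, width, height, proj, view):
    # Undo the viewport transform: pixel -> NDC (flip y back).
    nx = 2.0 * px / width - 1.0
    ny = 1.0 - 2.0 * py / height
    inv = np.linalg.inv(proj @ view)    # inverse of projection * view

    def unproject(nz):
        # Undo projection and view for one assumed NDC depth,
        # including the reverse perspective divide (divide by w).
        p = inv @ np.array([nx, ny, nz, 1.0])
        return p[:3] / p[3]

    near_pt = unproject(-1.0)           # point on the near plane
    far_pt = unproject(1.0)             # point on the far plane
    direction = far_pt - near_pt
    return near_pt, direction / np.linalg.norm(direction)
```

Intersecting the returned ray with a known plane or with scene geometry then yields a specific 3D point, as described above.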
Graphics APIs like OpenGL or DirectX handle the forward conversion as part of their rendering pipeline: you supply the view and projection matrices, and the perspective divide and viewport mapping to screen coordinates are performed automatically.
Computer Vision and Augmented Reality: These fields often involve the reverse process, projecting 2D points from image data into 3D space to understand the scene's geometry or augment it with virtual objects.
Understanding these processes deeply requires familiarity with linear algebra, particularly with transformations using matrices and vectors. Libraries and frameworks that work with 3D graphics provide functions and classes to handle these calculations, making it unnecessary to implement them from scratch in most cases.
Why is this an issue? Could you please be more specific?