Closed. Thomahawkuru closed this 4 months ago
Solved. Found the correct way of doing this in another issue: #124
Thanks to @Lucaweihs for the class `ThorPositionTo2DFrameTranslator(object)`, which correctly relates pixels to room coordinates so that positions can be drawn on the top-down view.
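For readers who land here without opening #124, the idea can be sketched roughly as follows. This is not the class from that issue: `OrthoFrameTranslator` and its parameters are hypothetical names for illustration. It assumes a square frame rendered by an orthographic camera looking straight down, with world +x pointing right and world +z pointing up in the image, and an orthographic size equal to half the visible extent.

```python
class OrthoFrameTranslator:
    """Hypothetical helper (not the class from #124).

    Maps between world (x, z) coordinates and (row, col) pixels for a
    square top-down frame from an orthographic camera.
    """

    def __init__(self, frame_shape, cam_x, cam_z, orth_size):
        self.rows, self.cols = frame_shape
        # World coordinates of the frame's lower-left corner.
        self.left = cam_x - orth_size
        self.bottom = cam_z - orth_size
        # World extent covered by the frame along each axis.
        self.span = 2.0 * orth_size

    def world_to_pixel(self, x, z):
        # Normalise to [0, 1] across the frame, then flip z because
        # image rows grow downward while world z grows upward.
        u = (x - self.left) / self.span
        v = (z - self.bottom) / self.span
        return round(self.rows * (1.0 - v)), round(self.cols * u)

    def pixel_to_world(self, row, col):
        # Inverse mapping, e.g. for turning a mouse click into a target.
        x = self.left + self.span * col / self.cols
        z = self.bottom + self.span * (1.0 - row / self.rows)
        return x, z


translator = OrthoFrameTranslator((1000, 1000), cam_x=5.0, cam_z=5.0, orth_size=5.0)
center_pixel = translator.world_to_pixel(5.0, 5.0)
clicked_world = translator.pixel_to_world(500, 500)
```

The point of this approach is that the mapping depends only on the camera's position and orthographic size, not on the scene bounds, so it should stay correct even when the bounds metadata does not match what the image shows.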
Some context: Our aim is to create an interactive map from the top-down image, where a user can click on a reachable position within the room to move the robot there. For this we need an accurate mapping from pixels of the top-down image to room coordinates. The first step is to accurately draw the room's bounding box using metadata such as 'scenebounds', 'center', 'corner_points', etc., and to determine a pixel-to-position factor from the room size and the bounding box size.
To achieve the above I have written a script that outputs the 2D top-down image, for example at 1000x1000 pixels, where the largest dimension of the room corresponds to 1000 pixels. Using the size information in the room metadata, I then determine the pixel-to-coordinates scale factor by dividing 1000 pixels by the largest room dimension. Using this factor I draw a bounding box of the room, based on the corner points and room center metadata, as well as the reachable positions, onto the top-down image.
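A minimal sketch of the scale-factor calculation described above, not the original script. The function names are hypothetical, and it assumes that the center of the scene bounds lands on the center of the image and that world +z points up in the image. The mismatches reported in this issue suggest that assumption does not always hold.

```python
def pixels_per_unit(size_x, size_z, image_px=1000):
    """Scale factor: image side length divided by the largest room dimension."""
    return image_px / max(size_x, size_z)


def world_to_pixel(x, z, center_x, center_z, scale, image_px=1000):
    """Map a world (x, z) position to (row, col), assuming the bounds
    center coincides with the image center (an assumption, see above)."""
    col = image_px / 2 + (x - center_x) * scale
    # Image rows grow downward, world z grows upward, hence the minus.
    row = image_px / 2 - (z - center_z) * scale
    return round(row), round(col)


# Example with made-up numbers: an 8 x 10 room centered at (4, 5).
scale = pixels_per_unit(8.0, 10.0)
room_center_px = world_to_pixel(4.0, 5.0, 4.0, 5.0, scale)
far_left_corner_px = world_to_pixel(0.0, 10.0, 4.0, 5.0, scale)
```

With these numbers the shorter room dimension spans only 800 of the 1000 pixels, so the box is inset by 100 pixels on each side along x.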
Related Code
The issue
For some of the procthor-10k houses, the bounding box and the corresponding reachable positions align perfectly with the room's top-down image. For most of the rooms, however, the bounding box does not align and seems to have different proportions from the top-down image. It seems that the windows are sometimes included in the scene bounds and sometimes not. Sometimes the difference is small, but other times the room size is way off. To correctly calculate the pixel-to-position factor for the reachable positions on the top-down view, a correct bounding box is necessary.
Any ideas on why this metadata seems inconsistent? It would also help to have some information on how the scene bounds metadata property is determined for each of the houses. Are window frames and door knobs included, for example? Other suggestions for relating image pixels to room positions in another way are of course also welcome.
Some examples: rooms=[1327, 5127, 5364, 5878, 8413]
![8413](https://github.com/allenai/ai2thor/assets/28751075/6b84ebbf-829b-46f5-9065-22df044c7fe5)