cdcseacave / openMVS

open Multi-View Stereo reconstruction library
http://cdcseacave.github.io
GNU Affero General Public License v3.0
3.31k stars 907 forks

Point cloud and 2D image pixel correspondence #976

Open Damysusd opened 1 year ago

Damysusd commented 1 year ago

Hello,

I am currently working with a dense point cloud generated by MVS and I am trying to find the pixel/image coordinate that corresponds to a specific point in the point cloud. Please note that I am aware that it is possible to do this with OpenMVG by using the sfm_data.bin file, but my specific case requires me to work with a dense point cloud rather than a sparse point cloud.

If anyone has experience with finding the pixel/image coordinate that corresponds to a specific point in a dense point cloud or any suggestions for how to approach this problem, I would greatly appreciate any guidance or advice that you can offer.

Thank you in advance!

cdcseacave commented 1 year ago

Very simple: just use Camera::TransformPointW2I() to project the 3D point from the dense point cloud into the desired image.
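For reference, a minimal NumPy sketch of the pinhole projection that Camera::TransformPointW2I performs (the function name `project_w2i` and the example K/R/C values are illustrative, not OpenMVS API):

```python
import numpy as np

def project_w2i(X, K, R, C):
  # Pinhole projection: x = K * R * (X - C), then dehomogenize.
  x = K @ (R @ (np.asarray(X, dtype=float) - np.asarray(C, dtype=float)))
  return x[:2] / x[2]  # (u, v) pixel coordinates

# Toy camera: focal length 1000 px, principal point (320, 240),
# identity rotation, camera center at the world origin.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
C = np.zeros(3)
print(project_w2i([0.0, 0.0, 2.0], K, R, C))  # -> [320. 240.]
```

The K, R, and C of each view can be read from the DMAP or MVS files discussed below.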

SKKU-Marine commented 1 year ago

Sorry, can you tell me where I can use Camera::TransformPointW2I()? Thank you very much.

bgramaje commented 1 year ago

Any updates regarding this? I'm also interested in this functionality.

cdcseacave commented 1 year ago

I'm afraid I do not understand the question. All you have to do is project the 3D point into the camera to find its image coordinates.

bgramaje commented 1 year ago

Hi, I was able to do a 3D reconstruction and a point cloud of some comms towers, and I have the set of images used to build the point cloud. With these images I want to find anomalies on the comms towers: I need to delimit on each image where an anomaly is found and then add a marker for it to the point cloud. I am building an app using react.three. So I want to know: if I delimit an anomaly in an image, how can I map that 2D region into 3D space to place marker points? Thanks

cdcseacave commented 1 year ago

You will need the mesh reconstruction; once you have the mesh, project it into that image to get a depth map. That tells you the depth of the pixel you want to back-project into 3D; then just call Camera::TransformPointI2W().
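As a hedged sketch of that back-projection step (the function name `backproject_i2w` and the toy K/R/C values are illustrative, not the OpenMVS API): given a pixel (u, v), its depth d from the rendered depth map, and the view's K, R, C, the world point is X = C + d * R^T * K^-1 * [u, v, 1]:

```python
import numpy as np

def backproject_i2w(u, v, depth, K, R, C):
  # Invert the pinhole projection: the viewing ray K^-1 [u, v, 1]
  # has unit z in the camera frame, so scaling it by depth gives the
  # camera-space point; rotate back and offset by the camera center.
  ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
  return np.asarray(C, dtype=float) + depth * (R.T @ ray)

# Round trip with a toy camera: pixel (420, 240) at depth 2
# maps back to the world point (0.2, 0, 2).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
C = np.zeros(3)
print(backproject_i2w(420.0, 240.0, 2.0, K, R, C))  # world point ~ (0.2, 0, 2)
```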

bgramaje commented 1 year ago

Do I need to do the mesh step? Would the point cloud reconstruction alone not be valid?

bgramaje commented 1 year ago

By the way, could you tell me how I call that method? Should I include the Camera.h file in my own script, or is there a specific way to do it? Thanks

cdcseacave commented 1 year ago

There is no robust way of rendering a depth map from a point cloud.


bgramaje commented 1 year ago

Well, your DensifyPointCloud tool does this, and there is a flag (remove-dmaps) to remove these depth maps. Is there any Python script to import this 'DMAP' format file? I am having some trouble reading it with Python. Thanks

ghost commented 1 year ago

from argparse import ArgumentParser
import numpy as np
import pyvips
import struct

def loadDMAP(dmap_path):
  with open(dmap_path, 'rb') as dmap:
    file_type = dmap.read(2).decode()
    content_type = struct.unpack('B', dmap.read(1))[0]
    reserve = struct.unpack('B', dmap.read(1))[0]

    has_depth = content_type > 0
    has_normal = content_type in [3, 7, 11, 15]
    has_conf = content_type in [5, 7, 13, 15]
    has_views = content_type in [9, 11, 13, 15]

    image_width, image_height = struct.unpack('2I', dmap.read(8))
    depth_width, depth_height = struct.unpack('2I', dmap.read(8))

    if (file_type != 'DR' or has_depth == False or depth_width <= 0 or depth_height <= 0 or image_width < depth_width or image_height < depth_height):
      print('error: opening file \'%s\' for reading depth-data' % dmap_path)
      return None

    depth_min, depth_max = struct.unpack('2f', dmap.read(8))

    file_name_length = struct.unpack('H', dmap.read(2))[0]
    file_name = dmap.read(file_name_length).decode()

    view_ids_length = struct.unpack('I', dmap.read(4))[0]
    view_ids = np.asarray(struct.unpack('%dI' % view_ids_length, dmap.read(4 * view_ids_length)))

    K = np.asarray(struct.unpack('9d', dmap.read(72))).reshape(3, 3)
    R = np.asarray(struct.unpack('9d', dmap.read(72))).reshape(3, 3)
    C = np.asarray(struct.unpack('3d', dmap.read(24)))

    depth_length = depth_width * depth_height
    depth_map = np.asarray(struct.unpack('%df' % depth_length, dmap.read(4 * depth_length))).reshape(depth_height, depth_width)
    normal_map = np.asarray(struct.unpack('%df' % (depth_length * 3), dmap.read(4 * depth_length * 3))).reshape(depth_height, depth_width, 3) if has_normal else np.asarray([])
    confidence_map = np.asarray(struct.unpack('%df' % depth_length, dmap.read(4 * depth_length))).reshape(depth_height, depth_width) if has_conf else np.asarray([])
    views_map = np.asarray(struct.unpack('%dB' % (depth_length * 4), dmap.read(depth_length * 4))).reshape(depth_height, depth_width, 4) if has_views else np.asarray([])

  data = {
    'image_width': image_width,
    'image_height': image_height,
    'depth_width': depth_width,
    'depth_height': depth_height,
    'depth_min': depth_min,
    'depth_max': depth_max,
    'file_name': file_name,
    'view_ids': view_ids,
    'K': K,
    'R': R,
    'C': C,
    'depth_map': depth_map,
    'normal_map': normal_map,
    'confidence_map': confidence_map,
    'views_map': views_map
  }

  return data

def main():
  parser = ArgumentParser()
  parser.add_argument('-i', '--input', type=str, required=True, help='path to the depth map')
  args = parser.parse_args()

  dmap = loadDMAP(args.input)

  pyvips.Image.new_from_array(np.uint8(dmap['depth_map'] * (1 / dmap['depth_max']) * 255)).write_to_file('depth_map.png')
  if dmap['normal_map'].any():
    pyvips.Image.new_from_array(np.uint8((dmap['normal_map'] @ -dmap['R'] + 1) * 0.5 * 255)).write_to_file('normal_map.png')
  if dmap['confidence_map'].any():
    pyvips.Image.new_from_array(np.uint8(dmap['confidence_map'] * (1 / dmap['confidence_map'].max()) * 255)).write_to_file('confidence_map.png')

if __name__ == '__main__':
  main()

ghost commented 1 year ago

A few ideas.

cdcseacave commented 1 year ago

@4CJ7T wow, you wrote a DMAP reader in Python, this is so great!!! Can you please write one for the MVS file as well? And please make a PR with them; many people will find this helpful.

ghost commented 1 year ago

I tried for the MVS file but there is gibberish after the file name.

image

cdcseacave commented 1 year ago

This is the C++ code that writes the MVS file, specialized for strings, and here is the order.

Can you please see what is happening with the help of this code?

ghost commented 11 months ago

Thank you, that helped figure out the rest of the MVS interface archive.

ghost commented 11 months ago

How is this?

from argparse import ArgumentParser
import json
import numpy as np
import os

def loadMVSInterface(archive_path):
  with open(archive_path, 'rb') as mvs:
    file_type = mvs.read(4).decode()
    version = np.frombuffer(mvs.read(4), dtype=np.dtype('I')).tolist()[0]
    reserve = np.frombuffer(mvs.read(4), dtype=np.dtype('I'))

    if file_type != 'MVSI':
      print('error: opening file \'{}\''.format(archive_path))
      return None

    data = {
      'project_stream': file_type,
      'project_stream_version': version,
      'platforms': [],
      'images': [],
      'vertices': [],
      'vertices_normal': [],
      'vertices_color': []
    }

    platforms_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
    for platform_index in range(platforms_size):
      platform_name_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      platform_name = mvs.read(platform_name_size).decode()
      data['platforms'].append({'name': platform_name, 'cameras': []})
      cameras_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      for camera_index in range(cameras_size):
        camera_name_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
        camera_name = mvs.read(camera_name_size).decode()
        data['platforms'][platform_index]['cameras'].append({'name': camera_name})
        if version > 3:
          band_name_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
          band_name = mvs.read(band_name_size).decode()
          data['platforms'][platform_index]['cameras'][camera_index].update({'band_name': band_name})
        if version > 0:
          width, height = np.frombuffer(mvs.read(8), dtype=np.dtype('I')).tolist()
          data['platforms'][platform_index]['cameras'][camera_index].update({'width': width, 'height': height})
        K = np.asarray(np.frombuffer(mvs.read(72), dtype=np.dtype('d'))).reshape(3, 3).tolist()
        data['platforms'][platform_index]['cameras'][camera_index].update({'K': K, 'poses': []})
        identity_matrix = np.asarray(np.frombuffer(mvs.read(96), dtype=np.dtype('d'))).reshape(4, 3)
        poses_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
        for _ in range(poses_size):
          R = np.asarray(np.frombuffer(mvs.read(72), dtype=np.dtype('d'))).reshape(3, 3).tolist()
          C = np.asarray(np.frombuffer(mvs.read(24), dtype=np.dtype('d'))).tolist()
          data['platforms'][platform_index]['cameras'][camera_index]['poses'].append({'R': R, 'C': C})

    images_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
    for image_index in range(images_size):
      name_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      name = mvs.read(name_size).decode()
      data['images'].append({'name': name})
      if version > 4:
        mask_name_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
        mask_name = mvs.read(mask_name_size).decode()
        data['images'][image_index].update({'mask_name': mask_name})
      platform_id, camera_id, pose_id = np.frombuffer(mvs.read(12), dtype=np.dtype('I')).tolist()
      data['images'][image_index].update({'platform_id': platform_id, 'camera_id': camera_id, 'pose_id': pose_id})
      if version > 2:
        id = np.frombuffer(mvs.read(4), dtype=np.dtype('I')).tolist()[0]
        data['images'][image_index].update({'id': id})
      if version > 6:
        min_depth, avg_depth, max_depth = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
        data['images'][image_index].update({'min_depth': min_depth, 'avg_depth': avg_depth, 'max_depth': max_depth, 'view_scores': []})
        view_score_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
        for _ in range(view_score_size):
          view_score_id, view_score_points = np.frombuffer(mvs.read(8), dtype=np.dtype('I')).tolist()
          view_score_scale, view_score_angle, view_score_area, view_score_score = np.frombuffer(mvs.read(16), dtype=np.dtype('f')).tolist()
          data['images'][image_index]['view_scores'].append({'id': view_score_id, 'points': view_score_points, 'scale': view_score_scale, 'angle': view_score_angle, 'area': view_score_area, 'score': view_score_score})

    vertices_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
    for vertex_index in range(vertices_size):
      X = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
      data['vertices'].append({'X': X, 'views': []})
      views_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      for _ in range(views_size):
        image_id = np.frombuffer(mvs.read(4), dtype=np.dtype('I')).tolist()[0]
        confidence = np.frombuffer(mvs.read(4), dtype=np.dtype('f')).tolist()[0]
        data['vertices'][vertex_index]['views'].append({'image_id': image_id, 'confidence': confidence})

    vertices_normal_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
    for _ in range(vertices_normal_size):
      normal = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
      data['vertices_normal'].append(normal)

    vertices_color_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
    for _ in range(vertices_color_size):
      color = np.frombuffer(mvs.read(3), dtype=np.dtype('B')).tolist()
      data['vertices_color'].append(color)

    if version > 0:
      data.update({'lines': [], 'lines_normal': [], 'lines_color': []})
      lines_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      for line_index in range(lines_size):
        pt1 = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
        pt2 = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
        data['lines'].append({'pt1': pt1, 'pt2': pt2, 'views': []})
        views_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
        for _ in range(views_size):
          image_id = np.frombuffer(mvs.read(4), dtype=np.dtype('I')).tolist()[0]
          confidence = np.frombuffer(mvs.read(4), dtype=np.dtype('f')).tolist()[0]
          data['lines'][line_index]['views'].append({'image_id': image_id, 'confidence': confidence})
      lines_normal_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      for _ in range(lines_normal_size):
        normal = np.frombuffer(mvs.read(12), dtype=np.dtype('f')).tolist()
        data['lines_normal'].append(normal)
      lines_color_size = np.frombuffer(mvs.read(8), dtype=np.dtype('Q'))[0]
      for _ in range(lines_color_size):
        color = np.frombuffer(mvs.read(3), dtype=np.dtype('B')).tolist()
        data['lines_color'].append(color)
      if version > 1:
        transform = np.frombuffer(mvs.read(128), dtype=np.dtype('d')).reshape(4, 4).tolist()
        data.update({'transform': transform})
        if version > 5:
          rot = np.frombuffer(mvs.read(72), dtype=np.dtype('d')).reshape(3, 3).tolist()
          pt_min = np.frombuffer(mvs.read(24), dtype=np.dtype('d')).tolist()
          pt_max = np.frombuffer(mvs.read(24), dtype=np.dtype('d')).tolist()
          data.update({'obb': {'rot': rot, 'pt_min': pt_min, 'pt_max': pt_max}})

  return data

def main():
  parser = ArgumentParser()
  parser.add_argument('-i', '--input', type=str, required=True, help='Path to the MVS interface archive')
  parser.add_argument('-o', '--output', type=str, required=True, help='Path to the output json file')
  args = parser.parse_args()

  data = loadMVSInterface(args.input)

  for platform_index in range(len(data['platforms'])):
    for camera_index in range(len(data['platforms'][platform_index]['cameras'])):
      camera = data['platforms'][platform_index]['cameras'][camera_index]
      image_max = max(camera['width'], camera['height'])
      fx = camera['K'][0][0] / image_max
      fy = camera['K'][1][1] / image_max
      poses_size = len(camera['poses'])
      print('Camera model loaded: platform {}; camera {}; f {:.3f}x{:.3f}; poses {}'.format(platform_index, camera_index, fx, fy, poses_size))

  os.makedirs(os.path.dirname(args.output), exist_ok = True)

  with open(args.output, 'w') as file:
    json.dump(data, file, indent=2)

if __name__ == '__main__':
  main()

cdcseacave commented 11 months ago

Wow, great work @4CJ7T, thank you for helping again!

cdcseacave commented 11 months ago

Can you please open a PR with this code as you did for the DMAP code?