Hi @bayraktare,
we used an over-segmentation (Efficient Graph-Based Image Segmentation) of the scans when annotating our 3D models; an instance consists of multiple segments. This over-segmentation is also used in ScanNet (see here). If you want to read it, you will also need mesh.refined.0.010000.segs.json (which corresponds to <scanId>_vh_clean_2.0.010000.segs.json in ScanNet). If you simply want to read the instance segmentation, I recommend only reading the label and objectId of the segGroups in *semseg.json; it maps to the objectId in labels.instances.annotated.ply.
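If you do want to go through the over-segmentation, a minimal sketch could look like this (assuming numpy; using 0 for unannotated vertices below is just a placeholder choice, not something defined by the dataset):

import json
import numpy as np

# one over-segmentation segment ID per vertex of the refined mesh
with open('mesh.refined.0.010000.segs.json') as f:
    seg_ids = np.asarray(json.load(f)['segIndices'])

# each annotated instance (segGroup) is a set of those segment IDs
with open('semseg.json') as f:
    seg_groups = json.load(f)['segGroups']

object_ids = np.zeros_like(seg_ids)  # 0 = not annotated (placeholder choice)
for group in seg_groups:
    object_ids[np.isin(seg_ids, group['segments'])] = group['objectId']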
Thanks for replying, @WaldJohannaU.
Just a quick question on your answer:
Could you also tell me the corresponding files for the following?
1) <scanId>_vh_clean.aggregation.json (I think this corresponds to *semseg.json, doesn't it?)
2) What should I do to obtain labels.instances.annotated.xyz from labels.instances.annotated.ply?
In summary, I am trying to obtain ground truths from your dataset.
The workflow of my code is as follows:
1) For each scene, read mesh.refined.0.010000.segs.json, labels.instances.annotated.xyz and *semseg.json.
2) From *semseg.json, get the objectId and label for each group of segments and append them.
3) Read the pose per frame and take its inverse. Find the boolean array which is true for the points behind the camera and normalize the homogeneous points: [x y 1].
4) Get the points belonging to each object from the indices and compute bounding boxes.
5) Discard a bounding box if it is outside the image or if the object is too small in the image: (if x1<0 or y1<0 or wi<x2 or hi<y2 or int(x2-x1)<5 or int(y2-y1)<5: continue)
Up to step 5 I get many outputs, but when I apply step 5 most of them are removed, so nothing is generated for most of the scenes. Even when results are generated, there are only a few lines for the whole sequence. When I check the values before step 5, I see negative or very large values for the bounding boxes. Can you see where the error is? Or do you have a better idea for retrieving the ground truths for object ids, labels and bounding boxes?
Thanks for your time and this great work.
Yes, the file corresponding to semseg.json is <scanId>_vh_clean.aggregation.json in ScanNet.
_vh_clean_2.labels.ply and labels.instances.annotated.ply store slightly different data; to get the semantic labels I recommend you first read labels.instances.annotated.ply. You could easily do this, e.g. in Python:
from plyfile import PlyData  # pip install plyfile

file = open('labels.instances.annotated.ply', 'rb')
plydata = PlyData.read(file)
labels = plydata['vertex']['objectId']  # one instance ID per vertex
objectId gives you an instance ID per vertex (usually a low number, e.g. 34 or 42); the ID is scene specific (so 1 could be a chair in one scene but a table in another). The ID corresponds to objectId in semseg.json; there you also have the class label mapping, which means you can map objectId 42 to the class label "box" in this particular scene:
"segGroups": [
{
"id": 42,
"objectId": 42,
"label": "box",
...
{
"id": 34,
"objectId": 34,
"label": "chair",
import json

with open('semseg.json', 'r') as read_file:
    data = json.load(read_file)
for segGroups in data['segGroups']:
    print(segGroups["objectId"], segGroups["label"])
Since we have 534 unique class labels, we released a class mapping to NYU40 / Eigen (chair is class 5, same as armchair and dining chair): https://github.com/WaldJohannaU/3RScan/blob/master/data/mapping.txt
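To use that mapping you could do something like the following; this is only a sketch that assumes mapping.txt is tab-separated with a header row, and the column names below are assumptions you should check against the actual header:

import csv

RAW_COL, NYU40_COL = 'Label', 'NYU40'  # assumed column names, verify against the header

with open('mapping.txt', newline='') as f:
    reader = csv.DictReader(f, delimiter='\t')
    print(reader.fieldnames)  # inspect the actual column names first
    label_to_nyu40 = {row[RAW_COL]: row[NYU40_COL] for row in reader}

print(label_to_nyu40.get('armchair'))  # should give the same NYU40 id as 'chair' (5)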
I'm not sure what you are trying to do exactly, but if you want to get 2D bounding boxes you could render the objectId using OpenGL (which would replace the second half of step 3 in your workflow) and do the above mapping in 2D.
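Once you have such a rendered instance image, extracting the 2D boxes takes only a few lines of numpy. This is just a sketch, where inst_img (an H x W array of rendered objectIds, with 0 assumed to mean background) and id_to_label (the objectId-to-label dict from semseg.json) are hypothetical inputs you would provide:

import numpy as np

def boxes_from_instance_image(inst_img, id_to_label, min_size=5):
    boxes = []
    for oid in np.unique(inst_img):
        if oid == 0 or oid not in id_to_label:
            continue  # skip background / unannotated pixels
        ys, xs = np.nonzero(inst_img == oid)
        x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
        if (x2 - x1) < min_size or (y2 - y1) < min_size:
            continue  # same size filter as step 5 of your workflow
        boxes.append((oid, id_to_label[oid], x1, y1, x2, y2))
    return boxes

Because the renderer's depth test only leaves visible surfaces in the instance image, these boxes cover only the visible part of each object, which also gives you a handle on occlusion.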
Please note, you don't need to read mesh.refined.0.010000.segs.json.
I hope that helps.
Hi @WaldJohannaU, thank you very much for your detailed explanation. Yes, I am trying to obtain ground truths for the performance evaluation of my algorithm. I have managed to do it for ScanNet but not for your dataset yet, unfortunately. For example, a line of the file I am trying to generate should look like this: /path/... objectID ObjectClass Occlusion x1 y1 x2 y2
I am posting my whole code here; if you can find some time to point out the mistakes, I would appreciate it. Then maybe we could also add it to your repo to show others how to generate ground truths.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 21 18:11:45 2020
@author: bayraktare
"""
import json, glob, csv, sys, os, argparse, meshio
import numpy as np

def get_intrinsic_color(fn):
    k = open(fn, 'r')
    kk = k.readlines()
    # print(kk)
    K = []
    for i in range(len(kk)):
        K.append(kk[i].split(' '))
    mt = np.asarray(K[7][2:-1], dtype='float')
    mat = np.reshape(mt, (4, 4))
    wi = int(K[2][2][:-1])  # colorwidth
    hi = int(K[3][2][:-1])  # colorheight
    return (mat, wi, hi)

def get_pose(fn):
    return np.loadtxt(open(fn, "rb"), delimiter=" ")

def get_full_pc(fn):
    # return np.loadtxt(open(fn, "rb"), delimiter=" ")
    return np.genfromtxt(open(fn, "rb"), delimiter=" ")

def frame_num_from_name(filename):  # when the name is .../frame-000000.pose.txt
    return int(filename.split('/')[-1].split('-')[1].split('.')[0])
def getOcclusion(camref, behind, full_pc2d, pc3di, bbx):
    (oid, l, x1, y1, x2, y2) = bbx
    inter = 5  # resolution of the grid
    d = 0.1  # distance beyond which we consider that a point does not belong anymore to an object
    xs = range(int(x1), int(x2), int((x2 - x1) / inter))
    ys = range(int(y1), int(y2), int((y2 - y1) / inter))
    occlus_count = 0
    obj_idx = np.zeros((1, len(behind)), dtype=bool)
    obj_idx[[0], [pc3di]] = True
    oclus = 0
    outsideobject = 0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            all_idx = np.logical_and(xs[i] < full_pc2d[0], full_pc2d[0] < xs[i + 1])
            all_idx = np.logical_and(all_idx, ys[j] < full_pc2d[1])
            all_idx = np.logical_and(all_idx, full_pc2d[1] < ys[j + 1])
            all_idx = np.logical_and(all_idx, np.logical_not(behind))  # collecting the points inside this cell that are not behind the camera
            o_idx = np.logical_and(all_idx, obj_idx)  # selecting the 2D points inside the cell that belong to the object
            all_idx = np.logical_and(all_idx, np.logical_not(obj_idx))  # removing the points linked to the object
            if not o_idx.any() or not all_idx.any():
                if not o_idx.any():
                    outsideobject += 1  # this part of the bounding box contains no object points, so we don't count it
                continue  # no object points for this cell, or no points that do not belong to the object
            o_depth = np.min(np.linalg.norm(camref[0:3, o_idx[0, :]], axis=0)) / 3  # minimum depth of the object points in this cell
            if sum((o_depth - np.linalg.norm(camref[0:3, all_idx[0, :]], axis=0) / 3) > d) > 0:  # if any point not from the object is in front of it (i.e. has a lower depth) and causes an occlusion ...
                oclus += 1  # ... we count this cell as occluded
    if ((len(xs) - 1) * (len(ys) - 1) - outsideobject) == 0:
        print('something is wrong: no points projected on the bbx')
        return 1
    return float(oclus) / ((len(xs) - 1) * (len(ys) - 1) - outsideobject)
# main code
input_sequences = glob.glob('/home/.../3rscan/sequence/*')
scene_list = [i.split('/')[-1] for i in input_sequences]
datadir = '/home/.../3rscan/sequence'
outdir = '/home/.../3rscan/2dgtwithbboxes'

# CONVERT PLY TO XYZ
# assumption: the annotated PLY files live inside the scene folders; adjust the glob to your layout
input_ply = glob.glob(datadir + '/*/labels.instances.annotated.ply')
for ply in input_ply:
    if os.path.isfile(ply.split('ply')[0] + 'xyz'):
        print('file exists, skipping', ply.split('ply')[0] + 'xyz')
        continue
    d = meshio.read(ply)
    np.savetxt(ply.split('ply')[0] + 'xyz', d.points, fmt='%1.6f')
c = 0
for scene in scene_list:
    if os.path.isfile(outdir + '/' + scene + '.2dgt'):
        print('file exists, skipping', scene)
        continue
    # read *semseg.json
    ag_f = datadir + '/' + scene + '/semseg.json'
    if not os.path.isfile(ag_f):
        print('no *semseg.json file found for the scene', ag_f)
        continue
    fp = open(ag_f)
    aggreg = json.load(fp)
    fp.close()
    objs = {}  # will contain the objects for this scene
    seg_f = datadir + '/' + scene + '/mesh.refined.0.010000.segs.json'
    if not os.path.isfile(seg_f):
        print('no mesh.refined.0.010000.segs.json file found for the scene', seg_f)
        continue
    f = open(seg_f)
    segs = json.load(f)
    f.close()
    # read *.xyz and append the points into a 3D point cloud
    xyz_f = datadir + '/' + scene + '/labels.instances.annotated.xyz'
    if not os.path.isfile(xyz_f):
        print('no XYZ file found for the scene', xyz_f)
        continue
    pc3d = get_full_pc(xyz_f)  # getting the full 3D point cloud
    for po in aggreg['segGroups']:  # for each object in the json file
        objs[po['objectId']] = [po['label']]  # e.g. {1: ['window']}
        pc3di = []
        for segid in po['segments']:  # for each segment of the object
            pc3di += [x for (x, y) in enumerate(segs['segIndices']) if y == segid]  # collecting the 3D points associated to the segment segid
        objs[po['objectId']].append(pc3di)
    print('Loading 3D point cloud done, number of objects:', len(objs.keys()))
    (m_calibrationColorIntrinsic, wi, hi) = get_intrinsic_color(datadir + '/' + scene + '/sequence/_info.txt')  # getting the intrinsic parameters
    print('done1')
    obj_by_img = {}  # dictionary that will contain the bounding boxes of the objects appearing in each image
    for pose in glob.glob(datadir + '/' + scene + '/sequence/*pose.txt'):
        cam2world = get_pose(pose)  # reading the camera-to-world pose from frame-0XXXXX.pose.txt
        if np.logical_not(np.isfinite(cam2world)).any() or np.isnan(cam2world).any() or cam2world.shape[0] == 0:
            print('erroneous camera value, skipping', cam2world)
            continue  # the values of the camera pose are wrong, so we skip this frame
        world2cam = np.linalg.inv(cam2world)  # getting the actual extrinsic parameters
        obj_by_img[frame_num_from_name(pose)] = [pose.split('/')[-1].split('.')[0], []]  # a list with two elements: the name of the frame and the list of object bbxs
        camref = np.dot(world2cam, np.vstack((pc3d.transpose(), np.ones((1, pc3d.shape[0])))))
        behind = camref[2] <= 0  # boolean array which is true for the points which are behind the camera
        full_pc2d = np.dot(m_calibrationColorIntrinsic, camref)
        full_pc2d = np.divide(full_pc2d, np.tile(full_pc2d[2], (4, 1)))  # normalising the homogeneous points: [x y 1]
        for oid, (l, pc3di) in objs.items():
            if behind[pc3di].any():  # if any of the object points is behind the camera, we skip the object
                continue
            rows = np.array([len(pc3di) * [0], len(pc3di) * [1]])
            cols = np.array([pc3di, pc3di])
            pc2d = full_pc2d[rows, cols]  # get the 2D points related to the object, from the indices
            (x1, y1, x2, y2) = (min(pc2d[0]), min(pc2d[1]), max(pc2d[0]), max(pc2d[1]))  # getting the bounding box coordinates
            if x1 < 0 or y1 < 0 or wi < x2 or hi < y2 or int(x2 - x1) < 5 or int(y2 - y1) < 5:  # skip if the bounding box is outside the image or the object is small on the image
                continue
            o = getOcclusion(camref, behind, full_pc2d, pc3di, (oid, l, x1, y1, x2, y2))
            obj_by_img[frame_num_from_name(pose)][1].append([oid, l, o, x1, y1, x2, y2])  # note: only the bounding box coordinates are saved here; you might want to add the full 2D point cloud
    print('Sampiyon Besiktas', c)
    c += 1
    # import pdb; pdb.set_trace()
    fw = open(outdir + '/' + scene + '.2dgt', 'w')  # writing the results
    for (fnum, v) in obj_by_img.items():
        if len(v) == 1:
            continue
        fn = v[0]
        for detection in v[1]:
            # detection[1] = detection[1].encode('utf8')
            # fw.write(' '.join([fn] + [str(x).replace(' ', '_') for x in detection]) + '\n')
            print('%s %d %s %1.3f %1.6f %1.6f %1.6f %1.6f' % ('frame-' + '%0.6d' % (fnum), int(detection[0]), str(detection[1]), float(detection[2]), float(detection[3]),
                  float(detection[4]), float(detection[5]), float(detection[6])), file=fw)
    fw.close()
In the FAQ we answer the following questions:
The IDs above are local; how to map them to global IDs is described here
How to use the camera pose to render 3D labels is answered here
And last but not least: how to get 2D bounding boxes
These should answer the questions; closing this thread.
Hi @WaldJohannaU, thanks for sharing this great dataset with us.
I just want to know what the
"segments": [1, 4, 3]
entries you give in the *semseg.json files are. Thanks for your time; I am looking forward to hearing from you.