Hi @bayraktare,
we used an over-segmentation (Efficient Graph-Based Image Segmentation) of the scans when annotating our 3D models; an instance consists of multiple segments. This over-segmentation is also used in ScanNet (see here). If you want to read it, you will also need mesh.refined.0.010000.segs.json (which corresponds to <scanId>_vh_clean_2.0.010000.segs.json in ScanNet). If you simply want to read the instance segmentation, I recommend only reading the label and objectId of the segGroups in *semseg.json; it maps to the objectId in labels.instances.annotated.ply.
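If you do want to go through the over-segmentation, a minimal sketch could look like this (assuming numpy; using 0 for unannotated vertices below is just a placeholder choice, not something defined by the dataset):

import json
import numpy as np

# one over-segmentation segment ID per vertex of the refined mesh
with open('mesh.refined.0.010000.segs.json') as f:
    seg_ids = np.asarray(json.load(f)['segIndices'])

# each annotated instance (segGroup) is a set of those segment IDs
with open('semseg.json') as f:
    seg_groups = json.load(f)['segGroups']

object_ids = np.zeros_like(seg_ids)  # 0 = not annotated (placeholder choice)
for group in seg_groups:
    object_ids[np.isin(seg_ids, group['segments'])] = group['objectId']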
Thanks for replying, @WaldJohannaU.
Just a quick question on your answer:
Could you also tell me the corresponding files for the following?
1) <scanId>_vh_clean.aggregation.json (I think this corresponds to *semseg.json, doesn't it?)
2) What should I do to obtain labels.instances.annotated.xyz from labels.instances.annotated.ply?
In summary, I am trying to obtain ground truths from your dataset.
The workflow of my code is as follows:
1) For each scene, read mesh.refined.0.010000.segs.json, labels.instances.annotated.xyz and *semseg.json.
2) From *semseg.json, get the objectId and label for each group of segments and append them.
3) Read the pose per frame and take its inverse. Find the boolean array which is true for the points behind the camera and normalize the homogeneous points: [x y 1].
4) Get the points belonging to each object from the indices and compute bounding boxes.
5) Discard a bounding box if it is outside the image or if the object is too small in the image: (if x1<0 or y1<0 or wi<x2 or hi<y2 or int(x2-x1)<5 or int(y2-y1)<5: continue)
Up to step 5 I get many outputs, but when I apply step 5 most of them are removed, so nothing is generated for most of the scenes. Even when results are generated, there are only a few lines for the whole sequence. When I check the values before step 5, I see negative or very large values for the bounding boxes. Can you see where the error is? Or do you have a better idea for retrieving the ground truths for object ids, labels and bounding boxes?
Thanks for your time and this great work.
Yes, the file corresponding to semseg.json is <scanId>_vh_clean.aggregation.json in ScanNet.
_vh_clean_2.labels.ply and labels.instances.annotated.ply store slightly different data; to get the semantic labels I recommend you first read labels.instances.annotated.ply. You could easily do this, e.g. in Python:
from plyfile import PlyData  # pip install plyfile

file = open('labels.instances.annotated.ply', 'rb')
plydata = PlyData.read(file)
labels = plydata['vertex']['objectId']  # one instance ID per vertex
objectId gives you an instance ID per vertex (usually a low number, e.g. 34 or 42); the ID is scene specific (so 1 could be a chair in one scene but a table in another). The ID corresponds to objectId in semseg.json; there you also have the class label mapping, which means you can map objectId 42 to the class label "box" in this particular scene:
"segGroups": [
{
"id": 42,
"objectId": 42,
"label": "box",
...
{
"id": 34,
"objectId": 34,
"label": "chair",
import json

with open('semseg.json', 'r') as read_file:
    data = json.load(read_file)
for segGroups in data['segGroups']:
    print(segGroups["objectId"], segGroups["label"])
Since we have 534 unique class labels, we released a class mapping to NYU40 / Eigen (chair is class 5, same as armchair and dining chair): https://github.com/WaldJohannaU/3RScan/blob/master/data/mapping.txt
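To use that mapping you could do something like the following; this is only a sketch that assumes mapping.txt is tab-separated with a header row, and the column names below are assumptions you should check against the actual header:

import csv

RAW_COL, NYU40_COL = 'Label', 'NYU40'  # assumed column names, verify against the header

with open('mapping.txt', newline='') as f:
    reader = csv.DictReader(f, delimiter='\t')
    print(reader.fieldnames)  # inspect the actual column names first
    label_to_nyu40 = {row[RAW_COL]: row[NYU40_COL] for row in reader}

print(label_to_nyu40.get('armchair'))  # should give the same NYU40 id as 'chair' (5)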
I'm not sure what you are trying to do exactly, but if you want to get 2D bounding boxes you could render the objectId using OpenGL (which would replace the second half of step 3 in your workflow) and do the above mapping in 2D.
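Once you have such a rendered instance image, extracting the 2D boxes takes only a few lines of numpy. This is just a sketch, where inst_img (an H x W array of rendered objectIds, with 0 assumed to mean background) and id_to_label (the objectId-to-label dict from semseg.json) are hypothetical inputs you would provide:

import numpy as np

def boxes_from_instance_image(inst_img, id_to_label, min_size=5):
    boxes = []
    for oid in np.unique(inst_img):
        if oid == 0 or oid not in id_to_label:
            continue  # skip background / unannotated pixels
        ys, xs = np.nonzero(inst_img == oid)
        x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
        if (x2 - x1) < min_size or (y2 - y1) < min_size:
            continue  # same size filter as step 5 of your workflow
        boxes.append((oid, id_to_label[oid], x1, y1, x2, y2))
    return boxes

Because the renderer's depth test only leaves visible surfaces in the instance image, these boxes cover only the visible part of each object, which also gives you a handle on occlusion.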
Please note, you don't need to read mesh.refined.0.010000.segs.json.
I hope that helps.
Hi @WaldJohannaU, thank you very much for your detailed explanation. Yes, I am trying to obtain ground truths for the performance evaluation of my algorithm. I have managed to do it for ScanNet but not for your dataset yet, unfortunately. For example, a line of the file I am trying to generate should look like this: /path/... objectID ObjectClass Occlusion x1 y1 x2 y2
I am posting my whole code here; if you can find some time to point out the mistakes, I would appreciate it. Then maybe we could also add it to your repo to show others how to generate ground truths.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 21 18:11:45 2020
@author: bayraktare
"""
import json, glob, csv, sys, os, argparse, meshio
import numpy as np

def get_intrinsic_color(fn):
    k = open(fn, 'r')
    kk = k.readlines()
    # print(kk)
    K = []
    for i in range(len(kk)):
        K.append(kk[i].split(' '))
    mt = np.asarray(K[7][2:-1], dtype='float')
    mat = np.reshape(mt, (4, 4))
    wi = int(K[2][2][:-1])  # colorwidth
    hi = int(K[3][2][:-1])  # colorheight
    return (mat, wi, hi)

def get_pose(fn):
    return np.loadtxt(open(fn, "rb"), delimiter=" ")

def get_full_pc(fn):
    # return np.loadtxt(open(fn, "rb"), delimiter=" ")
    return np.genfromtxt(open(fn, "rb"), delimiter=" ")

def frame_num_from_name(filename):  # when the name is .../frame-000000.pose.txt
    return int(filename.split('/')[-1].split('-')[1].split('.')[0])
def getOcclusion(camref, behind, full_pc2d, pc3di, bbx):
    (oid, l, x1, y1, x2, y2) = bbx
    inter = 5  # resolution of the grid
    d = 0.1  # distance beyond which we consider that a point does not belong anymore to an object
    xs = range(int(x1), int(x2), int((x2 - x1) / inter))
    ys = range(int(y1), int(y2), int((y2 - y1) / inter))
    occlus_count = 0
    obj_idx = np.zeros((1, len(behind)), dtype=bool)
    obj_idx[[0], [pc3di]] = True
    oclus = 0
    outsideobject = 0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            all_idx = np.logical_and(xs[i] < full_pc2d[0], full_pc2d[0] < xs[i + 1])
            all_idx = np.logical_and(all_idx, ys[j] < full_pc2d[1])
            all_idx = np.logical_and(all_idx, full_pc2d[1] < ys[j + 1])
            all_idx = np.logical_and(all_idx, np.logical_not(behind))  # collecting the points inside this cell that are not behind the camera
            o_idx = np.logical_and(all_idx, obj_idx)  # selecting the 2D points inside the cell that belong to the object
            all_idx = np.logical_and(all_idx, np.logical_not(obj_idx))  # removing the points linked to the object
            if not o_idx.any() or not all_idx.any():
                if not o_idx.any():
                    outsideobject += 1  # this part of the bounding box contains no object points, so we don't count it
                continue  # no object points for this cell, or no points that do not belong to the object
            o_depth = np.min(np.linalg.norm(camref[0:3, o_idx[0, :]], axis=0)) / 3  # minimum depth of the object points in this cell
            if sum((o_depth - np.linalg.norm(camref[0:3, all_idx[0, :]], axis=0) / 3) > d) > 0:  # if any point not from the object is in front of it (i.e. has a lower depth) and causes an occlusion ...
                oclus += 1  # ... we count this cell as occluded
    if ((len(xs) - 1) * (len(ys) - 1) - outsideobject) == 0:
        print('something is wrong: no points projected on the bbx')
        return 1
    return float(oclus) / ((len(xs) - 1) * (len(ys) - 1) - outsideobject)
# main code
input_sequences = glob.glob('/home/.../3rscan/sequence/*')
scene_list = [i.split('/')[-1] for i in input_sequences]
datadir = '/home/.../3rscan/sequence'
outdir = '/home/.../3rscan/2dgtwithbboxes'

# CONVERT PLY TO XYZ
# assumption: the annotated PLY files live inside the scene folders; adjust the glob to your layout
input_ply = glob.glob(datadir + '/*/labels.instances.annotated.ply')
for ply in input_ply:
    if os.path.isfile(ply.split('ply')[0] + 'xyz'):
        print('file exists, skipping', ply.split('ply')[0] + 'xyz')
        continue
    d = meshio.read(ply)
    np.savetxt(ply.split('ply')[0] + 'xyz', d.points, fmt='%1.6f')
c = 0
for scene in scene_list:
    if os.path.isfile(outdir + '/' + scene + '.2dgt'):
        print('file exists, skipping', scene)
        continue
    # read *semseg.json
    ag_f = datadir + '/' + scene + '/semseg.json'
    if not os.path.isfile(ag_f):
        print('no *semseg.json file found for the scene', ag_f)
        continue
    fp = open(ag_f)
    aggreg = json.load(fp)
    fp.close()
    objs = {}  # will contain the objects for this scene
    seg_f = datadir + '/' + scene + '/mesh.refined.0.010000.segs.json'
    if not os.path.isfile(seg_f):
        print('no mesh.refined.0.010000.segs.json file found for the scene', seg_f)
        continue
    f = open(seg_f)
    segs = json.load(f)
    f.close()
    # read *.xyz and append the points into a 3D point cloud
    xyz_f = datadir + '/' + scene + '/labels.instances.annotated.xyz'
    if not os.path.isfile(xyz_f):
        print('no XYZ file found for the scene', xyz_f)
        continue
    pc3d = get_full_pc(xyz_f)  # getting the full 3D point cloud
    for po in aggreg['segGroups']:  # for each object in the json file
        objs[po['objectId']] = [po['label']]  # e.g. {1: ['window']}
        pc3di = []
        for segid in po['segments']:  # for each segment of the object
            pc3di += [x for (x, y) in enumerate(segs['segIndices']) if y == segid]  # collecting the 3D points associated to the segment segid
        objs[po['objectId']].append(pc3di)
    print('Loading 3D point cloud done, number of objects:', len(objs.keys()))
    (m_calibrationColorIntrinsic, wi, hi) = get_intrinsic_color(datadir + '/' + scene + '/sequence/_info.txt')  # getting the intrinsic parameters
    print('done1')
    obj_by_img = {}  # dictionary that will contain the bounding boxes of the objects appearing in each image
    for pose in glob.glob(datadir + '/' + scene + '/sequence/*pose.txt'):
        cam2world = get_pose(pose)  # reading the camera-to-world pose from frame-0XXXXX.pose.txt
        if np.logical_not(np.isfinite(cam2world)).any() or np.isnan(cam2world).any() or cam2world.shape[0] == 0:
            print('erroneous camera value, skipping', cam2world)
            continue  # the values of the camera pose are wrong, so we skip this frame
        world2cam = np.linalg.inv(cam2world)  # getting the actual extrinsic parameters
        obj_by_img[frame_num_from_name(pose)] = [pose.split('/')[-1].split('.')[0], []]  # a list with two elements: the name of the frame and the list of object bbxs
        camref = np.dot(world2cam, np.vstack((pc3d.transpose(), np.ones((1, pc3d.shape[0])))))
        behind = camref[2] <= 0  # boolean array which is true for the points which are behind the camera
        full_pc2d = np.dot(m_calibrationColorIntrinsic, camref)
        full_pc2d = np.divide(full_pc2d, np.tile(full_pc2d[2], (4, 1)))  # normalising the homogeneous points: [x y 1]
        for oid, (l, pc3di) in objs.items():
            if behind[pc3di].any():  # if any of the object points is behind the camera, we skip the object
                continue
            rows = np.array([len(pc3di) * [0], len(pc3di) * [1]])
            cols = np.array([pc3di, pc3di])
            pc2d = full_pc2d[rows, cols]  # get the 2D points related to the object, from the indices
            (x1, y1, x2, y2) = (min(pc2d[0]), min(pc2d[1]), max(pc2d[0]), max(pc2d[1]))  # getting the bounding box coordinates
            if x1 < 0 or y1 < 0 or wi < x2 or hi < y2 or int(x2 - x1) < 5 or int(y2 - y1) < 5:  # skip if the bounding box is outside the image or the object is small on the image
                continue
            o = getOcclusion(camref, behind, full_pc2d, pc3di, (oid, l, x1, y1, x2, y2))
            obj_by_img[frame_num_from_name(pose)][1].append([oid, l, o, x1, y1, x2, y2])  # note: only the bounding box coordinates are saved here; you might want to add the full 2D point cloud
    print('Sampiyon Besiktas', c)
    c += 1
    # import pdb; pdb.set_trace()
    fw = open(outdir + '/' + scene + '.2dgt', 'w')  # writing the results
    for (fnum, v) in obj_by_img.items():
        if len(v) == 1:
            continue
        fn = v[0]
        for detection in v[1]:
            # detection[1] = detection[1].encode('utf8')
            # fw.write(' '.join([fn] + [str(x).replace(' ', '_') for x in detection]) + '\n')
            print('%s %d %s %1.3f %1.6f %1.6f %1.6f %1.6f' % ('frame-' + '%0.6d' % (fnum), int(detection[0]), str(detection[1]), float(detection[2]), float(detection[3]),
                  float(detection[4]), float(detection[5]), float(detection[6])), file=fw)
    fw.close()
In the FAQ we answer the following questions:
The IDs above are local; how to map them to global IDs is described here
How to use the camera pose to render 3D labels is answered here
And last but not least: how to get 2D bounding boxes
These should answer the questions; closing this thread.
Hi @WaldJohannaU, thanks for sharing this great dataset with us.
I just want to know what the
"segments": [1, 4, 3]
entries you give in the *semseg.json files are. Thanks for your time; I am looking forward to hearing from you.