Unity-Technologies / com.unity.perception

Perception toolkit for sim2real training and validation in Unity

Why is my generated depth map all red #558

Closed. hanshuai1232000 closed this issue 1 year ago.

hanshuai1232000 commented 1 year ago
[Screenshot: 20230102152052]
hanshuai1232000 commented 1 year ago

Shouldn't it look like this? [WeChat image: 20230102152340]

mark1ng123 commented 1 year ago

Hey, first of all, the output you are receiving is the mask image file, reflecting the way you labeled your objects. If you want to get the depth the way you want, there are many other Unity libraries for that; a simple Google search will do the trick. If you have any questions, feel free to DM me. Mark

hanshuai1232000 commented 1 year ago

So how do I remove the masked image file?

mark1ng123 commented 1 year ago

You don't need to remove it. You just need to write a script that changes the way you capture your mask or depth image; you need to adjust it.

hanshuai1232000 commented 1 year ago

I am a novice. Can you give an example, or a link that explains this?

hanshuai1232000 commented 1 year ago

I don't really understand what you mean.

mark1ng123 commented 1 year ago

The image you are receiving is the masked labeled image. That means you labeled the boxes/objects that were rendered to your screen, and after the screen was captured you received this data: in most cases a width-by-height 2D matrix of 0s and 1s telling you where the object is placed on your screen, where 1 means part of the object and 0 means no object there. In this case, as you can see, the mask is red, so it is represented differently in the output matrix you are receiving (probably a 2D matrix where every cell has 3 channels, RGB).

To produce this mask there are several ways; the most common one is taking the RGB capture and an empty matrix whose coordinates are aligned with it, and at every (x, y) position where the object is placed, putting a 1 in the empty matrix, or in this case (255, 0, 0). (See the sketch below.)

To get a depth image like you want, I would search this repository; you will see there is an option to add a depth channel to the perception camera object in a runtime script: https://github.com/Unity-Technologies/com.unity.perception/blob/f45895f7dcad27dee545d6165a2f6c237554600a/com.unity.perception/Runtime/GroundTruth/Labelers/Depth/DepthLabeler.cs and https://github.com/Unity-Technologies/com.unity.perception/blob/f45895f7dcad27dee545d6165a2f6c237554600a/com.unity.perception/Runtime/GroundTruth/Sensors/Channels/DepthChannel.cs. Try to work with those (this I got from a quick look at the GitHub repository; there might be other ways: https://github.com/Unity-Technologies/com.unity.perception/search?p=1&q=depth). Or search for other libraries, or read up on how to capture depth in Unity in a simpler way.

I truly recommend reading about the differences between mask and depth matrices and gaining more knowledge in this field before diving deeper. Hope I helped; take your time and don't rush the process, Mark
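
A minimal sketch of that mask construction, assuming the segmentation output is a PNG in which the labeled object is rendered pure red (the filename and the exact label color are placeholders, not values from the Perception output):

import cv2
import numpy as np

seg = cv2.imread("segmentation.png")                 # OpenCV loads images as BGR
red = np.array([0, 0, 255])                          # pure red in BGR channel order
mask = np.all(seg == red, axis=2).astype(np.uint8)   # 1 where the object is, 0 elsewhere
print(mask.shape, int(mask.sum()), "object pixels")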

hanshuai1232000 commented 1 year ago

Thank you very much, I will look into it.

mark1ng123 commented 1 year ago

Hope you are getting along with all the new information. If you can, close the issue and reopen it if you get stuck again. Hope you will not need further help :)

cs1488 commented 1 year ago

Hello, it seems like I ran into a similar situation. I also want to get the depth information; even better would be a labeled 3D point cloud like the one suggested in #499.

I am not sure what I am doing wrong, but my depth images were all red too. So I reduced the complexity and only added the following components to my scene:

From the picture alone you can see that their distances to the camera are different: they are all copies of each other, but appear as if they have different sizes.

[Image: step1 camera]

So, following the tooltip found in https://github.com/Unity-Technologies/com.unity.perception/blob/f45895f7dcad27dee545d6165a2f6c237554600a/com.unity.perception/Runtime/GroundTruth/Labelers/Depth/DepthLabeler.cs: "Capturing depth values returns the distance between the surface of the object and the forward plane of the camera. Capturing range values returns the line of sight distance between the surface of the object and the position of the camera."

I would expect that for every location in the image there is one value giving the range. I learned that the output is stored in an EXR file as 32-bit values; from the JSON file: "description": "Generates a 32-bit depth image in EXR format where each pixel contains the actual distance in Unity units (usually meters) from the camera to the object in the scene." So what I would expect is that there would be different values for the pixels, which should result in different colors.

When I try to look at it with GIMP, all the locations seem to have the same value, as they are all depicted with the same red. [Images: step1 camera, Depth_0]

Am I using the labeler wrong, or do I need to somehow add the depth channel mentioned in your reply, @mark1ng123? Or is GIMP simply not the right tool for interpreting the file?

Best Regards

mark1ng123 commented 1 year ago

@cs1488 Hey there, glad to see more people are using this library for their synthetic data. I haven't used the depth labeler yet; I've only used the instance and semantic segmentation for my work, so I don't really know how it works. I can run some tests, but it will take some time.

When I work with depth cameras (it doesn't matter whether it's a simulated camera or a real depth camera, stereo or lidar), I use a separate channel for my depth values and read my depth matrix from the depth channel I assigned. Then, to generate a point cloud, you have many options, such as the PCL library, or aligning the RGB matrix to the depth matrix so you can use the x, y indexes from the matrix and the z-axis value from the depth matrix, etc. Since point clouds aren't supported yet, as stated in the issue you mentioned, I would use one of the methods I described: add another depth channel to the main camera and capture the depth values somehow (I think it's also supported in one of the scripts the Unity team has made, also mentioned in the issue).

By the way, you get all red because you labeled all the cubes with the same label. (I don't know how this is supposed to work with the depth labeler; in semantic segmentation you'll get same-colored masks for same labels.)

One more thing you can do is load the matrix of the mask/depth image with some other programming language, for example Python and NumPy, and check what the values are; it may look all red, but maybe the values are different. (A quick sketch of that check follows.)
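
A minimal sketch of that value check, assuming an EXR depth capture named Depth_0.exr (the filename is a placeholder):

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"   # allow OpenCV to read EXR; must be set before importing cv2
import cv2

img = cv2.imread("Depth_0.exr", cv2.IMREAD_UNCHANGED)
print(img.dtype, img.shape)
print(img.min(), img.max())   # differing values mean the depth is there, just not visible in GIMP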

hanshuai1232000 commented 1 year ago

I still have that red problem and can't fix it

mark1ng123 commented 1 year ago

Look at the answer above, @hanshuai1232000. As I said, try using the depth channel, capturing the depth as a matrix without the labeler, and saving a PNG file. I haven't really used the depth labeler, so I can't advise on how to use it; in my experience I've used a separate channel, then captured and saved a PNG image file built from the depth matrix.

cs1488 commented 1 year ago

@hanshuai1232000 In case you are using the depth labeler: it works as intended. I did not use the right software to look at the output. Now I tried what @mark1ng123 suggested and looked at the values with OpenCV.

Here is a little python script you can try:

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"   # must be set before importing cv2
import cv2
import numpy as np

img = cv2.imread("1.exr", cv2.IMREAD_UNCHANGED)
img = img[:, :, 2]                    # OpenCV loads BGR, so channel 2 is red
dmin = img.min()
dmax = img.max()
print(dmin, dmax)
img = (255 - ((img - dmin) * 224 / dmax)).astype(np.uint8)   # map distances into 0-255
img[img == 255] = 0                   # paint pixels at the minimum depth (e.g. empty background) black
cv2.imshow("d", img)
cv2.waitKey(0)

The example I used in my last comment is in this zip: 1.zip

It should look like this: [Image: cubes]

As the tooltip says: "Capturing depth values returns the distance between the surface of the object and the forward plane of the camera. Capturing range values returns the line of sight distance between the surface of the object and the position of the camera."

So to answer your question: your generated depth map appears all red because the depth values are stored as 32-bit floats in an EXR file, specifically in the red channel. When you use a viewer that just reads the numbers as they are, it sees information in the red channel and displays it. Why every depth looks the same I don't know yet, but I guess the value range in the EXR is very big (objects at depth 1000 get nearly the same color as ones at 5). In my script, however, I map the distances to the range 0-255 so they can be shown as grayscale.

hanshuai1232000 commented 1 year ago

@cs1488, you know how to use Python to convert to grayscale images; do you know how to do the same conversion with C# or Unity?

mark1ng123 commented 1 year ago

@cs1488 Great job figuring it out!

@hanshuai1232000 you'll need to follow the same steps he did. As you can see, he worked with an image matrix represented with the NumPy library in Python and did some simple arithmetic on it. Try figuring out how to work with a matrix in C#; look for something like this: https://social.msdn.microsoft.com/Forums/vstudio/en-US/2f27fcea-6eaa-41e2-8c8f-3900ecfbc159/how-to-convert-an-image-to-an-matrix-array-in-c-it-does-not-deal-with-the-color-as-mentioned-in-the?forum=csharpgeneral. You'll also need to figure out how to work with the EXR file and a 32-bit float matrix in C#. Also look for C# libraries that can make the matrix arithmetic easier.

StevenBorkman commented 1 year ago

I am sorry for chiming in late to this. But as some users have commented, our EXR files use 32 bits per channel for depth, and all of it is in the red channel. Our visualization solution, pysolotools-fiftyone, can properly display them, and it shows examples, in Python, of how to interpret them.

Thanks community for the support!!!

hanshuai1232000 commented 1 year ago

@StevenBorkman, @mark1ng123, @cs1488, I have another question. I see that the output of the shader is float4(i.cameraSpacePosition.z, 0, 0, 1), which is the depth z in camera space.

Question 1: Have the values in the generated EXR depth map been converted, or is it just the camera-space z value that is stored in it? Does it have nothing to do with the near and far planes of the camera?

Question 2: If the actual camera-space depth z is stored in the EXR, what does it become in the PNG converted by pysolotools-fiftyone? How do you get the original depth back if you read the PNG with Python code?

StevenBorkman commented 1 year ago

Hey @hanshuai1232000, thanks for the questions.

1) The values in the depth map are the actual distance from the camera. From the docs: the depth labeler outputs a 32-bit depth image in EXR format; each pixel contains the actual distance in Unity units (usually meters) from the camera to the object in the scene. The depth value is written into the R channel directly.

2) To create a visualization for pysolotools-fiftyone, we divide every pixel value by the max distance in the array to get a percentage, and then call the Voxel51 FiftyOne Heatmap with the normalized NumPy array. So I'm guessing here, because I'm not sure what is going on in the FiftyOne code, that the values are being truncated into 255 distinct ranges to fit in a standard PNG file. I would not trust any value from a converted PNG file; that is merely for a quick visual inspection.
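
A rough sketch of that normalization step (the EXR filename is a placeholder, and the call follows FiftyOne's documented Heatmap label type, not the exact pysolotools-fiftyone code):

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"
import cv2
import fiftyone as fo

depth = cv2.imread("Depth_0.exr", cv2.IMREAD_UNCHANGED)[:, :, 2]   # raw distances, red channel
normalized = depth / depth.max()                         # each pixel as a fraction of the max distance
heatmap = fo.Heatmap(map=normalized, range=[0.0, 1.0])   # FiftyOne heatmap label over the normalized array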

StevenBorkman commented 1 year ago

Oh, link to pysolotools-fiftyone code: here

hanshuai1232000 commented 1 year ago

Thanks, I see.

fancyhhhhhh commented 10 months ago

@cs1488 Hello, I am trying to convert a depth map to a point cloud like you, and my goal is to build a 3D segmentation dataset. I am wondering if you've done this part and how you converted the depth map to a point cloud. Can you give me a link, or can we communicate further? Thank you very much.

fancyhhhhhh commented 10 months ago

I ran the code @cs1488 gave, but I got an error (screenshot attached). Why is the type of img None? Thank you very much!

cs1488 commented 10 months ago

@fancyhhhhhh you can start from the file below. Save it as a Python script and maybe make some adjustments. What I did there was fix the camera at a certain point, then combine the information from the following files:

With a fixed camera, e.g. at (0, 0, -10), and vectors from the camera to the objects, it is fairly easy to calculate the world coordinates. (A sketch of that back-projection follows below.)

convert2txt_v4.txt
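
A hypothetical sketch of that back-projection, assuming a pinhole camera model; the focal lengths, principal point, and EXR filename are placeholder assumptions, not values from convert2txt_v4.txt:

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"
import cv2
import numpy as np

depth = cv2.imread("Depth_0.exr", cv2.IMREAD_UNCHANGED)[:, :, 2]   # z-depth per pixel
h, w = depth.shape
fx = fy = 600.0                  # assumed focal lengths in pixels
cx, cy = w / 2.0, h / 2.0        # assumed principal point at the image center
cam_pos = np.array([0.0, 0.0, -10.0])   # fixed camera position from the comment above

u, v = np.meshgrid(np.arange(w), np.arange(h))
x = (u - cx) * depth / fx        # back-project pixel columns to camera-space x
y = (v - cy) * depth / fy        # back-project pixel rows to camera-space y
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3) + cam_pos
np.savetxt("pointcloud.txt", points)    # one "x y z" line per pixel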

cs1488 commented 10 months ago

The type is not None; there seems to be a problem with opening the file, as the error message you provided states. Check whether the script and the image are in the same directory; I can replicate the error when I try to run the script from somewhere else. The type should be <class 'numpy.ndarray'>. (A minimal check is sketched below.)
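
A minimal sketch of that check:

import os
# cv2.imread returns None instead of raising when the path is wrong,
# so verify the file exists relative to the working directory first.
assert os.path.exists("1.exr"), "1.exr not found next to the script"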

fancyhhhhhh commented 10 months ago

@cs1488 Thanks so much, I will try.

fancyhhhhhh commented 10 months ago

@cs1488 Bro, have you successfully created a labeled point cloud dataset? How do you label a point cloud? I am currently doing something similar to yours, and I would really appreciate it if you could give me some advice. Thanks :)

fancyhhhhhh commented 10 months ago

@cs1488 Thanks a lot for the code you provided :) but I have a few questions about this code.