Hi @polmagri It looks as though your script is converting RealSense frames to OpenCV format. Is that what is happening, please? If it is then you could be losing some quality in the conversion process.
RealSense depth frames are in a 16-bit format (uint16_t). Converting them to OpenCV images will typically produce an image that is only 8-bit unless the script specifies a 16-bit OpenCV format such as CV_16UC1 - please see https://github.com/IntelRealSense/librealsense/issues/12901#issuecomment-2097888508
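As a rough illustration (assuming the pyrealsense2 wrapper, NumPy and OpenCV are available), this is the usual way to keep the full 16-bit depth data when handing a frame to OpenCV, making an 8-bit copy only for display:

import numpy as np
import pyrealsense2 as rs
import cv2

pipeline = rs.pipeline()
pipeline.start()

frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()

# np.asanyarray on a Z16 depth frame yields a uint16 array (the CV_16UC1 equivalent),
# so the full range of depth values is preserved.
depth_16bit = np.asanyarray(depth_frame.get_data())        # dtype == np.uint16

# Converting to 8-bit collapses the range to 0-255, so it should only be done
# on a copy used for visualisation, never on the data you save or measure from.
depth_8bit_display = cv2.convertScaleAbs(depth_16bit, alpha=0.03)

pipeline.stop()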
In regard to the silhouette around human figures, this sounds like a phenomenon called occlusion that is common in RealSense images of humans.
Removal of occlusion - called 'occlusion invalidation' - is handled automatically by the RealSense SDK when generating 3D pointclouds. For non-pointcloud 2D images you may need to apply hole-filling post-processing to the images in real time before recording the frames. Hole-filling post-processing can be done with the Spatial filter or with the Hole-Filling filter.
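If it helps, here is a minimal Python sketch (assuming the pyrealsense2 wrapper) of applying those filters to depth frames before they are recorded:

import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

# Create the post-processing filters once, outside the capture loop
spatial = rs.spatial_filter()
spatial.set_option(rs.option.holes_fill, 3)      # the Spatial filter's own hole-filling mode
hole_filling = rs.hole_filling_filter()

try:
    while True:
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        if not depth_frame:
            continue
        # Apply hole-filling post-processing before the frame is recorded or saved
        filtered = spatial.process(depth_frame)
        filtered = hole_filling.process(filtered)
finally:
    pipeline.stop()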
More information about occlusion invalidation can be found in Intel's guide 'Projection, Texture-Mapping and Occlusion with Intel RealSense Depth Cameras' at the link below.
As the earlier script shows, I saved the depth images in 16-bit using the Z16 format. In the script where I apply YOLO to them, I import and synchronize them with the following function:
import rosbag
from cv_bridge import CvBridge

def extract_and_process_rosbag(bag_file):
    bridge = CvBridge()
    rgb_images = []
    depth_images = []
    with rosbag.Bag(bag_file, 'r') as bag:
        for topic, msg, t in bag.read_messages(topics=['/camera/color/image_raw', '/camera/depth/image_raw']):
            if topic == '/camera/color/image_raw':
                # Color frames are decoded to 8-bit BGR for OpenCV/YOLO
                rgb_image = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
                rgb_images.append(rgb_image)
            elif topic == '/camera/depth/image_raw':
                # 'passthrough' keeps the original 16UC1 (uint16) depth encoding
                depth_image = bridge.imgmsg_to_cv2(msg, desired_encoding='passthrough')
                depth_images.append(depth_image)
    assert len(rgb_images) == len(depth_images), "The number of RGB and depth frames do not match."
    return rgb_images, depth_images
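For completeness, I call it like this before the loop below (the bag path is just an example); because of the 'passthrough' encoding the depth frames should come back as uint16:

rgb_images, depth_images = extract_and_process_rosbag('my_recording.bag')   # example path
print(depth_images[0].dtype)   # expected: uint16, i.e. the 16-bit Z16 depth values are preserved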
And then:
try:
    for color_image, depth_image in zip(rgb_images, depth_images):
        # Remove any filtering step here
        depth_filtered = depth_image

        # Run the YOLO model on the frames
        persons = model(color_image)

        # Normalize depth image for display
        depth_image_display = cv2.normalize(depth_filtered, None, 0, 255, cv2.NORM_MINMAX)
        depth_image_display = cv2.convertScaleAbs(depth_image_display)

        # Convert depth image to color for better visualization
        depth_image_colormap = cv2.applyColorMap(depth_image_display, cv2.COLORMAP_JET)
Could you help me identify any errors I might be missing? Is it possible that the issue is due to the color images being saved in 8-bit while the depth images are in 16-bit, leading to some quality loss in the color images?
I do not have advice to offer about the code, as its combination of RealSense, rospy, OpenCV and YOLO exceeds my programming knowledge, unfortunately. I do apologize.
Although the RealSense SDK's Z16 format is 16-bit, its RGB8 format is already 8-bit and so saving color as an 8-bit image should not significantly affect it.
Regarding depth, though, even if depth is saved to a 16-bit image there will likely be some loss of depth information unless an image format that preserves depth, such as .raw, is used. Among non-image formats, .bag and .npy do preserve depth information.
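As a rough illustration of the .npy point (the file name and stand-in data are just examples), a NumPy round trip keeps the 16-bit depth values intact:

import numpy as np

# depth_image: the uint16 array taken from a Z16 depth frame (stand-in data here)
depth_image = np.random.randint(0, 65535, size=(480, 640), dtype=np.uint16)

np.save('depth_frame_0001.npy', depth_image)     # lossless: dtype and values are preserved
restored = np.load('depth_frame_0001.npy')

assert restored.dtype == np.uint16
assert np.array_equal(restored, depth_image)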
I understand from this case that you are saving in .bag and .npy format. Could you provide more information about the reason for also saving a video in .avi please? Is that for YOLO?
There is a Python script at https://github.com/IntelRealSense/librealsense/issues/4934#issuecomment-537705225 that saves .npy as an array of scaled matrices. If you have not seen it already then it might be a helpful reference to compare to your own code.
Hi @polmagri Do you require further assistance with this case, please? Thanks!
Case closed due to no further comments received.
Issue Description
I created a script that uses YOLOv8 Pose to obtain the 3D skeleton using my RealSense D435i. When I use the camera live, it works correctly.
Subsequently, I created a dataset by saving videos as both ROSBAG and .npy files, including the depth and RGB streams. However, when I try to apply my algorithm to these videos (which seem to be well synchronized), I experience a loss of information in the depth data. I can't recreate the skeleton because the depth frames appear less precise and have holes, especially around the human silhouette.
The intrinsic parameters have been used correctly. Do you have any advice or can you explain why, when saving to ROSBAG with the script mentioned above, I experience this loss of quality? I just want the videos saved in ROSBAG or as individual .npy frames to work the same as when I use the RealSense directly.
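For reference, this is roughly how I use the intrinsics to lift a 2D keypoint into 3D (a simplified sketch with placeholder values, not my full script):

import pyrealsense2 as rs
import numpy as np

# Placeholder intrinsics for illustration; in my script these come from the recorded stream
intrin = rs.intrinsics()
intrin.width, intrin.height = 640, 480
intrin.fx, intrin.fy = 615.0, 615.0
intrin.ppx, intrin.ppy = 320.0, 240.0
intrin.model = rs.distortion.brown_conrady
intrin.coeffs = [0.0, 0.0, 0.0, 0.0, 0.0]

depth_scale = 0.001                                         # D435i default depth unit: 1 mm
u, v = 320, 240                                             # pixel coordinates of a YOLOv8 keypoint
depth_image = np.full((480, 640), 1500, dtype=np.uint16)    # stand-in depth frame
z = depth_image[v, u] * depth_scale                         # depth in metres at that pixel
point_3d = rs.rs2_deproject_pixel_to_point(intrin, [u, v], z)
print(point_3d)                                             # [X, Y, Z] in metres in the camera frame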
Thanks in advance.