ArduCAM / Arducam_tof_camera


Construct AMPLITUDE_FRAME and DEPTH_FRAME images from RAW_FRAME #62

Closed: luveti closed this issue 8 months ago

luveti commented 9 months ago

What would the algorithm for this look like? I see several constants in ArducamTOFCamera.hpp that I'm assuming are used in this calculation? Specifically:

```cpp
int _capture_range = 4;
float _f_mod = 375e5;                   //! Camera light wave modulation frequency
const float _c = 3e8;                   //! speed of light
const float _pi = M_PI;                 //! π
const unsigned int _image_size = 43200;
```

dennis-ard commented 9 months ago

The algorithm for calculating distance using the phase difference of sinusoidal signals typically involves the following steps:

  1. Acquiring the phase shift: The TOF (Time-of-Flight) camera captures sinusoidal signals and determines the phase shift between the emitted and received signals. This phase shift represents the delay experienced by the signal during its travel.

  2. Converting phase shift to distance: The measured phase shift is converted into a corresponding distance value using calibration data or predefined formulas. This conversion relies on known properties of the system, such as the wavelength of the emitted signal.

These constants are used in the second step, where the phase shift is converted to distance. For specific usage and in-depth details, it would be best to refer to the general principles and documentation of Time-of-Flight (TOF) technology.
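For illustration, the textbook conversion from phase shift to distance looks roughly like this (a sketch using the constants quoted above, not the SDK's actual code):

```rust
// Rough sketch of the standard phase-to-distance conversion, using the
// constants from ArducamTOFCamera.hpp. Not the SDK's actual implementation.
let f_mod: f32 = 375e5;                 // 37.5 MHz modulation frequency
let c: f32 = 3e8;                       // speed of light, m/s
let pi = std::f32::consts::PI;
let phase: f32 = 1.0;                   // example measured phase shift, radians (0..2π)
// The light travels out and back, hence the factor of 2 in the denominator.
let distance = c * phase / (4.0 * pi * f_mod);
// Maximum unambiguous range: c / (2 * f_mod) = 4 m, which matches _capture_range.
```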

luveti commented 9 months ago

Thanks for the explanation!

Are you able to provide the source for your implementation of converting the phase shift to a distance? We're particularly interested in the following method:

```cpp
void ArducamTOFCamera::getPhaseAndAmplitudeImage(int16_t* raw_ptr, float* phase_ptr, float* amplitude_ptr);
```

It appears to accept a value matching the return type of `frame->getData(FrameType::RAW_FRAME)`.

The system we're developing requires the lowest possible latency and needs everything to run on a single core with no multi-threading (enforced at the OS level using cgroups). We noticed the SDK performs some work on a background thread and would like to eliminate this.

We also intend for our binary to be statically linked, so we're obtaining the 4 raw frames using V4L2 directly and would like to avoid linking against any shared libraries.

luveti commented 9 months ago

I was unable to find the datasheet for the sensor you guys are using, the Sony IMX316. I'm guessing it probably requires one to sign an NDA.

I've managed to get pretty far by reading the datasheet for the epc660: https://www.espros.com/downloads/01_Chips/Datasheet_epc660.pdf

These appear to work in a similar fashion. Four raw images, aka Differential Correlation Samples (DCS), are output by default. Each pixel appears to be a 10-bit signed integer, where the 11th bit indicates the sign. These are packed into two bytes when putting the device into Y12 mode. Discovering that these are signed integers was a huge breakthrough, as the image is very noisy otherwise: the negative numbers become very large when interpreted as unsigned.
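If that's right, sign-extending a raw sample should look roughly like this (my own sketch, assuming the 11 significant bits sit in the low bits of each 16-bit word):

```rust
// Hypothetical sign extension for an 11-bit two's complement sample
// (bit 10 as the sign bit) stored in the low bits of a 16-bit word.
fn sign_extend_11(raw: u16) -> i16 {
    // Shift the sign bit up to bit 15, then arithmetic-shift back down.
    ((raw << 5) as i16) >> 5
}
```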

Using the algorithm defined in section 9.2 (Distance calculation algorithm) gets me a depth image that is pretty close to what your SDK produces. There appears to be an offset issue that I'm still trying to figure out. I haven't yet tried to produce the amplitude image to see if it comes out correct, but with the signed fix I'm guessing the algorithm in section 9.2.2 (Quality of the measurement result) should produce the correct result. I'll update this issue again when I get back to it next week.
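For reference, my current reading of the per-pixel math in that section is roughly the following (a sketch under my assumptions; the exact pairing of the four DCS samples and any fixed offset are things I'm still unsure about):

```rust
use std::f32::consts::PI;

const C: f32 = 3e8;       // speed of light, m/s
const F_MOD: f32 = 375e5; // 37.5 MHz modulation frequency, from the SDK header

// Depth and amplitude from four sign-corrected DCS samples, following the
// usual 4-phase scheme. Which differences form the in-phase and quadrature
// terms is an assumption and may need swapping to match the sensor's
// actual DCS ordering.
fn depth_and_amplitude(dcs: [f32; 4]) -> (f32, f32) {
    let q = dcs[3] - dcs[1];
    let i = dcs[2] - dcs[0];
    let mut phase = q.atan2(i);                // -π..π
    if phase < 0.0 {
        phase += 2.0 * PI;                     // wrap to 0..2π
    }
    let distance = (C / (2.0 * F_MOD)) * (phase / (2.0 * PI)); // 0..4 m
    let amplitude = (q * q + i * i).sqrt() / 2.0;
    (distance, amplitude)
}
```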

luveti commented 8 months ago

It appears I was close, but going down the wrong route. The solution was actually simpler than I thought.

It would appear that the 4 least significant bits of each 16 bit pixel are always 0 in the SDK frames.

Comparing these with the raw V4L2 frames showed me that the bits needed to be shifted.

After some experimentation, I found the following will produce signed values that match the raw frames from the SDK.

```rust
// Sign-extend each sample: shifting left 5 puts bit 10 (the sign bit) into
// the MSB, and the arithmetic shift right 1 leaves the value scaled by 16,
// matching the SDK frames (whose 4 LSBs are always zero). The four V4L2
// frames are also reordered here to line up with the SDK's ordering.
let pixel0 = (((frames[2][i] as i16) << 5) >> 1) as f32;
let pixel1 = (((frames[3][i] as i16) << 5) >> 1) as f32;
let pixel2 = (((frames[0][i] as i16) << 5) >> 1) as f32;
let pixel3 = (((frames[1][i] as i16) << 5) >> 1) as f32;
```

Another thing that was throwing me off was how the `capture_raw.cpp` example showed only black and white pixels.

It appears this is the result of converting it to the CV_32F format.

Leaving it in the CV_16S format gives nice smooth images, similar to the images on page 15 of this PDF: https://www.neumueller.com/datenblatt/epc/epc660_Evalkit.pdf

Here's the related image from that PDF: [image]

OpenCV appears to perform a CV_16S to CV_8U conversion before display, but doesn't do this for CV_32F images.

As I'm no longer using OpenCV, I do the following in my code:

```rust
// Map the signed 16-bit range onto 0..255 grayscale for display.
let pixel0 = ((pixel0 + (i16::MAX as f32)) / (u16::MAX as f32)) * 255.;
let pixel1 = ((pixel1 + (i16::MAX as f32)) / (u16::MAX as f32)) * 255.;
let pixel2 = ((pixel2 + (i16::MAX as f32)) / (u16::MAX as f32)) * 255.;
let pixel3 = ((pixel3 + (i16::MAX as f32)) / (u16::MAX as f32)) * 255.;
```

Rendering these shows the expected results.

From there, the previously mentioned algorithms produce depth and amplitude images that look correct.
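Putting it together, the full per-pixel path ends up being short (again a sketch with my assumed sample pairing; `depth_and_amplitude` is the hypothetical helper sketched earlier in this thread):

```rust
// frames: the four raw V4L2 frames; i: pixel index.
let dcs = [
    (((frames[2][i] as i16) << 5) >> 1) as f32,
    (((frames[3][i] as i16) << 5) >> 1) as f32,
    (((frames[0][i] as i16) << 5) >> 1) as f32,
    (((frames[1][i] as i16) << 5) >> 1) as f32,
];
let (depth_m, amplitude) = depth_and_amplitude(dcs);
```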

Here's the final result:

[image]