SensorsINI / DHP19

Repository for the Dynamic Vision Sensor 3D Human Pose Dataset (DHP19).
MIT License

Read .aedat file #1

Open ShallowWill opened 5 years ago

ShallowWill commented 5 years ago

Hi, thanks for such a great dataset; it will definitely help advance research in the community.

Can you please provide some description of the parameters in the following function, and how to set them?

```matlab
function [startIndex, stopIndex, ...
          pol_tmp3, X_tmp3, y_tmp3, cam_tmp3, timeStamp_tmp3] = ...
    extract_from_aedat( ...
        aedat, events, ...
        startTime, stopTime, sx, sy, nbcam, ...
        thrEventHotPixel, dt, ...
        xmin_mask1, xmax_mask1, ymin_mask1, ymax_mask1, ...
        xmin_mask2, xmax_mask2, ymin_mask2, ymax_mask2)
```

What is the meaning of the input parameters (aedat, events, sx, sy, ...), and how should they be set? Also, what do the outputs (cam_tmp3, ...) mean? It would be helpful if you could provide an example file showing how to read a '*.aedat' file.

I turned to the files in the 'read_aedat' folder and used the following code to read a file: output = ImportAedat('', 'mov1.aedat'); Is that correct? I did not set the source parameter, since I noticed it is optional, but the default source is 'Dvs128'. Did you use a DVS128 for recording? Your paper says you used DAVIS cameras, but I could not find the exact model: DAVIS240C? DAVIS208? Something else? So I am confused about the x, y resolution. As for the output, is 'output.data.polarity.cam' the camera ID?

I loaded a '*.mat' Vicon recording. Do the 1st, 2nd, and 3rd columns correspond to x, y, z? Why are there negative values? Where did you set the position (0,0,0)? And how do I match the N timesteps to the timestamps of the events?

victkid commented 5 years ago

The file you are looking for is 'Generate_DHP19.m' under the 'generate_DHP19' folder.

enrico-c commented 5 years ago

Hi, and thanks for your interest!

> Can you please provide some description of the parameters in the following function, and how to set them?
>
> function [startIndex, stopIndex, pol_tmp3, X_tmp3, y_tmp3, cam_tmp3, timeStamp_tmp3] = extract_from_aedat(aedat, events, startTime, stopTime, sx, sy, nbcam, thrEventHotPixel, dt, xmin_mask1, xmax_mask1, ymin_mask1, ymax_mask1, xmin_mask2, xmax_mask2, ymin_mask2, ymax_mask2)
>
> What is the meaning of the input parameters (aedat, events, sx, sy, ...), and how should they be set? Also, what do the outputs (cam_tmp3, ...) mean? It would be helpful if you could provide an example file showing how to read a '*.aedat' file.

Please have a look at the main script Generate_DHP19.m for details about the parameters.

> I turned to the files in the 'read_aedat' folder and used the following code to read a file: output = ImportAedat('', 'mov1.aedat'); Is that correct? I did not set the source parameter, since I noticed it is optional, but the default source is 'Dvs128'. Did you use a DVS128 for recording? Your paper says you used DAVIS cameras, but I could not find the exact model: DAVIS240C? DAVIS208? Something else? So I am confused about the x, y resolution.

The source parameter is set from the header of the imported aedat. We used DAVIS346b for our recordings, with 346x260 pixel resolution.
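
As a minimal sketch of this import step: the ImportAedat call is the one quoted above, but the field names x, y, timeStamp, and polarity are assumptions based on common AedatTools conventions; only the cam field is confirmed in this thread.

```matlab
% Minimal sketch, assuming AedatTools-style field names (only .cam is
% confirmed in this thread).
output = ImportAedat('', 'mov1.aedat');
x   = output.data.polarity.x;           % pixel column, DAVIS346 is 346x260
y   = output.data.polarity.y;           % pixel row
t   = output.data.polarity.timeStamp;   % event timestamps
pol = output.data.polarity.polarity;    % ON/OFF event polarity
cam = output.data.polarity.cam;         % camera ID (confirmed below)
fprintf('%d events from %d cameras\n', numel(x), numel(unique(cam)));
```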

> As for the output, is 'output.data.polarity.cam' the camera ID?

Yes, this is correct.

> I loaded a '*.mat' Vicon recording. Do the 1st, 2nd, and 3rd columns correspond to x, y, z?

Correct, the column order is x, y, z.

> Why are there negative values? Where did you set the position (0,0,0)?

Negative values are due to the relative positions of the Vicon origin and the subject. The Vicon origin is centered on the treadmill.
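
A minimal sketch of loading one Vicon recording, assuming a hypothetical filename; XYZPOS.head is the field name mentioned later in this thread:

```matlab
% Minimal sketch; 'S1_1_1.mat' is a hypothetical filename.
vicon = load('S1_1_1.mat');
head  = vicon.XYZPOS.head;   % N x 3 matrix of head positions
xs = head(:, 1);             % 1st column: x
ys = head(:, 2);             % 2nd column: y
zs = head(:, 3);             % 3rd column: z (origin on the treadmill)
```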

> How do I match the N timesteps to the timestamps of the events?

We take the Vicon positions that are closest in time to the start and stop events of the current accumulated frame. You can find the relevant piece of code in the ExtractEventsToFramesAndMeanLabels.m script (variable k).
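
A minimal sketch of that nearest-in-time lookup, assuming event times converted to seconds and the 100 Hz Vicon rate mentioned later in this thread (the actual logic lives in ExtractEventsToFramesAndMeanLabels.m, variable k):

```matlab
% Minimal sketch; startTime_s / stopTime_s are example values, and the
% rounding convention is an assumption.
viconRate_hz = 100;
startTime_s = 0.231; stopTime_s = 0.287;            % example frame window
kStart = max(1, round(startTime_s * viconRate_hz)); % Vicon row closest to start
kStop  = max(1, round(stopTime_s  * viconRate_hz)); % Vicon row closest to stop
```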

ShallowWill commented 5 years ago

Thanks for your kind reply. I still have some questions about using this dataset.

(1) In the extract_from_aedat.m file, why do you need to do the following conversion: X = (sx-x)+cam*sx; y = sy-y_raw; Is that because MATLAB and the camera have different (0,0) positions (in MATLAB it is at the top-left), or is there another reason?
As for computing (u,v) using Eq. (1) in your paper, do I need to convert (u,v) as the code above does, or does (u,v) already correspond to the pixel addresses of the constructed frames? Can you please check whether the data in the camera_positions.npy file is correct? If so, when do I need to use the data from camera_positions.npy?

(2) As for matching the Vicon data to events, is the following understanding correct? If I construct one frame using the first 50 ms of events, then the x, y, z position of the head for this frame is the average of the first 5 rows of XYZPOS.head (since the Vicon sampling rate is 100 Hz). Then, for the next frame constructed from the 50-100 ms events, the head position is the average of rows 6-10 of XYZPOS.head.

Sorry for all the questions; hopefully I described them clearly.

enrico-c commented 5 years ago

No problem about the questions. I have uploaded a notebook with an example showing how to create the heatmaps from the 3D positions; it should clarify some of the issues.

To answer the questions: (1) The rearranging is done to keep a consistent data format for Python code development, i.e., if you load and plot a file in Python it is oriented the same way as the .aedat recording played in jAER.
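
A minimal sketch of that rearrangement with example values; sx and sy are the DAVIS346 resolution given above, and the 0-based camera index is an assumption:

```matlab
% Minimal sketch of the x/y rearrangement from extract_from_aedat.m.
sx = 346; sy = 260;                % DAVIS346 pixel resolution
x_raw = 10; y_raw = 20; cam = 2;   % example raw event address and camera ID
X = (sx - x_raw) + cam * sx;       % mirror x, then tile the cameras side by side
y = sy - y_raw;                    % flip y to a top-left origin
```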

The camera positions are needed after you make 2D heatmap predictions and want to triangulate from 2 or more camera views into 3D space.
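
For the triangulation step, here is a minimal linear (DLT) sketch; the 3x4 projection matrices P1, P2 and the 2D heatmap peaks u1, u2 are assumed inputs, since camera_positions.npy alone does not provide full projection matrices:

```matlab
% Minimal DLT sketch; P1, P2 (3x4 projection matrices) and u1, u2
% (2D heatmap peaks) are assumed inputs, not part of this repository.
function Xw = triangulate_dlt(P1, P2, u1, u2)
    % Each view gives two linear constraints on the homogeneous 3D point;
    % stack them and take the least-squares null vector via SVD.
    A = [u1(1) * P1(3,:) - P1(1,:);
         u1(2) * P1(3,:) - P1(2,:);
         u2(1) * P2(3,:) - P2(1,:);
         u2(2) * P2(3,:) - P2(2,:)];
    [~, ~, V] = svd(A);
    Xh = V(:, end);         % homogeneous solution
    Xw = Xh(1:3) / Xh(4);   % Euclidean 3D position
end
```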

(2) Yes, this is correct. We take the start and stop time of the current accumulated frame and average each joint over that interval to obtain the label corresponding to the accumulated frame.
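
A minimal sketch of that averaging, assuming a 50 ms frame and 100 Hz Vicon data; the variable names and index rounding are assumptions:

```matlab
% Minimal sketch of per-frame label averaging; 'head' stands in for
% XYZPOS.head, and t0/t1 are an example 50 ms frame window.
viconRate_hz = 100;
head = randn(1000, 3);                            % stand-in for XYZPOS.head
t0 = 0.00; t1 = 0.05;                             % frame start/stop, seconds
k0 = max(1, floor(t0 * viconRate_hz) + 1);        % first Vicon row (here 1)
k1 = min(size(head, 1), ceil(t1 * viconRate_hz)); % last Vicon row (here 5)
label = mean(head(k0:k1, :), 1);                  % 1 x 3 averaged x, y, z
```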

ShallowWill commented 5 years ago

@enrico-c @tobidelbruck Hi both,

I still have questions about aligning the camera recordings with the Vicon data.

I used the previously discussed method to match the Vicon data to events, but I had no luck matching them correctly when visualizing them. The precondition for that method is that all DVS cameras and the Vicon start recording at the same time. Can you confirm this precondition?

After loading the events and XYZPOS data, I noticed that almost all event recordings are longer, about 2.5 seconds longer than the Vicon data. Is this because you start the DVS cameras and the Vicon at the same time, but stop the Vicon first and the DVS cameras afterwards? If so, can we ignore the events that exceed the maximum time of the Vicon data (I noticed you do this in your ExtractEventsToFramesAndMeanLabels.m file)?
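
For reference, a minimal sketch of that clipping, assuming microsecond event timestamps and 100 Hz Vicon data; the variable names are assumptions (the actual code is in ExtractEventsToFramesAndMeanLabels.m):

```matlab
% Minimal sketch; t_us and 'head' are example stand-ins.
viconRate_hz = 100;
head = randn(1000, 3);                      % stand-in for XYZPOS.head (10 s)
t_us = linspace(0, 12.5e6, 5000);           % example event timestamps (us)
viconEnd_s = size(head, 1) / viconRate_hz;  % Vicon recording length in seconds
keep = t_us * 1e-6 <= viconEnd_s;           % drop the trailing ~2.5 s of events
t_us = t_us(keep);
```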

Is it possible that there is a time shift between the events and the Vicon data? You have already run experiments and shown promising results in your paper using the previously discussed matching method. Can you confirm that your training images and labeled heatmaps match correctly, by checking and visualizing more samples?

Sorry about all the questions, and thanks.

tobidelbruck commented 5 years ago

I believe the timing is synchronized by 2 special events recorded in the DVS stream that come from the Vicon: one at the start and one at the end of the recording. The recordings definitely don't start and end at exactly the same time, but by using these special events you can synchronize them. I don't know the details of these special events; they are encoded by particular bit patterns that should be documented (perhaps they are not yet, except in code).