Open tbolten opened 4 years ago
@tbolten hi, I found your issue while running into the same question about the selection of the 4670 images (actually 4668 images with XML labels). Have you solved this problem? I would appreciate your reply.
@shirleyatgithub No, unfortunately I could not resolve the problem.
Hi,
I'm interested in using your DVS-based 'Pedestrian Detection Dataset' for my current research, but unfortunately I am having trouble mapping the label annotations to the original data.
As I understand the data, the dataset consists of a total of 12 parts (the available *.aedat files).
According to the published details:
Opening the AEdat files in AEViewer from the jAER project reports different lengths for the recordings:
Even assuming an average of only 30 s per file, I would expect a total of approximately 12 * 30 s = 360 s = 360,000 ms / 20 ms = 18,000 frames.
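For reference, a quick sanity check of that estimate, assuming non-overlapping 20 ms windows (the per-file durations below are placeholders, not the values reported by AEViewer):

```python
# Expected number of 20 ms frames; durations are assumed, not measured.
durations_s = [30.0] * 12                     # ~30 s per recording (assumption)
frames_per_file = [int(d * 1000 // 20) for d in durations_s]
print(sum(frames_per_file))                   # -> 18000 expected frames
```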
How did you select the subset of 4670 images from the dataset for the annotations?
Each available XML label file refers to one image in the SAE encoding. But unfortunately it is (at least for me, currently) not possible to map this image name to one aedat file and the corresponding 20 ms window within that file.
So it is not possible to use the annotations with another encoding, or with a completely different approach?
I tried to map the published SAE frames back to the AEdat files (by the way: 'Pedestrian frame.rar' contains about 700 more images than 'Pedestrian label.rar' contains label XML files?) by doing:
Unfortunately this does not seem to work because of unknown post-processing applied to the provided frames. I assume some morphological post-processing was used to remove noise.
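For illustration, here is a minimal sketch of one possible per-window SAE reconstruction (linear timestamp normalisation over each 20 ms window). It assumes the events have already been extracted from the .aedat file into NumPy arrays; the sensor resolution, the normalisation, and the absence of any extra filtering are assumptions and may well differ from the pipeline used to produce the published frames.

```python
# Sketch: build one SAE (Surface of Active Events) frame from a 20 ms slice
# of DVS events. x, y, t are assumed to be integer NumPy arrays already
# extracted from the .aedat file (t in microseconds); the 346x260 resolution
# is an assumption.
import numpy as np

WIDTH, HEIGHT = 346, 260          # assumed sensor resolution
WINDOW_US = 20_000                # 20 ms window

def sae_frame(x, y, t, t_start, window_us=WINDOW_US,
              width=WIDTH, height=HEIGHT):
    """Return an 8-bit SAE image for events in [t_start, t_start + window_us)."""
    t_end = t_start + window_us
    mask = (t >= t_start) & (t < t_end)
    xs, ys, ts = x[mask], y[mask], t[mask]

    # Surface of Active Events: each pixel keeps the timestamp of its most
    # recent event within the window (sorted oldest first, so newest wins).
    sae = np.zeros((height, width), dtype=np.int64)
    order = np.argsort(ts)
    sae[ys[order], xs[order]] = ts[order]

    # Map timestamps linearly to 0..255; this scaling is an assumption and
    # the published frames may use a different normalisation or filtering.
    frame = np.zeros((height, width), dtype=np.uint8)
    active = sae > 0
    frame[active] = ((sae[active] - t_start) * 255 // window_us).astype(np.uint8)
    return frame
```

Stepping `t_start` in 20,000 µs increments over a recording would then produce one frame per window, which is roughly what I compared against the provided images.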
Example for self-converted:
Example for provided: