SensorsINI / ddd20-utils

DDD20 End-to-End Event Camera Driving Dataset
https://sites.google.com/view/davis-driving-dataset-2020/home
GNU Lesser General Public License v3.0

data integrity issue in DDD 17 dataset #10

Open youkaichao opened 2 years ago

youkaichao commented 2 years ago

Hi, thanks for your valuable efforts in providing such a large dataset. When I tried to use the DDD 17 dataset, I encountered several integrity issues:

  1. I downloaded the DDD 17 dataset via Resilio Sync. It seems that run5/rec1487858093.hdf5 is missing. Is this a server-side or a client-side problem? If it is a client-side issue, I can re-download that file.
  2. After exporting data from those hdf5 files using export_ddd20_hdf.py, I found that several recordings have integrity issues. In some recordings, the timestamps of the event data are expected to be increasing but are not; in others, the timestamps of the frame_ts data are expected to be increasing but are not. Will sorting by timestamp solve the problem? Or is there some deeper cause that makes the entire recording invalid?

Below is a list of recordings with data integrity issues:

run3/rec1487355090.hdf5
run3/rec1487356509.hdf5
run3/rec1487417411.hdf5
run3/rec1487419513.hdf5
run3/rec1487424147.hdf5
run3/rec1487427200.hdf5
run3/rec1487430438.hdf5
run3/rec1487433587.hdf5
run3/rec1487594667.hdf5
run3/rec1487600962.hdf5
run5/rec1487849663.hdf5
run5/rec1487860613.hdf5
run5/rec1487864316.hdf5
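For reference, a check of this kind can be sketched as follows. This is only a minimal illustration, assuming the timestamps of one recording have already been loaded into a NumPy array; the function name `backward_steps` and the sample values are hypothetical and are not part of ddd20-utils.

```python
import numpy as np

def backward_steps(timestamps):
    """Return every index i where timestamps[i] < timestamps[i - 1].

    Hypothetical helper, not part of ddd20-utils. Neighbours are compared
    directly instead of using np.diff, so unsigned integer timestamps
    cannot wrap around and hide a backward step.
    """
    ts = np.asarray(timestamps)
    return np.flatnonzero(ts[1:] < ts[:-1]) + 1

# Made-up timestamps with two backward steps (20 -> 15 and 30 -> 25).
example_ts = np.array([10, 20, 15, 30, 30, 25], dtype=np.uint64)
bad_indices = backward_steps(example_ts)
print(bad_indices)
```

An empty result means the timestamps never decrease; repeated equal timestamps are not reported.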

tobidelbruck commented 2 years ago

Thanks for pointing out some data integrity problems.

First of all, have you tried DDD20 (the Resilio Sync folder called DDD17-fordfocus)? It was collected with more care for data integrity and much more rigorous checking of the files. Please check the DDD20 webpage https://sites.google.com/view/davis-driving-dataset-2020/home .

The DDD20 Ford Focus recordings are in folders organized by day (e.g. aug01).

Tobi

youkaichao commented 2 years ago

DDD 20 is just too large for me to store, so I downloaded only the DDD 17 dataset.

tobidelbruck commented 2 years ago

Oh, I see. You do not want to pay for the personal copy of Resilio to enable selective sync? I agree that $60 is a lot for one-time use...

Unfortunately we cannot host this nearly 1TB via other channels right now. What we can do is put a few samples from the entire dataset on gdrive. Which recordings would be best?

Tobi

youkaichao commented 2 years ago

Well, I'm not asking for a few samples from DDD 20. For practical reasons, I would like to stick with DDD 17, and I found some integrity issues in it. I want to know whether these issues can be resolved. For example, will sorting by timestamp solve the problem? Or is there some deeper cause that makes the entire recording invalid? If the latter is true, maybe you can mention it on the DDD 17 homepage or just remove those invalid recordings, to avoid unnecessary downloads of invalid files.

tobidelbruck commented 2 years ago

We will take a look at the DDD17 files again... it might take some time because the Python environment is not set up on any of our computers right now.

Regarding rec1487858093.hdf5, I cannot see it either in my sync of the folder. I will see if it is on a backup or the root server at the lab.

Regarding the other recordings: In general there is no guarantee that timestamps of records increase monotonically in time. It is a pain but a fact... The packets were written by the code to the hdf5 file during acquisition, but apparently they can appear slightly out of order. At least that is my experience with rosbag recordings. Does the data look OK in the viewer? I'm not sure what we did to sort them in the original dataset paper (ICLR paper). You can try to sort them as I do in the jAER RosbagFileInputStream (and it should be a lot easier in Python).
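A minimal sketch of such a sort in Python, assuming the exported timestamps and their per-record fields are available as equal-length NumPy arrays. The function name `sort_by_timestamp` and the sample values are hypothetical; this is not the jAER implementation, and it is not confirmed here that sorting alone is enough to make the affected recordings valid.

```python
import numpy as np

def sort_by_timestamp(timestamps, *fields):
    """Reorder timestamps, and any parallel arrays, into non-decreasing time order.

    Hypothetical helper. A stable sort is used so that records sharing the
    same timestamp keep their original relative order.
    """
    ts = np.asarray(timestamps)
    order = np.argsort(ts, kind="stable")
    return (ts[order],) + tuple(np.asarray(f)[order] for f in fields)

# Made-up data: 'payload' stands in for any per-record field,
# such as event coordinates or frame indices.
ts = np.array([10, 20, 15, 30, 30, 25])
payload = np.array([0, 1, 2, 3, 4, 5])
sorted_ts, sorted_payload = sort_by_timestamp(ts, payload)
print(sorted_ts)
print(sorted_payload)
```

Applying the same permutation to every parallel array keeps each record attached to its own timestamp.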

Sorry about the hassles.

Tobi

youkaichao commented 2 years ago

Thanks. The DDD 17 dataset is stored on a headless server, so I cannot view it. By the original dataset paper, do you mean the ICML workshop paper "DDD17: End-To-End DAVIS Driving Dataset"? I don't see an ICLR paper.

tobidelbruck commented 2 years ago

I'm attempting a restore from a remote S3 backup to see if it contains the missing run5 file... fingers crossed for this 35GB file. Unfortunately, I recycled the HDD with the source copies. It's possible we might have a copy somewhere else too, but it could be lost.

tobidelbruck commented 2 years ago

Very unfortunately, this file seems to be lost forever unless someone else out there still has a copy of it... I don't have one on any of my computers or HDDs.
