google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box, which describes the object's position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Inconsistency between records and records_shuffled #34

Open haiyangdeperci opened 3 years ago

haiyangdeperci commented 3 years ago

It seems that some tf.record files are present in the records_shuffled directory but not in records. I believe this is an unintended discrepancy. Out of 10532 tf.record files in records_shuffled, only 10448 are present in records. You can investigate the 84 missing records with the following snippet:

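import tensorflow as tf

def fetchFileNames(dir_names):
    # Collect every file path found directly inside each category directory.
    filepaths = []
    for name in dir_names:
        filepaths += tf.io.gfile.glob(f"{name}/*")
    return filepaths

# One sub-directory per object category in each of the two top-level directories.
record_dirs = tf.io.gfile.glob("gs://objectron/v1/records/*")
record_filepaths = fetchFileNames(record_dirs)
shuffled_dirs = tf.io.gfile.glob("gs://objectron/v1/records_shuffled/*")
shuffled_filepaths = fetchFileNames(shuffled_dirs)

# Holds while the discrepancy exists: records has fewer files than records_shuffled.
assert len(record_filepaths) < len(shuffled_filepaths)

# Map shuffled paths onto the records layout so the two sets are comparable.
shuffled_filepaths = [fp.replace("_shuffled", "") for fp in shuffled_filepaths]
record_filepaths = set(record_filepaths)
shuffled_filepaths = set(shuffled_filepaths)

# Paths that exist under records_shuffled but have no counterpart under records.
missing = shuffled_filepaths - record_filepaths
print(sorted(missing))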

These are the missing filepaths:

{'gs://objectron/v1/records/camera/camera_test-00137-of-00163',
 'gs://objectron/v1/records/laptop/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00169-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00953-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00526-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00192-of-00819',
 'gs://objectron/v1/records/camera/camera_train-00434-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00080-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00174-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00101-of-00163',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00152-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00833-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00284-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00228-of-00322',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00063-of-00322',
 'gs://objectron/v1/records/bottle/bottle_train-00215-of-00920',
 'gs://objectron/v1/records/shoe/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00213-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00539-of-00552',
 'gs://objectron/v1/records/bottle/bottle_train-00273-of-00920',
 'gs://objectron/v1/records/camera/camera_train-00140-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00463-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00036-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00025-of-00819',
 'gs://objectron/v1/records/cup/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00247-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00013-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00252-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00408-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00440-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00148-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00010-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00152-of-00163',
 'gs://objectron/v1/records/chair/chair_train-00947-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00249-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00207-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00647-of-01106',
 'gs://objectron/v1/records/camera/summary.txt',
 'gs://objectron/v1/records/camera/camera_train-00040-of-00552',
 'gs://objectron/v1/records/chair/chair_train-01068-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01087-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01048-of-01106',
 'gs://objectron/v1/records/camera/camera_test-00018-of-00163',
 'gs://objectron/v1/records/chair/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00095-of-00819',
 'gs://objectron/v1/records/chair/chair_train-00361-of-01106',
 'gs://objectron/v1/records/camera/camera_train-00474-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00452-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00282-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00237-of-00819',
 'gs://objectron/v1/records/chair/chair_train-01097-of-01106',
 'gs://objectron/v1/records/bottle/bottle_train-00746-of-00920',
 'gs://objectron/v1/records/camera/camera_train-00256-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00103-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00444-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00904-of-01106',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00160-of-00322',
 'gs://objectron/v1/records/chair/chair_train-01090-of-01106',
 'gs://objectron/v1/records/camera/camera_train-00073-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00050-of-00819',
 'gs://objectron/v1/records/camera/camera_train-00111-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00049-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00480-of-01106',
 'gs://objectron/v1/records/chair/chair_train-01023-of-01106',
 'gs://objectron/v1/records/cereal_box/summary.txt',
 'gs://objectron/v1/records/camera/camera_train-00509-of-00552',
 'gs://objectron/v1/records/book/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00115-of-00819',
 'gs://objectron/v1/records/bottle/summary.txt',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00068-of-00819',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00134-of-00322',
 'gs://objectron/v1/records/chair/chair_train-00873-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00197-of-01106',
 'gs://objectron/v1/records/chair/chair_train-00350-of-01106',
 'gs://objectron/v1/records/camera/camera_test-00038-of-00163',
 'gs://objectron/v1/records/cereal_box/cereal_box_test-00073-of-00322',
 'gs://objectron/v1/records/camera/camera_train-00304-of-00552',
 'gs://objectron/v1/records/bike/summary.txt',
 'gs://objectron/v1/records/camera/camera_test-00046-of-00163',
 'gs://objectron/v1/records/camera/camera_train-00396-of-00552',
 'gs://objectron/v1/records/camera/camera_train-00062-of-00552',
 'gs://objectron/v1/records/cereal_box/cereal_box_train-00255-of-00819',
 'gs://objectron/v1/records/camera/camera_test-00048-of-00163'}
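
The listing mixes record shards with summary.txt files, so it may help to separate the two and count the shards per category and split. Below is a minimal sketch of that, assuming shard names follow the <category>_<split>-<index>-of-<total> pattern seen in the listing. The names split_missing and SHARD_PATTERN are hypothetical helpers, not part of Objectron, and the sample holds only the first few paths from the listing.

```python
import re
from collections import Counter

# Assumed naming scheme, inferred from the listing: <category>_<split>-<index>-of-<total>
SHARD_PATTERN = re.compile(
    r"^(?P<category>.+)_(?P<split>train|test)-\d{5}-of-\d{5}$"
)

def split_missing(paths):
    """Separate shard files from other files and count shards per (category, split)."""
    shard_counts = Counter()
    other_files = []
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        match = SHARD_PATTERN.match(filename)
        if match:
            shard_counts[(match["category"], match["split"])] += 1
        else:
            # Anything that does not look like a shard, e.g. summary.txt.
            other_files.append(path)
    return shard_counts, sorted(other_files)

# First few paths from the listing; pass the full `missing` set in practice.
sample = [
    "gs://objectron/v1/records/camera/camera_test-00137-of-00163",
    "gs://objectron/v1/records/laptop/summary.txt",
    "gs://objectron/v1/records/cereal_box/cereal_box_train-00169-of-00819",
    "gs://objectron/v1/records/chair/chair_train-00953-of-01106",
    "gs://objectron/v1/records/chair/chair_train-00526-of-01106",
]
shard_counts, other_files = split_missing(sample)
for (category, split), count in sorted(shard_counts.items()):
    print(f"{category}/{split}: {count} missing shard(s)")
print("non-shard files:", other_files)
```

Running it on the full set should show at a glance which categories are affected and how many of the missing paths are summary.txt files rather than records.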

Ideally, the assertion in the snippet above would fail, and the number of records in these two directories in the bucket would be equal.
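
If it helps with verifying a fix, the equality could also be checked per category rather than on the totals. The following is only a sketch under the assumption that the category is the directory directly above each file. The helper names category_of and count_differences are hypothetical, and the bucket paths in the example are made up for illustration.

```python
from collections import Counter

def category_of(path):
    """Return the directory directly above the file, e.g. 'chair'."""
    return path.rstrip("/").split("/")[-2]

def count_differences(record_paths, shuffled_paths):
    """Map each category to (records count, records_shuffled count) where the two differ."""
    record_counts = Counter(category_of(p) for p in record_paths)
    shuffled_counts = Counter(category_of(p) for p in shuffled_paths)
    categories = sorted(set(record_counts) | set(shuffled_counts))
    return {
        c: (record_counts[c], shuffled_counts[c])
        for c in categories
        if record_counts[c] != shuffled_counts[c]
    }

# Made-up paths for illustration only; use the two globbed path lists in practice.
example_records = [
    "gs://example-bucket/records/chair/a",
    "gs://example-bucket/records/cup/a",
]
example_shuffled = [
    "gs://example-bucket/records_shuffled/chair/a",
    "gs://example-bucket/records_shuffled/chair/b",
    "gs://example-bucket/records_shuffled/cup/a",
]
differences = count_differences(example_records, example_shuffled)
print(differences)
```

An empty result would mean every category has the same number of files in both directories.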