How to read the tfrecord?

google-research-datasets / richhf-18k

RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).


How to read the tfrecord? #4

Open DidiD1 opened 4 months ago

DidiD1 commented 4 months ago

Great work! When I try to read the TFRecord data, I get errors, and it looks as if the TFRecord file is corrupted. When I check the number of elements with num_elements = tf.data.experimental.cardinality(record_iter).numpy(), the terminal shows 'Number of elements in dataset: -2'. Could you release a script that helps read the TFRecord, or update the TFRecord file? Thanks for your answer!

leebird commented 4 months ago

Hello, do the file sizes look correct (e.g., the training set should be ~144 MB)? If not, you might need to install Git Large File Storage (Git LFS) first and then clone the repository again: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

Updated README.
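For anyone checking the file-size point above: when a repository is cloned without Git LFS, each large file is replaced by a small text pointer whose first line is "version https://git-lfs.github.com/spec/v1". A minimal sketch of a check for that case follows; the helper name is_lfs_pointer is hypothetical and is not part of this repository.

```python
# Sketch only: detects a Git LFS pointer file by its standard first line.
# `is_lfs_pointer` is an illustrative name, not something shipped with the dataset.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If this returns True for a .tfrecord file, the file on disk is only a pointer of roughly a hundred bytes, and the real data still has to be fetched by cloning again with Git LFS installed.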

leebird commented 3 months ago

We have also added a simple script to show how to retrieve the labels from the dataset at https://github.com/google-research/google-research/blob/master/richhf_18k/parse_tfrecord_file.py.
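A note on the -2 reported earlier in this thread: -2 is the value of tf.data.UNKNOWN_CARDINALITY, which tf.data returns when it cannot know the element count without reading the data, as is the case for TFRecord files. On its own it does not show that the file is broken. The sketch below is not the script linked above; it is a standard-library-only illustration that counts records by walking the TFRecord framing (an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC). The name count_tfrecords is hypothetical, and the CRCs are skipped rather than verified.

```python
import os
import struct


def count_tfrecords(path):
    """Count records in an uncompressed TFRecord file (sketch, CRCs not verified).

    Raises ValueError if the framing runs past the end of the file, which is
    what happens for a truncated file or a Git LFS pointer file.
    """
    size = os.path.getsize(path)
    count = 0
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"truncated length header after {count} records")
            (length,) = struct.unpack("<Q", header)
            # 8-byte length + 4-byte length CRC + payload + 4-byte payload CRC
            end = offset + 8 + 4 + length + 4
            if end > size:
                raise ValueError(f"record {count} extends past end of file")
            offset = end
            count += 1
    return count
```

With TensorFlow installed, iterating over tf.data.TFRecordDataset and counting the elements should give the same number for an intact file.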