balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow
4.11k stars 1.89k forks

[question] how to feed with multiple tfrecords? (slim.dataset.Dataset) #104

Open ywpkwon opened 7 years ago

ywpkwon commented 7 years ago

Hello. Above all, thanks for your great implementation. I learned a lot from this repo.

Although I know this question is not specific to the SSD implementation but more general, please let me ask it here. As in this repo, I created a custom TFRecord file and am using it like:

dataset = slim.dataset.Dataset(
            data_sources='my_dataset_train.tfrecord',
            reader=reader, decoder=decoder, num_samples=num_samples, ...)
provider = slim.dataset_data_provider.DatasetDataProvider(dataset, ...)

Let's say that my_dataset_train.tfrecord is already 15GB and I am getting more training data. So I want to create multiple training TFRecord files such as my_dataset_train_0.tfrecord and my_dataset_train_1.tfrecord. How can I feed multiple TFRecords?

Can the data_sources argument take a list (e.g., ['a.tfrecord', 'b.tfrecord'])?
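If I recall TF-Slim's behavior correctly, data_sources does accept a list, a comma-separated string, or a glob pattern, because the provider expands it before building the filename queue. The sketch below shows roughly how that expansion works (an assumption based on slim's parallel_reader; the helper name is illustrative, not the actual slim API):

```python
import glob

def get_data_files(data_sources):
    # Sketch of how slim's parallel_reader expands data_sources
    # (assumption, not the exact slim code): accept a list/tuple of
    # patterns, a comma-separated string, or a single glob pattern,
    # and return a flat list of matching file paths.
    if isinstance(data_sources, (list, tuple)):
        data_files = []
        for source in data_sources:
            data_files += get_data_files(source)
    elif ',' in data_sources:
        data_files = get_data_files(data_sources.split(','))
    else:
        data_files = glob.glob(data_sources)
    if not data_files:
        raise ValueError('No data files found in %s' % (data_sources,))
    return data_files
```

So passing either ['my_dataset_train_0.tfrecord', 'my_dataset_train_1.tfrecord'] or the pattern 'my_dataset_train_*.tfrecord' as data_sources should work, with the glob form being convenient as you keep adding shards.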

ShuangjunLiu commented 6 years ago

Here is a hint: think about tf.train.string_input_producer (https://www.tensorflow.org/api_docs/python/tf/train/string_input_producer).

Its first parameter is string_tensor: a 1-D string tensor with the strings to produce. You can put all your TFRecord file names into a list and pass it as this parameter.

What you get is a queue that combines data from all your files. From there, you can read data from the queue continuously: _, serialized_example = tfr_reader.read(queue), then do your decoding and preprocessing.

I think this is what you want.
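To make the pattern concrete without TF's queue machinery, here is a dependency-free sketch of what string_input_producer plus a TFRecordReader amount to: cycle over the filename list for some number of epochs and stream every record from every file. It reads the TFRecord wire format directly (each record is a little-endian uint64 length, a 4-byte length CRC, the payload, and a 4-byte payload CRC; the CRCs are skipped, not verified, here). The function names are illustrative, not part of any TF API:

```python
import struct

def tf_record_iterator(path):
    # Minimal pure-Python reader for the TFRecord framing:
    # uint64 length, uint32 length-CRC, payload, uint32 payload-CRC.
    with open(path, 'rb') as f:
        while True:
            header = f.read(8)
            if not header:
                break
            length, = struct.unpack('<Q', header)
            f.read(4)              # masked CRC of the length (ignored here)
            yield f.read(length)   # serialized tf.train.Example bytes
            f.read(4)              # masked CRC of the payload (ignored here)

def multi_file_records(filenames, num_epochs=1):
    # The role string_input_producer plays: feed the filenames (here,
    # num_epochs times over the list) to a reader that yields one
    # serialized record at a time, seamlessly crossing file boundaries.
    for _ in range(num_epochs):
        for filename in filenames:
            for record in tf_record_iterator(filename):
                yield record
```

In the TF graph version, the queue additionally shuffles filenames and lets multiple reader threads pull from it in parallel, but the data flow is the same: a list of shard files in, one continuous stream of serialized examples out.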