I've just pushed what I think is a working implementation. Unfortunately I can't test it because there isn't enough space on the server (only ~7 GB, and Anaconda is doing a good job of hogging space), but the process definitely starts.
The current implementation leverages the existing `ml_glaucoma.utils.get_data` method to separate files. This uses `tf.contrib`, so requires `tf < 2.0`. It can be run using the `--bmes_init` flag in `bin/__main__.py`. This must be run prior to the standard `tfds.DatasetBuilder.download_and_prepare`, which is run automatically if necessary. Once the `tfds` files have been generated, the original `get_data` directories are no longer required.
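For concreteness, a minimal sketch of the intended two-step flow; the `'bmes'` builder name and the CLI invocation in the comment are assumptions, not the exact names in the repo:

```python
import tensorflow_datasets as tfds

# Step 1 (assumed invocation): python bin/__main__.py --bmes_init ...
# runs the get_data-based file separation (requires tf < 2.0 / tf.contrib).

# Step 2: the standard tfds preparation, a no-op if the files already exist.
builder = tfds.builder('bmes')  # 'bmes' is an assumed registered dataset name
builder.download_and_prepare()
train_ds = builder.as_dataset(split=tfds.Split.TRAIN)
```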
If the test/train/validation split here is just a random split, this could be done more easily by creating a single `tfds` split and using `tfds.Split.subsplit` (see this post).
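Roughly, under the legacy `subsplit` API of that era (it has since been removed from tfds), and assuming everything is written to a single `TRAIN` split:

```python
import tensorflow_datasets as tfds

# Weighted subsplit of one split: 8/1/1 -> train/validation/test.
train_s, valid_s, test_s = tfds.Split.TRAIN.subsplit([8, 1, 1])

builder = tfds.builder('bmes')  # assumed dataset name, as above
builder.download_and_prepare()
train_ds = builder.as_dataset(split=train_s)
valid_ds = builder.as_dataset(split=valid_s)
test_ds = builder.as_dataset(split=test_s)
```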
There's a lot going on in `get_data`, and I'm not sure how much of the metadata you want in the final records. If you can provide me with e.g. a simple generator that yields arbitrarily structured examples (the simplest would be `(image_path, label)` pairs), along with the associated dtypes/shapes if they aren't obvious, I can fold the two initialization steps together, remove the redundant data duplication, and drop the `tf.contrib` dependency (assuming your generator doesn't use it).
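To be concrete, something as simple as the following would do; the directory layout here is a hypothetical placeholder, not the actual BMES structure:

```python
import os

def bmes_examples(data_dir):
    """Hypothetical generator yielding (image_path, label) pairs.

    Assumes one subdirectory per class label, e.g. data_dir/glaucoma/*.jpg
    (a placeholder layout for illustration only).
    """
    for label in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, label)
        if not os.path.isdir(class_dir):
            continue
        for filename in sorted(os.listdir(class_dir)):
            yield os.path.join(class_dir, filename), label
```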
Develop `tf.data` pipeline for BMES, including saving processed files to a useful format (e.g. TFRecords).
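For reference, a minimal sketch of what that could look like (TF 1.x style to match the `tf < 2.0` constraint above; the feature names, image size, and JPEG encoding are all assumptions):

```python
import tensorflow as tf

def serialize_example(image_bytes, label):
    # Assumed schema: a JPEG-encoded image plus an integer label.
    features = tf.train.Features(feature={
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    })
    return tf.train.Example(features=features).SerializeToString()

def write_tfrecords(examples, output_path):
    """Write an iterable of (image_bytes, label) pairs to one TFRecord file."""
    with tf.io.TFRecordWriter(output_path) as writer:
        for image_bytes, label in examples:
            writer.write(serialize_example(image_bytes, label))

def load_dataset(tfrecord_path, batch_size=32):
    """Read the TFRecords back as a batched tf.data pipeline."""
    spec = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }

    def parse(record):
        parsed = tf.io.parse_single_example(record, spec)
        image = tf.image.decode_jpeg(parsed['image'], channels=3)
        # Fixed size (assumed) so that examples can be batched.
        image = tf.image.resize_images(image, (224, 224))
        return image, parsed['label']

    return (tf.data.TFRecordDataset(tfrecord_path)
            .map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .batch(batch_size)
            .prefetch(1))
```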