SamuelMarks / ml-glaucoma

ML programs for glaucoma diagnoses.
https://sydneyscientific.org

tf.data for BMES #2

Closed SamuelMarks closed 4 years ago

SamuelMarks commented 5 years ago

Develop a tf.data pipeline for BMES, including saving the processed files to a useful format (e.g. TFRecords).

jackd commented 5 years ago

I've just pushed what I think is a working implementation. Unfortunately I can't test it end to end because there isn't enough space on the server (I only have about 7 GB, and Anaconda is doing a good job of hogging space), but the process definitely starts.

The current implementation uses the existing ml_glaucoma.utils.get_data method to separate the files. That method uses tf.contrib, so it requires tf < 2.0. It can be run with the --bmes_init flag in bin/__main__.py. This step must be run before the standard tfds.DatasetBuilder.download_and_prepare, which itself runs automatically if necessary. Once the tfds files have been generated, the original get_data directories are no longer required.

If the test/train/validation split here is just a random split, this could be done more easily by creating a single tfds split and using tfds.Split.subsplit - see this post.
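To make concrete what "just a random split" of one pool of examples amounts to, here is a minimal library-independent sketch. It is not the tfds API; the function name `random_split`, the 80/10/10 fractions, and the seed are all assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def random_split(
    examples: Sequence[T],
    fractions: Dict[str, float],
    seed: int = 0,
) -> Dict[str, List[T]]:
    """Shuffle one pool of examples and cut it into named, disjoint splits.

    Hypothetical helper: a fixed seed makes the split reproducible.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    splits: Dict[str, List[T]] = {}
    names = list(fractions)
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last split takes whatever remains, so nothing is dropped.
            end = len(shuffled)
        else:
            end = start + int(round(fractions[name] * len(shuffled)))
        splits[name] = shuffled[start:end]
        start = end
    return splits


# Assumed 80/10/10 proportions; the issue does not state the real ones.
splits = random_split(range(100), {"train": 0.8, "validation": 0.1, "test": 0.1})
```

If the BMES split is not random (for example, if it is fixed by the study design), a sketch like this does not apply and the existing directories would need to be preserved.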

There's a lot going on in get_data, and I'm not sure how much of the metadata you want in the final records. If you can provide me with a simple generator that returns an arbitrarily structured example (the simplest would be (image_path, label)), along with the associated dtypes/shapes if they aren't obvious, I can fold the two initialization steps together, remove the redundant data duplication, and drop the tf.contrib dependency if your generator doesn't use it.
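For reference, the kind of generator being asked for might look like the sketch below. Everything about the layout is an assumption: the real BMES directory structure, the label names, and the function name `iter_examples` are hypothetical, and the actual get_data logic is more involved.

```python
import os
from typing import Iterator, Tuple

# Hypothetical layout: <root>/<label_name>/<image file>.
# The label names and their integer ids are assumptions, not the BMES scheme.
LABELS = {"no_glaucoma": 0, "glaucoma": 1}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_examples(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (image_path, label) pairs in a deterministic order.

    image_path is a str and label is an int, so the dtypes are
    (string, int64) and both elements are scalars.
    """
    for label_name in sorted(LABELS):
        label_dir = os.path.join(root, label_name)
        if not os.path.isdir(label_dir):
            continue  # tolerate a missing class directory
        for filename in sorted(os.listdir(label_dir)):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(label_dir, filename), LABELS[label_name]
```

A generator of this shape uses only the standard library, so it carries no tf.contrib dependency, and it could be consumed either by a tfds builder's example-generation step or by tf.data.Dataset.from_generator.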