GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

Why replicate the evaluation data in the census/tf-keras example #448

Closed helgaholmestad closed 5 years ago

helgaholmestad commented 5 years ago

In this example, the validation data is replicated when the dataset is created in the function input_fn, and for the validation data one batch corresponds to the complete dataset. In the model.fit call, validation_steps is set to 1. This means that only the first replica of the data will ever be used. Am I missing something, or is it unnecessary to replicate the validation data? And if it is unnecessary, should it be avoided to save computing power?

dizcology commented 5 years ago

Hi @helgaholmestad.

The validation dataset, as you said, has the whole validation data in a single batch, so setting validation_steps to 1 means that each validation run uses the whole validation data once. The data isn't really replicated even if input_fn calls repeat, since repeat merely restarts the iteration when needed.

helgaholmestad commented 5 years ago

So you are saying that for the validation data it does not matter (because of how the Dataset type behaves) how many times you set repeat? Repeat just means that iteration is restarted when needed. And for the validation data it is never needed, as the number of steps is set to one?

dizcology commented 5 years ago

Right - I think only one copy of the actual data is kept in memory, and each batch of the validation data is the whole validation dataset. Every time the job runs validation, it will run a single validation step (which covers the whole validation dataset). I think the repeat is actually used, since the same validation dataset serves every validation run and the iteration has to restart for each one.
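
The behaviour described above can be illustrated with a minimal pure-Python sketch. This is an analogy only, not the tf.data implementation: `repeated_batches`, `validation_rows`, and the three-epoch loop are hypothetical names and values made up for illustration. It shows a lazily repeated stream in which one batch is the whole validation set and each validation run draws a single step.

```python
from itertools import islice


def repeated_batches(rows, batch_size):
    """Yield batches from `rows` forever, restarting at the top after each pass.

    Hypothetical stand-in for a batched dataset followed by repeat();
    this is not how tf.data is implemented internally.
    """
    while True:
        for start in range(0, len(rows), batch_size):
            # Each batch is built on demand; no repeated copies are stored up front.
            yield rows[start:start + batch_size]


validation_rows = [10, 11, 12, 13]  # made-up validation set

# One batch holds the whole validation set, as in the example under discussion.
stream = repeated_batches(validation_rows, batch_size=len(validation_rows))

validation_steps = 1
# Three validation runs (say, one per epoch), each drawing `validation_steps` batches.
runs = [list(islice(stream, validation_steps)) for _ in range(3)]

for epoch, batches in enumerate(runs, start=1):
    print(f"epoch {epoch}: validated on {batches[0]}")
```

In this sketch every run sees the full validation set, and without the restart the stream would be exhausted after the first run. Whether a real Keras job depends on repeat in the same way may vary with the TensorFlow version, so treat this as a mental model rather than a description of the library.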

dizcology commented 5 years ago

Closing this issue for now. Please feel free to re-open it if there are further issues.