In effect, the implementation being replaced concatenates a list of `tensorflow.Dataset` objects via:
```python
response = None
for dataset in dataset_list:
    if response is None:
        response = dataset
    else:
        response = response.concatenate(dataset)
return response
```
Each `tensorflow.Dataset.concatenate` call has `response`, the large accumulating dataset, on the left and `dataset`, an element from the input list, on the right. When `model.predict(combined_dataset)` runs, this exhausts resources and kills the process on an example from @cooperlab, apparently because reaching the first dataset in the list requires descending through the entire chain of non-eager concatenations.
This pull request instead recursively splits `dataset_list` in half, so that each `tensorflow.Dataset.concatenate` call joins two combined datasets of roughly equal size. We chose this balanced solution over a right-heavy one because the latter could still exhaust resources when shuffling causes a dataset from near the end of `dataset_list` to be processed early. See the code for additional details.
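For reference, a minimal sketch of the balanced strategy (the name `concatenate_balanced` is illustrative, not necessarily the helper used in this pull request):

```python
import tensorflow as tf


def concatenate_balanced(dataset_list):
    # Illustrative sketch: recursively split the list in half so each
    # concatenate call joins two halves of roughly equal size, keeping the
    # depth of the lazy concatenation tree at O(log n) rather than the
    # O(n) of the old left-accumulating loop.
    if len(dataset_list) == 1:
        return dataset_list[0]
    mid = len(dataset_list) // 2
    left = concatenate_balanced(dataset_list[:mid])
    right = concatenate_balanced(dataset_list[mid:])
    return left.concatenate(right)
```

With a balanced tree, reaching any leaf dataset requires descending through only O(log n) concatenations, regardless of where it falls in `dataset_list`.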