Saving progress for datasets, namely IterablePipelines, is currently a bit clunky. The output dataset is agnostic of progress/location in the source. For the source iterator, all that really gets saved is an index into the dataset being read. On resume, we naively call next() on the iterator until it is back at the saved index. Leaving a note here to revisit this later, since the replay cost grows with the saved index and might have unforeseen consequences at scale.
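For reference, a minimal sketch of the save-an-index / replay-with-next approach described above. This is not the actual IterablePipelines code: `ResumableIterator`, `make_source`, `state_dict`, and `load_state_dict` are hypothetical names picked for illustration, and the sketch assumes the source yields items in the same order every time it is re-created.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


class ResumableIterator:
    """Hypothetical wrapper that counts how many items were read from a source."""

    def __init__(self, make_source: Callable[[], Iterable[Any]]) -> None:
        # make_source is an assumed factory that re-creates the source from scratch.
        self._make_source = make_source
        self._it: Iterator[Any] = iter(make_source())
        self._index = 0  # number of items consumed so far

    def __iter__(self) -> "ResumableIterator":
        return self

    def __next__(self) -> Any:
        item = next(self._it)
        self._index += 1
        return item

    def state_dict(self) -> Dict[str, int]:
        # The only thing saved is the position in the source.
        return {"index": self._index}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        # Naive resume: restart the source and discard items one by one.
        # Cost is linear in the saved index.
        self._it = iter(self._make_source())
        self._index = 0
        for _ in range(state["index"]):
            next(self)


def demo() -> list:
    first = ResumableIterator(lambda: range(10))
    for _ in range(4):
        next(first)
    saved = first.state_dict()

    resumed = ResumableIterator(lambda: range(10))
    resumed.load_state_dict(saved)
    return list(resumed)  # remaining items after the saved position
```

The replay loop is where the scaling concern shows up: every resume re-reads everything before the saved index, and it only lands on the right item if the source order is deterministic (no reshuffling between runs).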