Closed Jonathan2021 closed 1 year ago
So the issue is that the new stream created in the exception handler is still an empty generator that rethrows StopIteration, right? I think we can fix this with a while loop that keeps fetching a new streamer as long as the new one throws StopIteration. Could you perhaps open a merge request for this (with, say, a new unit test)?
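For illustration, the while-loop idea could look roughly like the sketch below. It is not the code from this repo: `next_non_empty` and `stream_factories` are hypothetical names, and the sketch assumes each replacement stream comes from a zero-argument callable that returns a fresh iterator.

```python
def next_non_empty(stream_factories):
    """Return (first_item, stream) for the first non-empty stream.

    `stream_factories` is an iterator of zero-argument callables, each
    returning a fresh iterator (a hypothetical interface, not the
    library's). Returns None when every remaining stream is empty.
    """
    while True:
        try:
            factory = next(stream_factories)
        except StopIteration:
            return None  # no replacement streams left at all
        stream = iter(factory())
        try:
            first = next(stream)
        except StopIteration:
            continue  # this stream was empty from the start, try the next one
        return first, stream
```

The key point is that a StopIteration raised by a freshly created stream is handled inside the loop instead of escaping to the caller, so a run of empty streams is skipped rather than rethrown.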
You can find a first draft here: https://github.com/etienne87/pytorch-stream-dataloader/blob/more_tests/pytorch_stream_dataloader/stream_dataset.py#L130 and the test here: https://github.com/etienne87/pytorch-stream-dataloader/blob/more_tests/tests/test_stream_dataloader.py#L170 ~However, I still sometimes have concurrency issues (when running the test many times): some streams do not finish. I am not sure why.~
Hi Etienne, First of all, thanks for this repo. It really made my life easier.
I use it in such a way that the generator created by the streamer (in `StreamDataset`) may be empty (`StopIteration` from the start). This exception is caught in the `StreamDataloader` (line 118) but rethrown later at line 130, as `active[i]` is still 1.

One solution could be to say that empty generators are undesirable behavior and that it's the user's job to make sure they are passing sane data. But in my case that's not very practical. I get a data generator from querying a data server, which I then pass to a cleaning function that yields only the wanted data. Sometimes there is no wanted data for a specific query, but the only way I can know is by consuming the generator with `next` and seeing whether it is empty. As I have a lot of generators and only a few fall into this category, the overhead of checking every stream is significant.
What do you think?
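For reference, the user-side check described above (calling `next` to see whether a generator is empty) can be done without losing the first item. The sketch below uses a hypothetical helper name, `peek_or_none`; it still pays the cost of producing the first element, so it does not remove the overhead mentioned above.

```python
import itertools


def peek_or_none(gen):
    """Return an iterator equivalent to `gen`, or None if `gen` is empty.

    Hypothetical helper: it pulls the first item to test for emptiness,
    then chains it back in front of the remaining items.
    """
    it = iter(gen)
    try:
        first = next(it)
    except StopIteration:
        return None  # the generator was empty from the start
    return itertools.chain([first], it)
```

Only the first element is evaluated eagerly; the rest of the generator stays lazy.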