DeepTrackAI / DeepTrack2

MIT License

unexpected behavior in Dataset in bm/migrate-to-torch #200

Closed cmanzo closed 9 months ago

cmanzo commented 10 months ago

@BenjaminMidtvedt: I was expecting this code:

```python
train_dataset = dt.pytorch.Dataset(pipeline, inputs=source, length=1000, replace=0.2)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
```

where `source` is a path to an image folder, to use 1000 images per epoch and replace them with probability `replace`. However, it seems to perform the training using all the images in the source for each epoch.

BenjaminMidtvedt commented 10 months ago

Ah, interesting use-case. I will think about whether this can be implemented. The main difficulty is that I don't think the DataLoader notifies the dataset when a new epoch starts (unlike TensorFlow).
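Since the DataLoader has no epoch-start hook, one common workaround is to give the dataset an explicit resample method that the training loop calls between epochs. This is a minimal, hedged sketch of that idea (plain Python standing in for a torch-style map dataset; `ResamplingDataset` and its `resample()` method are hypothetical names, not part of DeepTrack2's API):

```python
import random


class ResamplingDataset:
    """Sketch of a map-style dataset that uses `length` items per epoch
    and, when resample() is called, swaps each active item for a fresh
    one from the full source pool with probability `replace`."""

    def __init__(self, sources, length, replace=0.2, seed=0):
        self.sources = list(sources)
        self.length = min(length, len(self.sources))
        self.replace = replace
        self.rng = random.Random(seed)
        # Start with a random subset of `length` items.
        self.active = self.rng.sample(self.sources, self.length)

    def __len__(self):
        # A DataLoader would draw exactly this many items per epoch.
        return self.length

    def __getitem__(self, idx):
        return self.active[idx]

    def resample(self):
        # With probability `replace`, swap each item for a random source;
        # the training loop must call this manually at each epoch boundary,
        # since the DataLoader never signals a new epoch to the dataset.
        self.active = [
            self.rng.choice(self.sources)
            if self.rng.random() < self.replace else item
            for item in self.active
        ]
```

A training loop would then iterate the DataLoader as usual and call `dataset.resample()` after each epoch.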

cmanzo commented 10 months ago

I see... but still, it should only use `length` images, no?

BenjaminMidtvedt commented 10 months ago

My idea was that `length` determines the number of images when not using sources. I can propose something in a bit; see if it suits your needs.

cmanzo commented 10 months ago

In that case, let’s leave it as is. There are alternative ways to control `length` when using `source`. I mean, the cool thing would be to have `replace` together with the source 😉