Monk5088 opened this issue 1 year ago
Because the DataBunch class uses a static shuffle parameter for its dataloaders. Unlike the batch size, it cannot be changed afterwards. You can use fix_dl for that purpose: it is the shuffle=False version of the train dataloader, but you may need to write custom logic for visualization.
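A minimal sketch of that custom route, assuming fastai v1 and an existing DataBunch named `data` (`fix_dl` is the built-in unshuffled view of the training set):

```python
from fastai.basic_data import DatasetType

# fix_dl iterates the training set with shuffle=False, so the first batch is
# identical on every call; handy for inspecting a fixed set of samples.
x, y = data.one_batch(ds_type=DatasetType.Fix)

# or iterate it directly for custom visualization logic
for xb, yb in data.fix_dl:
    pass  # plot / inspect xb, yb here
```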
How can we use fix_dl for training and evaluation of the model? Whenever I try to compute the precision and recall for my model, the number of ground truths in the databunch keeps changing, so I'm unable to get consistent results from multiple cells in the same notebook session.
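For the evaluation half of that question, a hedged sketch assuming fastai v1 and a trained Learner named `learn`: running inference over the unshuffled Fix (or Valid) dataset should give the same ground truths, in the same order, on every call.

```python
from fastai.basic_data import DatasetType

# get_preds walks the chosen dataloader once, in order; DatasetType.Fix is the
# training data without shuffling and DatasetType.Valid is never shuffled.
preds, targets = learn.get_preds(ds_type=DatasetType.Fix)
# compute precision / recall from preds and targets with your own metric code
```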
I did not push the fix for shuffle control, and fix_dl may not be used for training. If you really want to stop the shuffling, then change the following line in basic_data, switching the first True to False:
zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False))
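A less invasive alternative, sketched under the assumption of fastai v1 (where DeviceDataLoader.new rebuilds a dataloader with overridden arguments), is to swap the shuffled train_dl for an unshuffled copy after the DataBunch is built, instead of editing the installed package:

```python
# Build the DataBunch as usual, then replace its train dataloader with an
# unshuffled copy of itself; training then sees the data in a fixed order.
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()
data.train_dl = data.train_dl.new(shuffle=False)
```

Keep in mind that training without shuffling can hurt convergence, which is presumably why the flag is hard-coded in the first place.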
I have tried doing this before; it gives an error:
RecursionError                            Traceback (most recent call last)
<ipython-input-16-f807d56c809b> in <module>
     22 lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
     23 lls = lls.transform(tfms, tfm_y=True, size=patch_size)
---> 24 data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate,num_workers=0).normalize()

/usr/local/lib/python3.9/dist-packages/fastai/data_block.py in databunch(self, path, bs, val_bs, num_workers, dl_tfms, device, collate_fn, no_check, **kwargs)
    551     "Create an `DataBunch` from self, `path` will override `self.path`, `kwargs` are passed to `DataBunch.create`."
    552     path = Path(ifnone(path, self.path))
--> 553     data = self.x._bunch.create(self.train, self.valid, test_ds=self.test, path=path, bs=bs, val_bs=val_bs,
    554         num_workers=num_workers, dl_tfms=dl_tfms, device=device, collate_fn=collate_fn, no_check=no_check, **kwargs)
    555     if getattr(self, 'normalize', False):#In case a normalization was serialized

/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py in create(cls, train_ds, valid_ds, test_ds, path, bs, val_bs, num_workers, dl_tfms, device, collate_fn, no_check, **dl_kwargs)
    116     datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    117     val_bs = ifnone(val_bs, bs)
--> 118     dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers, **dl_kwargs) for d,b,s in
    119         zip(datasets, (bs,val_bs,val_bs,val_bs), (False,False,False,False)) if d is not None]
    120     return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py in <listcomp>(.0)
    116     datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    117     val_bs = ifnone(val_bs, bs)
--> 118     dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers, **dl_kwargs) for d,b,s in
    119         zip(datasets, (bs,val_bs,val_bs,val_bs), (False,False,False,False)) if d is not None]
    120     return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py in intercept_args(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn)
     14     'num_workers':num_workers, 'collate_fn':collate_fn, 'pin_memory':pin_memory,
     15     'drop_last': drop_last, 'timeout':timeout, 'worker_init_fn':worker_init_fn}
---> 16 old_dl_init(self, dataset, **self.init_kwargs)
     17
     18 torch.utils.data.DataLoader.__init__ = intercept_args

... last 1 frames repeated, from the frame below ...

/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py in intercept_args(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn)
     14     'num_workers':num_workers, 'collate_fn':collate_fn, 'pin_memory':pin_memory,
     15     'drop_last': drop_last, 'timeout':timeout, 'worker_init_fn':worker_init_fn}
---> 16 old_dl_init(self, dataset, **self.init_kwargs)
     17
     18 torch.utils.data.DataLoader.__init__ = intercept_args

RecursionError: maximum recursion depth exceeded while calling a Python object
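One tentative reading of this traceback: old_dl_init appears to point back at intercept_args itself, which happens when the patch in basic_data is executed a second time in a session where torch.utils.data.DataLoader.__init__ has already been replaced, e.g. after editing the file and reloading the module without restarting the kernel. Restarting the runtime after the edit avoids it; so does guarding the patch, roughly like this (the _fastai_patched marker is an illustrative name, not fastai API):

```python
import torch

# Rough sketch of the patch fastai v1 applies at import time, guarded so that
# running it again does not capture the already-patched __init__ (which is
# what makes intercept_args call itself until the RecursionError above).
if not getattr(torch.utils.data.DataLoader.__init__, '_fastai_patched', False):
    old_dl_init = torch.utils.data.DataLoader.__init__

    def intercept_args(self, dataset, **kwargs):
        self.init_kwargs = kwargs               # saved so the loader can be rebuilt later
        old_dl_init(self, dataset, **kwargs)

    intercept_args._fastai_patched = True
    torch.utils.data.DataLoader.__init__ = intercept_args
```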
Hey @jph00, do you know how we can achieve this? There seems to be no way to get a static DataBunch or ImageDataBunch either; every time the data is called, it creates a dynamic DataBunch with different images.
> I did not push the fix for shuffle control, and fix_dl may not be used for training. If you really want to stop the shuffling, then change the following line in basic_data, switching the first True to False:
> zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False))
Have you tried this in your own code? Does it work for you?
> I did not push the fix for shuffle control, and fix_dl may not be used for training. If you really ...
Yes. It was working for me.
Can you share the code? When I tried doing the same, it gave me the maximum recursion depth exceeded error. Can you look into the code I shared above and tell me where I went wrong? Also, does setting it to False, False, False, False make the DataBunch static or not? That is, were the images repeating when you called data.show_batch() multiple times?
The two images above were generated with the first value set to False. I have tested it on the CIFAR example.
If you are just concerned with the show_batch function, then you can try calling it as
data.show_batch(rows=4, figsize=(4,4), ds_type=None)
It will load the data from fix_dl. You may not train using fix_dl.
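In fastai v1 there is also an explicit DatasetType.Fix value intended for fix_dl, so, assuming a standard DataBunch, the same thing can be spelled out as:

```python
from fastai.basic_data import DatasetType

# draw the displayed batch from fix_dl (training data, no shuffling)
data.show_batch(rows=4, figsize=(4, 4), ds_type=DatasetType.Fix)
```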
Somebody from the core dev team may be able to shed more insight on the recursion issue.
Hey, can you share the notebook where you loaded these images and ran data.show_batch()? It would be really helpful, so that I can see what is going wrong in my notebook. Thanks for your help.
Also, I have tried calling data.show_batch() with ds_type=None, but it still shows different images, even though the data is created from only 2 images.
Hey authors, I have a problem with the DataBunch. I create batches of images from my dataset, but the batches keep changing, and I want fixed batches. Every time I call data.show_batch() it shows different batches, even though there are only 2 images and the batches are built from those 2 images alone, so random repeated batches are being shown in the DataBunch.