fastai / fastai1

v1 of the fastai library. v2 is the current version. v1 is still supported for bug fixes, but will not receive new features.
http://fastai1.fast.ai
Apache License 2.0

How to create static databunch from the images? #29

Open Monk5088 opened 1 year ago

Monk5088 commented 1 year ago

Hey authors, I have a problem with the DataBunch. I try to create batches of images from my dataset, but the batches keep changing. I want fixed batches of images, yet every call to data.show_batch() shows a different batch, even though there are only 2 images and the batches are built from those 2 images only, so repeated random batches keep being shown.

r-deo commented 1 year ago

Because the DataBunch classes use a static shuffle parameter for their dataloaders. Unlike the batch size, it cannot be changed afterwards. You can use fix_dl for this purpose: it is the shuffle=False version of the train dataloader. But you may need to write custom logic for visualization.
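The effect is easy to see outside fastai. Below is a minimal stdlib sketch (no fastai involved; `make_batches` is a hypothetical stand-in for one epoch of a dataloader) of why a shuffle=True loader previews differently on every call, while a shuffle=False loader like fix_dl always yields the same batches:

```python
import random

def make_batches(items, bs, shuffle, seed=None):
    """One epoch of a toy dataloader: optionally shuffle, then batch."""
    order = list(items)
    if shuffle:
        random.Random(seed).shuffle(order)  # order re-drawn every epoch
    return [order[i:i + bs] for i in range(0, len(order), bs)]

items = ["img_a", "img_b", "img_c", "img_d"]

# shuffle=False (what fix_dl uses): identical batches on every call
fixed_1 = make_batches(items, bs=2, shuffle=False)
fixed_2 = make_batches(items, bs=2, shuffle=False)
print(fixed_1 == fixed_2)  # True

# shuffle=True: each call may batch the images in a different order,
# which is why repeated data.show_batch() calls show different images
shuffled = make_batches(items, bs=2, shuffle=True)
```
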

Monk5088 commented 1 year ago

How can we use fix_dl for training and evaluating the model? Whenever I try to compute precision and recall for my model, the number of ground truths in the DataBunch keeps changing, so I am unable to get consistent results across multiple cells in the same notebook session.

r-deo commented 1 year ago

I did not push a fix for shuffle control, and fix_dl may not be used for training. If you really want to stop the shuffling, then change the following line in basic_data: change the first True to False.

zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False))
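For context, the tuple of booleans in that zip lines up positionally with the (train, valid, fix, test) datasets, so only the first entry controls train-time shuffling. A small stand-alone sketch (plain Python; the dataset names are placeholder strings, not fastai objects):

```python
bs, val_bs = 2, 4
datasets = ["train_ds", "valid_ds", "fix_ds", "test_ds"]

# Mirrors the zip above: one (dataset, batch_size, shuffle) spec per dataloader
specs = list(zip(datasets,
                 (bs, val_bs, val_bs, val_bs),
                 (True, False, False, False)))

print(specs[0])  # ('train_ds', 2, True) -> only the train loader shuffles
# Flipping that first True to False makes every loader deterministic.
```
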
Monk5088 commented 1 year ago

I have tried doing this before; it gives an error:

RecursionError                            Traceback (most recent call last)
[<ipython-input-16-f807d56c809b>](https://localhost:8080/#) in <module>
     22 lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
     23 lls = lls.transform(tfms, tfm_y=True, size=patch_size)
---> 24 data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate,num_workers=0).normalize()

4 frames
[/usr/local/lib/python3.9/dist-packages/fastai/data_block.py](https://localhost:8080/#) in databunch(self, path, bs, val_bs, num_workers, dl_tfms, device, collate_fn, no_check, **kwargs)
    551         "Create an `DataBunch` from self, `path` will override `self.path`, `kwargs` are passed to `DataBunch.create`."
    552         path = Path(ifnone(path, self.path))
--> 553         data = self.x._bunch.create(self.train, self.valid, test_ds=self.test, path=path, bs=bs, val_bs=val_bs,
    554                                     num_workers=num_workers, dl_tfms=dl_tfms, device=device, collate_fn=collate_fn, no_check=no_check, **kwargs)
    555         if getattr(self, 'normalize', False):#In case a normalization was serialized

[/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py](https://localhost:8080/#) in create(cls, train_ds, valid_ds, test_ds, path, bs, val_bs, num_workers, dl_tfms, device, collate_fn, no_check, **dl_kwargs)
    116         datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    117         val_bs = ifnone(val_bs, bs)
--> 118         dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers, **dl_kwargs) for d,b,s in
    119                zip(datasets, (bs,val_bs,val_bs,val_bs), (False,False,False,False)) if d is not None]
    120         return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

[/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py](https://localhost:8080/#) in <listcomp>(.0)
    116         datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    117         val_bs = ifnone(val_bs, bs)
--> 118         dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers, **dl_kwargs) for d,b,s in
    119                zip(datasets, (bs,val_bs,val_bs,val_bs), (False,False,False,False)) if d is not None]
    120         return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

[/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py](https://localhost:8080/#) in intercept_args(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn)
     14                         'num_workers':num_workers, 'collate_fn':collate_fn, 'pin_memory':pin_memory,
     15                         'drop_last': drop_last, 'timeout':timeout, 'worker_init_fn':worker_init_fn}
---> 16     old_dl_init(self, dataset, **self.init_kwargs)
     17 
     18 torch.utils.data.DataLoader.__init__ = intercept_args

... last 1 frames repeated, from the frame below ...

[/usr/local/lib/python3.9/dist-packages/fastai/basic_data.py](https://localhost:8080/#) in intercept_args(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn)
     14                         'num_workers':num_workers, 'collate_fn':collate_fn, 'pin_memory':pin_memory,
     15                         'drop_last': drop_last, 'timeout':timeout, 'worker_init_fn':worker_init_fn}
---> 16     old_dl_init(self, dataset, **self.init_kwargs)
     17 
     18 torch.utils.data.DataLoader.__init__ = intercept_args

RecursionError: maximum recursion depth exceeded while calling a Python object

(screenshot of the error attached)
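For what it's worth, the recursion pattern in that traceback can be reproduced with a small stand-alone sketch. basic_data saves DataLoader.__init__ in a module-level global (old_dl_init) and then installs intercept_args in its place; if the module body runs a second time against the already-patched class (for example after editing the installed basic_data.py and re-importing in the same session), the global ends up pointing at the wrapper itself, which then calls itself forever. `Loader` and `run_module_body` below are hypothetical stand-ins, not fastai code:

```python
class Loader:
    def __init__(self, data):
        self.data = data

def run_module_body():
    """Simulates executing a module body that monkey-patches Loader.__init__."""
    global old_init
    old_init = Loader.__init__       # re-captured on every (re)load

    def intercept(self, data):
        old_init(self, data)         # resolved via the global at call time

    Loader.__init__ = intercept

run_module_body()
Loader([1])            # fine: intercept -> original __init__

run_module_body()      # simulated reload: old_init now points at the first
                       # intercept, which calls old_init, i.e. itself
try:
    Loader([1])
    recursed = False
except RecursionError:
    recursed = True
print(recursed)  # True
```

Restarting the runtime after editing the file, so the patch is applied exactly once, avoids this, which may be why the same edit works in a fresh session.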

Monk5088 commented 1 year ago

Hey @jph00, do you know how we can achieve this? There seems to be no way to get a static DataBunch (or image bunch): every time data is called, it creates a dynamic DataBunch with different images.

Monk5088 commented 1 year ago

> I did not push the fix for shuffle control. Fix dl may not be used for training. If you really want to stop the shuffling then change the following instance in basic_data. Change first True to False
>
> zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False))

Have you tried this in your own code? Does it work for you?

r-deo commented 1 year ago

> did not push the fix for shuffle control. Fix dl may not be used for training. If you really

Yes. It was working for me.

Monk5088 commented 1 year ago

Can you share the code? When I tried doing the same thing, it gave me the maximum-recursion-depth error. Can you look at the code I shared above and tell me where I went wrong? Also, did setting the flags to False, False, False, False make the DataBunch static for you? That is, did the same images repeat when you called data.show_batch() multiple times?

r-deo commented 1 year ago

(two screenshots attached) Both of the above screenshots were produced with the first value set to False. I tested it on the CIFAR example.

If you are just concerned with the show_batch function, then you can try calling it as

data.show_batch(rows=4, figsize=(4,4), ds_type=None)

It will load the data from fix_dl. You may not train using fix_dl, though.
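For reference, one plausible reason ds_type=None reaches fix_dl: in fastai v1, DataBunch.dl dispatches through a chain of equality checks against the DatasetType enum and falls back to the fix loader for anything it does not recognise, None included. This is a hedged reading of the library, sketched below in plain Python with `pick_dl` and the `dls` dict as stand-ins:

```python
from enum import Enum

# Same member names as fastai v1's DatasetType enum
DatasetType = Enum('DatasetType', 'Train Valid Test Single Fix')

def pick_dl(ds_type, dls):
    """Toy version of the dispatch: unknown values (including None)
    fall through every equality check and land on the fix loader."""
    return (dls['train'] if ds_type == DatasetType.Train else
            dls['test'] if ds_type == DatasetType.Test else
            dls['valid'] if ds_type == DatasetType.Valid else
            dls['single'] if ds_type == DatasetType.Single else
            dls['fix'])

dls = {k: f"{k}_dl" for k in ("train", "valid", "test", "single", "fix")}
print(pick_dl(None, dls))               # fix_dl
print(pick_dl(DatasetType.Train, dls))  # train_dl
```
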

Somebody from the core dev team can shed more insight on the recursion issue.

Monk5088 commented 1 year ago

Hey, can you share the notebook where you loaded these images and ran data.show_batch()? It would be really helpful, so that I can see what is going wrong in my notebook. Thanks for your help.

Monk5088 commented 1 year ago

Also, I have tried using data.show_batch() with ds_type=None, but it still shows different images even though the data is created from only 2 images. (three screenshots attached)
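If disabling shuffling entirely is undesirable (e.g. for training), a common alternative, independent of fastai, is to seed the shuffle so each epoch's "random" order is reproducible across runs. A stdlib sketch (`seeded_batches` is a hypothetical helper, not a fastai API):

```python
import random

def seeded_batches(items, bs, epoch, base_seed=42):
    """Shuffle deterministically: the same (base_seed, epoch) pair always
    yields the same batch order, so runs are repeatable yet still shuffled."""
    order = list(items)
    random.Random(base_seed + epoch).shuffle(order)
    return [order[i:i + bs] for i in range(0, len(order), bs)]

run_1 = seeded_batches(range(6), bs=2, epoch=0)
run_2 = seeded_batches(range(6), bs=2, epoch=0)
print(run_1 == run_2)  # True: identical batches on every repeated call
```
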