fastai / fastai

The fastai deep learning library
http://docs.fast.ai
Apache License 2.0

error on predict() or using interp with resnet and MixUp #3180

Closed frankfletcher closed 3 years ago

frankfletcher commented 3 years ago

Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug: YES

Describe the bug
Using resnet18 along with the MixUp callback to categorize images on the PETS dataset results in a post-training error. When using `learn.predict()`, `pred` is given as a tensor of n category labels with repeats, and `pred_idx` is a tensor of dimension n (the number of categories) whose contents are floating-point values. In fact, `pred_idx` is a direct copy of `probs`.

Current Workarounds

  1. Revert to fastai v2.2.2, or
  2. Remove the MixUp callback.
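To make the reported symptom concrete, here is a plain-Python sketch (no fastai) of the contract `learn.predict()` normally satisfies for a single-label classifier, returning `(decoded_label, class_index, probs)`. The helper name and checks below are illustrative assumptions, not fastai API:

```python
# Sketch of the shape contract a healthy single-label prediction satisfies.
# check_predict_output is a hypothetical validator, not part of fastai.

def check_predict_output(pred, pred_idx, probs, n_classes):
    """Return True if the triple looks like a well-formed single prediction."""
    ok_idx = isinstance(pred_idx, int) and 0 <= pred_idx < n_classes
    ok_probs = len(probs) == n_classes and abs(sum(probs) - 1.0) < 1e-6
    ok_pred = not isinstance(pred, (list, tuple))  # one label, not n labels
    return ok_idx and ok_probs and ok_pred

# A healthy prediction for a 3-class model:
print(check_predict_output("beagle", 1, [0.1, 0.8, 0.1], 3))  # True
# The buggy case reported here: pred_idx is a copy of probs, not an index:
print(check_predict_output("beagle", [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], 3))  # False
```

The second call models the report's observation that `pred_idx` comes back as a floating-point vector rather than a single class index.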

To Reproduce
As requested by Zach Mueller, I have made a minimal reproducer to debug with, available here: https://github.com/cogsci2/Pets-Minimal/blob/main/MinimalPets_debug_mixup.ipynb

  1. Train any image-categorization model that uses MixUp and a ResNet.
  2. After training, call show_results() and plot a confusion matrix.


Expected behavior
No error message is given and kernel execution is not halted.

Error with full stack trace


Error when using `plot_confusion_matrix()`:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-4eca0711123a> in <module>
      1 interp = ClassificationInterpretation.from_learner(learn)
----> 2 interp.plot_confusion_matrix(figsize=(8,8))

~/miniconda3/envs/learn/lib/python3.8/site-packages/fastai/interpret.py in plot_confusion_matrix(self, normalize, title, cmap, norm_dec, plot_txt, **kwargs)
     68         "Plot the confusion matrix, with `title` and using `cmap`."
     69         # This function is mainly copied from the sklearn docs
---> 70         cm = self.confusion_matrix()
     71         if normalize: cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
     72         fig = plt.figure(**kwargs)

~/miniconda3/envs/learn/lib/python3.8/site-packages/fastai/interpret.py in confusion_matrix(self)
     60         "Confusion matrix as an `np.ndarray`."
     61         x = torch.arange(0, len(self.vocab))
---> 62         d,t = flatten_check(self.decoded, self.targs)
     63         cm = ((d==x[:,None]) & (t==x[:,None,None])).long().sum(2)
     64         return to_np(cm)

~/miniconda3/envs/learn/lib/python3.8/site-packages/fastai/torch_core.py in flatten_check(inp, targ)
    795     "Check that `out` and `targ` have the same number of elements and flatten them."
    796     inp,targ = TensorBase(inp.contiguous()).view(-1),TensorBase(targ.contiguous()).view(-1)
--> 797     test_eq(len(inp), len(targ))
    798     return inp,targ

~/miniconda3/envs/learn/lib/python3.8/site-packages/fastcore/test.py in test_eq(a, b)
     33 def test_eq(a,b):
     34     "`test` that `a==b`"
---> 35     test(a,b,equals, '==')
     36 
     37 # Cell

~/miniconda3/envs/learn/lib/python3.8/site-packages/fastcore/test.py in test(a, b, cmp, cname)
     23     "`assert` that `cmp(a,b)`; display inputs and `cname or cmp.__name__` if it fails"
     24     if cname is None: cname=cmp.__name__
---> 25     assert cmp(a,b),f"{cname}:\n{a}\n{b}"
     26 
     27 # Cell

AssertionError: ==:
54686
1478
```
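The mismatched counts in the assertion are telling: 54686 = 1478 × 37, which suggests the decoded predictions carry one value per class (the 37 PETS breeds) for each of the 1478 validation images, instead of one argmax per image. A stdlib-only sketch of the `flatten_check` logic from the trace (reimplemented over nested lists, not fastai's tensor version) reproduces the failure:

```python
# Plain-Python sketch of the flatten_check behavior in the trace: flatten
# both inputs and assert equal element counts. The sizes mirror the report:
# 1478 validation images, 37 classes, and 1478 * 37 = 54686.

def flatten(x):
    """Flatten an arbitrarily nested list into a flat list."""
    out = []
    for item in x:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

def flatten_check(inp, targ):
    """Flatten both and check they have the same number of elements."""
    inp, targ = flatten(inp), flatten(targ)
    assert len(inp) == len(targ), f"==:\n{len(inp)}\n{len(targ)}"
    return inp, targ

decoded = [[0.0] * 37 for _ in range(1478)]  # per-class values, not argmaxes
targs = [0] * 1478                           # one label per image
try:
    flatten_check(decoded, targs)
except AssertionError as e:
    print(e)  # shows the two mismatched lengths, 54686 and 1478
```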

Error when using `predict()` (this is from a different model):

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-aa42150ce6e1> in <module>
     71 
     72         # write the prediction as text on the frame
---> 73         if probs[pred_idx.long()] > 0.5:
     74             frame = cv.putText(frame, f'Predict: {pred}, Conf: {probs[pred_idx.long()]:.02f}', 
     75                                org=(20,40), fontFace=cv.FONT_HERSHEY_PLAIN,
     76 

RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
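This second error is consistent with the first observation: `probs[pred_idx.long()]` should select one scalar probability, but because `pred_idx` is itself a whole probability vector, the indexing yields many elements, and `if many_values > 0.5` has no single truth value. A minimal stand-in for a multi-element tensor (plain Python, not PyTorch) shows the same failure mode:

```python
# FakeTensor is a hypothetical stand-in mimicking two tensor behaviors:
# elementwise comparison, and the "ambiguous" error when a multi-element
# result is used in a boolean context.

class FakeTensor:
    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, x):
        # elementwise comparison returns another FakeTensor
        return FakeTensor(v > x for v in self.values)

    def __bool__(self):
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous")
        return bool(self.values[0])

scalar = FakeTensor([0.8])
print(bool(scalar > 0.5))          # True: one element, unambiguous

vector = FakeTensor([0.1, 0.8, 0.1])
try:
    if vector > 0.5:               # what the report's line 73 effectively does
        pass
except RuntimeError as e:
    print(e)
```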

Additional context

```
CUDA available:          True
fastai version:          2.2.5
fastcore version:        1.3.19
fastbook version:        0.0.16
cuda version:            11.0
torch version:           1.7.0
python version:          3.8.6
```
muellerzr commented 3 years ago

@jph00 I plan on looking into this over the weekend, just FYI.

muellerzr commented 3 years ago

@jph00 here is a minimal reproducer of the exact cause of this issue. It seems that, despite the callback being restricted by `run_valid`, some aspects of the Callback that can affect input/output are still being called:

```python
from fastai.vision.all import *
set_seed(99, True)
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

# Predict with MixUp attached
learn = cnn_learner(dls, resnet18, cbs=MixUp())
b = dls[1].one_batch()
p = learn.get_preds(dl=[b], with_decoded=True)
x,y,its = dls.show_batch(b, 9, show=False)
b_out = type(b)(b[:dls.n_inp] + (tuple(p[2]) if is_listy(p[2]) else (p[2],)))

# Drop MixUp and predict again on the same batch
learn.cbs = learn.cbs[:-1]
p = learn.get_preds(dl=[b], with_decoded=True)
x,y,its = dls.show_batch(b, 9, show=False)
b_out_a = type(b)(b[:dls.n_inp] + (tuple(p[2]) if is_listy(p[2]) else (p[2],)))

# These should be identical if MixUp truly skips validation
test_eq(b_out, b_out_a)
```

(I checked: this was also present in the old MixUp implementation.)
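The contract the reproducer above is testing can be sketched in plain Python (these classes are illustrative, not fastai internals): a callback flagged `run_valid=False` should be skipped entirely outside training, so predictions with and without the callback attached must be identical.

```python
# Minimal sketch of a run_valid-style gate. MixUpLike stands in for any
# callback that mutates the batch; the gate in run_batch is the invariant
# this bug report says fastai's equivalent check violates.

class Callback:
    run_valid = True
    def before_batch(self, batch):
        return batch

class MixUpLike(Callback):
    run_valid = False                    # should never fire outside training
    def before_batch(self, batch):
        return [b * 0.5 for b in batch]  # mangles the batch if it does fire

def run_batch(batch, cbs, training):
    for cb in cbs:
        if training or cb.run_valid:     # skip run_valid=False callbacks
            batch = cb.before_batch(batch)
    return batch

batch = [1.0, 2.0, 3.0]
with_cb = run_batch(batch, [MixUpLike()], training=False)
without_cb = run_batch(batch, [], training=False)
print(with_cb == without_cb)  # True when the gate works as intended
```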