PaddlePaddle / Paddle

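PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)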
http://www.paddlepaddle.org/
Apache License 2.0
22.11k stars 5.55k forks

[maybe a bug] When breaking out of a dataloader iteration and the Python process then terminates, a strange error may be reported. #46774

Closed OccupyMars2025 closed 11 months ago

OccupyMars2025 commented 1 year ago

The Paddle experts can ignore this issue for now. I am just recording the error here. It is not urgent, and I will check it when I have time.

Describe the Bug

    for index, (images, labels) in enumerate(paddle_dataloader):
        if index > 5:
            break
        print(images.shape, labels.shape)

Environment: Windows 11, PyCharm, CPU, PaddlePaddle 2.3, Python 3.7
I found the bug while running the file C:\Users\Administrator\Desktop\contests\20220715_paddle_lwfx_7th\Going-deeper-with-Image-Transformers-using-PaddlePaddle\02_test_data.py

The error message is as follows:

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=8, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='C:\\Users\\Administrator\\Desktop\\contests\\20220715_paddle_lwfx_7th\\imagenet_dataset\\ILSVRC2012_img_val', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cpu', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=False, drop=0.0, drop_path=0.1, epochs=300, eval=False, finetune='', inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='cait_XXS24', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=0, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, pretrained=False, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_info_txt='C:\\Users\\Administrator\\Desktop\\contests\\20220715_paddle_lwfx_7th\\imagenet_dataset\\train_list_empty.txt', train_interpolation='bicubic', val_info_txt='C:\\Users\\Administrator\\Desktop\\contests\\20220715_paddle_lwfx_7th\\imagenet_dataset\\val_list.txt', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
fail to perform transform [<paddle.vision.transforms.transforms.ToTensor object at 0x0000025CB918B9E8>] with error: We only support 'to_tensor()' in dynamic graph mode, please call 'paddle.disable_static()' to enter dynamic graph mode. and stack:
Traceback (most recent call last):
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 113, in __call__
    data = f(data)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 269, in __call__
    outputs.append(apply_func(inputs[i]))
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 355, in _apply_image
    return F.to_tensor(img, self.data_format)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\functional.py", line 82, in to_tensor
    return F_pil.to_tensor(pic, data_format)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\functional_pil.py", line 88, in to_tensor
    img = paddle.to_tensor(np.array(pic, copy=False))
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\framework.py", line 433, in __impl__
    ), "We only support '%s()' in dynamic graph mode, please call 'paddle.disable_static()' to enter dynamic graph mode." % func.__name__
AssertionError: We only support 'to_tensor()' in dynamic graph mode, please call 'paddle.disable_static()' to enter dynamic graph mode.

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\dataloader\dataloader_iter.py", line 218, in _thread_loop
    self._thread_done_event)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\dataloader\fetcher.py", line 121, in fetch
    data.append(self.dataset[idx])
  File "C:\Users\Administrator\Desktop\contests\20220715_paddle_lwfx_7th\Going-deeper-with-Image-Transformers-using-PaddlePaddle\CaiT_paddle\datasets.py", line 126, in __getitem__
    image = self.transforms(image)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 118, in __call__
    raise e
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 113, in __call__
    data = f(data)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 269, in __call__
    outputs.append(apply_func(inputs[i]))
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\transforms.py", line 355, in _apply_image
    return F.to_tensor(img, self.data_format)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\functional.py", line 82, in to_tensor
    return F_pil.to_tensor(pic, data_format)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\vision\transforms\functional_pil.py", line 88, in to_tensor
    img = paddle.to_tensor(np.array(pic, copy=False))
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "C:\Users\Administrator\Anaconda3\envs\paddle-2.3-pytorch-1.8-python-3.7-env\lib\site-packages\paddle\fluid\framework.py", line 433, in __impl__
    ), "We only support '%s()' in dynamic graph mode, please call 'paddle.disable_static()' to enter dynamic graph mode." % func.__name__
AssertionError: We only support 'to_tensor()' in dynamic graph mode, please call 'paddle.disable_static()' to enter dynamic graph mode.

Process finished with exit code 0

Situations in which the error is not reported:

1. Using the paddle dataloader with a very small dataset, so that the dataloader is exhausted without breaking out of the "for" statement:

[12, 3, 224, 224] [12]
[12, 3, 224, 224] [12]
[6, 3, 224, 224] [6]

Process finished with exit code 0

2. Using the torch dataloader instead of the paddle dataloader.

3. When "Process finished with exit code 0" does not appear. As I understand it, that means the Python process has not terminated (I am not sure how best to describe this situation).

In short, I think that when a Python file containing the code above breaks out of the iteration and the Python process then terminates, a strange error occurs. I am not sure about this and will check it in detail when I have time.
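
The suspected mechanism can be illustrated with a small standard-library sketch. It is not Paddle's implementation: PrefetchIterator, its close() method, and the queue capacity are hypothetical names and values chosen for illustration. The only link to the traceback is that Paddle's dataloader also fetches samples in a background thread and passes a _thread_done_event to it. The sketch shows that after a "break" the fetch thread is still alive and still reading from the dataset, and that an explicit stop signal ends it before the process exits.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of the dataset


class PrefetchIterator:
    """Toy loader that fetches samples on a background thread (illustration only)."""

    def __init__(self, dataset, capacity=2):
        self._dataset = dataset
        self._queue = queue.Queue(maxsize=capacity)
        self._done = threading.Event()
        self._exhausted = False
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _put(self, item):
        # Retry with a timeout so that a full queue cannot block shutdown.
        while not self._done.is_set():
            try:
                self._queue.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    def _loop(self):
        # Keeps calling dataset[idx] until the data runs out or a stop is requested.
        for idx in range(len(self._dataset)):
            if not self._put(self._dataset[idx]):
                return
        self._put(_END)

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            raise StopIteration
        item = self._queue.get()
        if item is _END:
            self._exhausted = True
            raise StopIteration
        return item

    def close(self):
        # Ask the fetch thread to stop, then wait for it to finish.
        self._done.set()
        self._thread.join()


loader = PrefetchIterator(list(range(100)))
seen = []
for index, sample in enumerate(loader):
    if index > 5:
        break
    seen.append(sample)

alive_after_break = loader._thread.is_alive()  # fetch thread is still working
loader.close()
alive_after_close = loader._thread.is_alive()  # fetch thread has stopped
```

If the real fetch thread is still inside a transform when the interpreter starts shutting down, it may observe framework state that has already been torn down. That would be consistent with the to_tensor() assertion above, but the thread never confirmed it.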

Additional Supplementary Information

I am not sure this is a bug, and I will check it in detail when I have time. When reproducing it, pay attention to whether "Process finished with exit code 0" appears, since that message indicates the Python process has terminated.

paddle-bot[bot] commented 1 year ago

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version, and error messages. You may also check the API docs, FAQ, GitHub Issues, and AI community to find an answer. Have a nice day!

OccupyMars2025 commented 1 year ago

When running C:\Users\Administrator\Desktop\contests\contests_that_have_ended\20220509_paddle_lwfx\task3-Going-deeper-with-Image-Transformers\02_test_data.py, the result is as follows:

image

When running C:\Users\Administrator\Desktop\contests\20220715_paddle_lwfx_7th\Going-deeper-with-Image-Transformers-using-PaddlePaddle\02_test_data.py, the result is as follows:

image

OccupyMars2025 commented 1 year ago

Is the environment variable PYTHONUNBUFFERED the cause of the error? I'm not sure.
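
One way to test this hypothesis is to record whether the variable is set in each run and compare. According to the Python documentation, a non-empty PYTHONUNBUFFERED is equivalent to the -u option and only affects how stdout and stderr are buffered. It could therefore plausibly change whether or when the message becomes visible, rather than cause the assertion itself. The helper name describe_output_buffering below is hypothetical.

```python
import os


def describe_output_buffering(environ=None):
    """Report whether unbuffered output was requested through the environment."""
    if environ is None:
        environ = os.environ
    value = environ.get("PYTHONUNBUFFERED")
    # Any non-empty string enables unbuffered mode, the same as `python -u`.
    return {"PYTHONUNBUFFERED": value, "unbuffered_requested": bool(value)}


print(describe_output_buffering())
```

Running the script once with the variable set and once without it (for example by editing the run configuration) would show whether the error message depends on it.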

image

OccupyMars2025 commented 1 year ago

I found this error while reproducing a paper:
https://github.com/OccupyMars2025/Going-deeper-with-Image-Transformers-using-PaddlePaddle

rainyfly commented 1 year ago

Try calling paddle.disable_static() at the beginning of your code; it seems to be entering static graph mode by default.
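
A minimal sketch of this suggestion, assuming Paddle 2.x, where paddle.in_dynamic_mode() and paddle.disable_static() are available. The helper name ensure_dynamic_mode is hypothetical, and the import guard is only there so the snippet also runs on a machine without Paddle. The thread does not confirm whether this removes the error seen at process exit.

```python
try:
    import paddle
except ImportError:  # Paddle is not installed; the helper then does nothing
    paddle = None


def ensure_dynamic_mode():
    """Switch Paddle to dynamic graph mode if it is not already active.

    Returns None when Paddle is unavailable, otherwise the resulting mode flag.
    """
    if paddle is None:
        return None
    if not paddle.in_dynamic_mode():
        paddle.disable_static()
    return paddle.in_dynamic_mode()


# Call this once at the top of the script, before building the dataset and dataloader.
mode = ensure_dynamic_mode()
print("dynamic graph mode:", mode)
```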

paddle-bot[bot] commented 11 months ago

Since you haven't replied for more than a year, we have closed this issue/PR. If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.