Warning with "result type Float can't be cast to the desired output type Long"

hibetterheyj commented 4 years ago

My virtual environment is configured as described in the readme.md file.

I am training on a Windows system and I get a RuntimeError as shown below. Could you give me some suggestions? Thanks in advance!

Epoch: 0 Loss: 1.1103
Unhandled exception in thread started by <function PytorchPlotFileLogger.save_image_grid_static at 0x0000024794C668C8>
Traceback (most recent call last):
  File "C:\Users\XXX\Anaconda3\envs\UNetEG\lib\site-packages\trixi\logger\file\pytorchplotfilelogger.py", line 183, in save_image_grid_static
    tv_save_image(tensor=tensor, filename=img_file, **image_args)
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\torchvision\utils.py", line 103, in save_image
    ndarr = grid.mul_(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
RuntimeError: result type Float can't be cast to the desired output type Long

elpequeno commented 4 years ago

Hi hibetterheyj, I'm out of office until January 8th. After that I will check and let you know.

elpequeno commented 4 years ago

Hi hibetterheyj, sorry for the delay. A quick disclaimer: our repo was developed and tested on Linux, and we highly recommend using it in a Linux environment. I have hardly any experience using it on Windows. The error you mention is new to me and I have not been able to reproduce it so far. Can you please give me more details on what you did? What dataset are you using? Did you install the requirements as explained in the readme, including the Windows-specific steps? Are you training on a GPU? If so, which one?

Edit: Note that the error occurs during logging. You can comment out the plotting of the images for now (l.124-127 of UnetExperiment). Let me know if it trains without this.

hibetterheyj commented 4 years ago

Sorry for the late reply!

Setup

Python 3.5.6, pytorch-cpu 1.3.1 (my CUDA version is 8, so I commented out the line in requirements.txt)

Details

As a first attempt, I commented out the visdom-related code as follows:

    exp = UNetExperiment(config=c, name=c.name, n_epochs=c.n_epochs,
                         seed=42, append_rnd_to_name=c.append_rnd_string, globs=globals(),
                         # visdomlogger_kwargs={"auto_start": c.start_visdom},
                         # connect to visdom
                         # loggers={
                         #     "visdom": ("visdom", {"auto_start": c.start_visdom})
                         # }
                         )

Then I ran python run_train_pipeline.py, which produced the following output, including the error:

Data already downloaded. Files are not extracted again.
The data has already been preprocessed. It will not be preprocessed again. Delete the folder to enforce it.
WARNING: Could not generate requirement for distribution -umpy 1.14.2 (c:\users\hyj\anaconda3\envs\uneteg\lib\site-packages): Parse error at "'-umpy==1'": Expected W:(abcd...)
Could not find git info for G:\##DL_Pro\beginner\basic_unet_example\run_train_pipeline.py
[WinError 2] 系统找不到指定的文件。 (The Chinese here means 'OS cannot find the specified file')
Experiment set up.
Experiment started.
=====TRAIN=====
Reshuffle...
Initializing... this might take a while...
Epoch: 0 Loss: 1.1095
Unhandled exception in thread started by <function PytorchPlotFileLogger.save_image_grid_static at 0x00000211CAA87620>
Traceback (most recent call last):
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\trixi\logger\file\pytorchplotfilelogger.py", line 183, in save_image_grid_static
    tv_save_image(tensor=tensor, filename=img_file, **image_args)
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\torchvision\utils.py", line 103, in save_image
    ndarr = grid.mul_(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
RuntimeError: result type Float can't be cast to the desired output type Long
Epoch: 0 Loss: 0.1478
Unhandled exception in thread started by <function PytorchPlotFileLogger.save_image_grid_static at 0x00000211CAA87620>
Traceback (most recent call last):
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\trixi\logger\file\pytorchplotfilelogger.py", line 183, in save_image_grid_static
    tv_save_image(tensor=tensor, filename=img_file, **image_args)
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\torchvision\utils.py", line 103, in save_image
    ndarr = grid.mul_(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
RuntimeError: result type Float can't be cast to the desired output type Long
Epoch: 0 Loss: 0.1187
Unhandled exception in thread started by <function PytorchPlotFileLogger.save_image_grid_static at 0x00000211CAA87620>
Traceback (most recent call last):
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\trixi\logger\file\pytorchplotfilelogger.py", line 183, in save_image_grid_static
    tv_save_image(tensor=tensor, filename=img_file, **image_args)
  File "C:\Users\HYJ\Anaconda3\envs\UNetEG\lib\site-packages\torchvision\utils.py", line 103, in save_image
    ndarr = grid.mul_(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
RuntimeError: result type Float can't be cast to the desired output type Long
Epoch: 0 Loss: 0.1280

As you can see, the loss kept decreasing despite the RuntimeError, and the output results were still saved correctly.

elpequeno commented 4 years ago

Did you run the code with the example hippocampus dataset or with your own? Did the example dataset work? The problem is that "save_image_grid" apparently tries to cast your tensor to long. Please go ahead and comment out the plotting part (all the "save_image_grid" lines, l.124-127 of UnetExperiment) and let me know if that changes anything. Please also check the shape of your input, label and prediction.
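
For reference, here is a minimal sketch of how this error message can arise. It assumes that the tensor handed to the image logger has an integer (Long) dtype, for example a label map; I have not confirmed that this is what happens in the experiment. The helper name scale_like_save_image and the tensor shape are made up for illustration; only the in-place chain is taken from the traceback.

```python
import torch

# Assumption: the logged tensor is an integer (Long) tensor, e.g. a label map.
# The shape (3, 8, 8) is arbitrary and only for illustration.
labels = torch.randint(0, 3, (3, 8, 8), dtype=torch.long)


def scale_like_save_image(grid):
    # Hypothetical helper mirroring the in-place chain shown in the traceback
    # (torchvision/utils.py): adding the float 0.5 in place to a Long tensor
    # cannot store a float result back into the integer tensor.
    return grid.mul_(255).add_(0.5).clamp_(0, 255)


try:
    scale_like_save_image(labels.clone())
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print("in-place chain on Long tensor failed with:", error_message)

# Casting to float before the in-place chain lets it run.
scaled = scale_like_save_image(labels.clone().float())
print("dtype after casting first:", scaled.dtype)
```

If this is indeed the cause, printing the dtype (in addition to the shape) of input, label and prediction right before the plotting calls should show which tensor is not a float tensor.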

hibetterheyj commented 4 years ago

I used the example dataset to run the code. After commenting out the plotting part (all "show_image_grid" lines, UnetExperiment l.124-127), the code runs normally without errors!

The specific results are as follows:

Experiment set up.
Experiment started.
=====TRAIN=====
Reshuffle...
Initializing... this might take a while...
Epoch: 0 Loss: 1.1095
Epoch: 0 Loss: 0.1009
...

Thanks a lot, and sorry for the delay!

MIC-DKFZ / basic_unet_example

Warning with "result type Float can't be cast to the desired output type Long" #10
