deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.
https://www.manning.com/books/deep-learning-with-pytorch

p2ch11/training.py causes a TypeError #107

Open Va6lue opened 1 year ago

Va6lue commented 1 year ago

Due to this error, my TensorBoard shows nothing.

My computer spec:
CPU: AMD R7-7700
RAM: 16GB X 2
GPU: RTX 4090 24GB

My system spec:
OS: Windows 11
IDE: VS Code
Python: 3.9.13
PyTorch: 1.13.1+cu117

In:

# run('p2ch11.prepcache.LunaPrepCacheApp')  # Already ran successfully; shown only for context.
run('p2ch11.training.LunaTrainingApp', '--epochs=1')  # This call fails.

Out:


2023-03-14 21:40:08,556 INFO pid:10608 nb:004:run Running: p2ch11.training.LunaTrainingApp(['--epochs=1', '--num-workers=8']).main()
2023-03-14 21:40:08,560 INFO pid:10608 p2ch11.training:079:initModel Using CUDA; 1 devices.
2023-03-14 21:40:08,563 INFO pid:10608 p2ch11.training:138:main Starting LunaTrainingApp, Namespace(num_workers=8, batch_size=1024, epochs=1, tb_prefix='p2ch11', comment='dwlpt')
2023-03-14 21:40:08,724 INFO pid:10608 p2ch11.dsets:182:__init__ : 495958 training samples
2023-03-14 21:40:08,744 INFO pid:10608 p2ch11.dsets:182:__init__ : 55107 validation samples
2023-03-14 21:40:08,745 INFO pid:10608 p2ch11.training:145:main Epoch 1 of 1, 485/54 batches of size 1024*1
2023-03-14 21:40:08,746 WARNING pid:10608 util.util:144:enumerateWithEstimate E1 Training ----/485, starting
2023-03-14 21:41:21,363 INFO pid:10608 util.util:161:enumerateWithEstimate E1 Training 64/485, done at 2023-03-14 21:47:11, 0:06:37
2023-03-14 21:44:03,276 INFO pid:10608 util.util:161:enumerateWithEstimate E1 Training 256/485, done at 2023-03-14 21:47:15, 0:06:41
2023-03-14 21:47:16,901 WARNING pid:10608 util.util:174:enumerateWithEstimate E1 Training ----/485, done at 2023-03-14 21:47:16
2023-03-14 21:47:18,692 INFO pid:10608 p2ch11.training:259:logMetrics E1 LunaTrainingApp
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:289:logMetrics E1 trn 0.0235 loss, 99.7% correct,
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:298:logMetrics E1 trn_neg 0.0041 loss, 100.0% correct (494577 of 494743)
2023-03-14 21:47:18,698 INFO pid:10608 p2ch11.training:309:logMetrics E1 trn_pos 7.9111 loss, 0.2% correct (2 of 1215)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 run('p2ch11.training.LunaTrainingApp', '--epochs=1')

Cell In[2], line 7, in run(app, argv)
      4 log.info("Running: {}({!r}).main()".format(app, argv))
      6 app_cls = importstr(app.rsplit('.', 1))
----> 7 app_cls(argv).main()
      9 log.info("Finished: {}.{!r}).main()".format(app, argv))

File c:\DeepLearning_F1388\F1388_Code\p2ch11\training.py:155, in LunaTrainingApp.main(self)
    145 log.info("Epoch {} of {}, {}/{} batches of size {}*{}".format(
    146     epoch_ndx,
    147     self.cli_args.epochs,
   (...)
    151     (torch.cuda.device_count() if self.use_cuda else 1),
    152 ))
    154 trnMetrics_t = self.doTraining(epoch_ndx, train_dl)
--> 155 self.logMetrics(epoch_ndx, 'trn', trnMetrics_t)
    157 valMetrics_t = self.doValidation(epoch_ndx, val_dl)
    158 self.logMetrics(epoch_ndx, 'val', valMetrics_t)

File c:\DeepLearning_F1388\F1388_Code\p2ch11\training.py:339, in LunaTrainingApp.logMetrics(self, epoch_ndx, mode_str, metrics_t, classificationThreshold)
    336 posHist_mask = posLabel_mask & (metrics_t[METRICS_PRED_NDX] < 0.99)
...
--> 386 cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
    387 start, end = np.searchsorted(cum_counts, [0, cum_counts[-1] - 1], side="right")
    388 start = int(start)

TypeError: No loop matching the specified signature and casting was found for ufunc greater
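
For reference, the failing expression from the traceback can be tried in isolation. The sketch below is an illustration, not a confirmed fix for this issue. It assumes the error comes from passing `dtype=np.int32` to the comparison ufunc `np.greater`, which, as far as I know, newer NumPy releases (1.24 and later) reject because comparison ufuncs only produce `bool` or `object` results. The NumPy version of the Python 3.9 environment is not stated above, so that part is a guess. The `counts` array and the helper name `cum_nonzero_counts` are made up for the example.

```python
import numpy as np

# Made-up histogram bin counts, for illustration only.
counts = np.array([0, 3, 0, 5, 2, 0])

# Try the expression from the traceback. Depending on the installed
# NumPy version it may succeed (possibly with a warning) or raise TypeError.
try:
    np.cumsum(np.greater(counts, 0, dtype=np.int32))
    original_ok = True
except TypeError:
    original_ok = False


def cum_nonzero_counts(counts):
    """Running count of non-empty bins (hypothetical helper).

    The comparison is done without a dtype argument, so it yields a
    boolean array; the cast to int32 happens as a separate step.
    """
    return np.cumsum(np.greater(counts, 0).astype(np.int32))


cum_counts = cum_nonzero_counts(counts)
# Same follow-up call as line 387 of the traceback.
start, end = np.searchsorted(cum_counts, [0, cum_counts[-1] - 1], side="right")

print("original expression works on this NumPy:", original_ok)
print("cum_counts:", cum_counts.tolist())
print("start, end:", int(start), int(end))
```

If `original_ok` prints `False` while the helper runs without error, the installed NumPy is likely the source of the TypeError, and pinning an older NumPy or upgrading the package that contains line 386 may be worth trying.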

Va6lue commented 1 year ago

I changed the Python version to 3.7; the following are the packages I installed.


absl-py==1.4.0
astor==0.8.1
backcall==0.2.0
blosc==1.10.6
cassandra-driver==3.25.0
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.6
cycler==0.11.0
debugpy==1.6.6
decorator==5.1.1
diskcache==4.1.0
entrypoints==0.4
fonttools==4.38.0
gast==0.2.2
geomet==0.2.1.post1
google-pasta==0.2.0
grpcio==1.51.3
h5py==3.8.0
idna==3.4
imageio==2.26.0
importlib-metadata==6.0.0
ipykernel==6.16.2
ipython==7.34.0
jedi==0.18.2
jupyter_client==7.4.9
jupyter_core==4.12.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.4.0
matplotlib-inline==0.1.6
nest-asyncio==1.5.6
networkx==2.6.3
numpy==1.21.6
opt-einsum==3.3.0
packaging==23.0
parso==0.8.3
pickleshare==0.7.5
Pillow==9.4.0
prompt-toolkit==3.0.38
protobuf==3.20.0
psutil==5.9.4
Pygments==2.14.0
pyparsing==3.0.9
python-dateutil==2.8.2
PyWavelets==1.3.0
pywin32==305
pyzmq==25.0.1
requests==2.28.1
scikit-image==0.15.0
scipy==1.5.0
SimpleITK==2.2.1
six==1.16.0
tensorboard==1.15.0
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
termcolor==2.2.0
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117
tornado==6.2
traitlets==5.9.0
typing_extensions==4.5.0
urllib3==1.26.13
wcwidth==0.2.6
Werkzeug==2.2.3
wrapt==1.15.0
zipp==3.15.0

With this setup, a memory explosion occurs instead.