Hi, I tried the configuration in environment.yaml.
Since dgl-cuda11.0 requires CUDA 11.0, I installed that.
However, PyTorch only supports CUDA 11.0 up to version 1.7.1, and the corresponding PyTorch Lightning support only goes up to version 0.8.5.
Lightning 0.8.5, in turn, fails because "self.log" is missing, as the following run shows:
python classification.py train --dataset solidletters --dataset_path ../uv_net_dataset/SolidLetters --max_epochs 100 --batch_size 64 --experiment_name classification
Using backend: pytorch
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
-----------------------------------------------------------------------------------
UV-Net Classification
-----------------------------------------------------------------------------------
Logs written to results/classification/0724/230341
To monitor the logs, run:
tensorboard --logdir results/classification/0724/230341
The trained model with the best validation loss will be written to:
results/classification/0724/230341/best.ckpt
-----------------------------------------------------------------------------------
Loading train data...
0%| | 0/61964 [00:00<?, ?it/s]/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/dgl/base.py:45: DGLWarning: You are loading a graph file saved by old version of dgl. Please consider saving it again with the current format.
return warnings.warn(message, category=category, stacklevel=1)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 61964/61964 [00:26<00:00, 2367.91it/s]
Done loading 61964 files
Loading val data...
0%| | 0/15492 [00:00<?, ?it/s]/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/dgl/base.py:45: DGLWarning: You are loading a graph file saved by old version of dgl. Please consider saving it again with the current format.
return warnings.warn(message, category=category, stacklevel=1)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15492/15492 [00:10<00:00, 1521.90it/s]
Done loading 15492 files
/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:25: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
warnings.warn(*args, **kwargs)
| Name | Type | Params
----------------------------------------------
0 | model | UVNetClassifier | 1 M
1 | train_acc | Accuracy | 0
2 | val_acc | Accuracy | 0
3 | test_acc | Accuracy | 0
/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
File "/home/zpc/projectbed/UV-Net-main/classification.py", line 97, in <module>
trainer.fit(model, train_loader, val_loader)
File "/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1044, in fit
results = self.run_pretrain_routine(model)
File "/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1193, in run_pretrain_routine
eval_results = self._evaluate(model,
File "/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
File "/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 470, in evaluation_forward
output = model.validation_step(*args)
File "/home/zpc/projectbed/UV-Net-main/uvnet/models.py", line 163, in validation_step
self.log("val_loss", loss, on_step=False, on_epoch=True)
File "/home/zpc/projectbed/UV-Net-main/venv2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'Classification' object has no attribute 'log'
Therefore, I tried upgrading Lightning to 0.9.0 and 0.10.0.
With those versions the code runs, but the problem then becomes that the GPU is not used.
I'm wondering whether I can upgrade CUDA to 11.3 and use a newer version of PyTorch. If so, should I also switch to dgl-cuda11.3?
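In case it helps with diagnosing this, here is a minimal sketch of how the installed versions and PyTorch's view of the GPU could be checked. The function name `report_environment` is made up for illustration, and the distribution names passed to it (in particular "dgl") are assumptions; a conda install of dgl-cuda11.0 may register under a different name, in which case the entry simply shows as None.

```python
import importlib
from importlib import metadata


def report_environment(packages=("torch", "pytorch-lightning", "dgl")):
    """Collect installed package versions and, if PyTorch imports, its CUDA status."""
    info = {}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed, or installed under a different distribution name.
            info[name] = None
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing or its shared libraries failed to load.
        info["cuda_build"] = None
        info["cuda_available"] = None
    else:
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["cuda_build"] = torch.version.cuda
        # Whether PyTorch can actually reach a GPU at runtime.
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If `cuda_build` shows None or `cuda_available` shows False, that would suggest the PyTorch build itself is CPU-only or does not match the installed driver, rather than a Lightning problem.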