Open turian opened 2 years ago
I have a similar issue on a lambdalabs gpu.1x.a6000
Here's what I do in the shell:
sudo apt-get update
#sudo apt-get -y upgrade
sudo -H pip3 install --upgrade pip
pip3 install --upgrade setuptools pip
git clone https://github.com/autonomousvision/stylegan_xl.git
cd stylegan_xl
pip install gdown
/home/ubuntu/.local/bin/gdown 1aAJCZbXNHyraJ6Mi13dSbe7pTyfPXha0
unzip -o few-shot-image-datasets.zip
mkdir data
python dataset_tool.py --source=./few-shot-images/pokemon --dest=./data/pokemon16.zip \
--resolution=16x16 --transform=center-crop
python dataset_tool.py --source=./few-shot-images/pokemon --dest=./data/pokemon32.zip \
--resolution=32x32 --transform=center-crop
python dataset_tool.py --source=./few-shot-images/pokemon --dest=./data/pokemon64.zip \
--resolution=64x64 --transform=center-crop
python dataset_tool.py --source=./few-shot-images/pokemon --dest=./data/pokemon128.zip \
--resolution=128x128 --transform=center-crop
# https://github.com/ShinoharaHare/stylegan_xl/commit/8f1cc201ead4197be056f8eb5431fb0468070588
pip install --no-cache-dir --no-deps pillow==8.3.1 scipy==1.7.1 requests==2.26.0 tqdm==4.62.2 ninja==1.10.2 matplotlib==3.4.2 imageio==2.9.0 dill==0.3.4 psutil==5.8.0 regex==2022.3.15 imgui==1.3.0 glfw==2.2.0 pyopengl==3.1.5 imageio-ffmpeg==0.4.3 pyspng ftfy==6.1.1 timm==0.4.12 click
pip install --no-cache-dir tensorboard protobuf==3.20.*
pip install pybind11
sudo apt -y install python3-pybind11
python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon16.zip \
--gpus=1 --batch=64 --mirror=1 --snap 10 --batch-gpu 8 --kimg 10000 --syn_layers 10
And I get
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Traceback (most recent call last):
File "train.py", line 336, in <module>
main() # pylint: disable=no-value-for-parameter
File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "train.py", line 321, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "train.py", line 104, in launch_training
subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
File "train.py", line 49, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "/home/ubuntu/stylegan_xl/training/training_loop.py", line 339, in training_loop
loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
File "/home/ubuntu/stylegan_xl/training/loss.py", line 121, in accumulate_gradients
loss_Gmain.backward()
File "/usr/lib/python3/dist-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/lib/python3/dist-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/lib/python3/dist-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/home/ubuntu/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
grad_weight = Conv2dGradWeight.apply(grad_output, input)
File "/home/ubuntu/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: No such operator aten::cudnn_convolution_backward_weight
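One plausible reading of the traceback above: `torch` is imported from `/usr/lib/python3/dist-packages` (the system location), and a missing `aten::cudnn_convolution_backward_weight` operator usually means the installed PyTorch is newer than the release this code was written against. Before changing anything, it can help to confirm which versions the interpreter actually sees and where they come from. The sketch below uses only the standard library; the helper names `installed_version` and `module_origin` are made up for illustration.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def module_origin(module: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = util.find_spec(module)
    return None if spec is None else spec.origin


# Packages mentioned in this thread; extend the tuple as needed.
for name in ("torch", "torchvision", "timm"):
    print(f"{name}: version={installed_version(name)} origin={module_origin(name)}")
```

If the printed origin for `torch` is under `/usr/lib/python3/dist-packages` while `pip` reports installing into a user directory, the interpreter may still be picking up the system copy.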
Hi @turian,
About your first problem, you basically need: !pip install timm==0.5.4
About the second one, try: pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f or, if you use conda: conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 -c pytorch -c conda-forge
I have managed to run it in Colab in the past, so if you hit more errors I will be glad to help.
@Gad1001
$ pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f
Usage:
pip install [options] <requirement specifier> [package-index-options] ...
pip install [options] -r <requirements file> [package-index-options] ...
pip install [options] [-e] <vcs project url> ...
pip install [options] [-e] <local project path> ...
pip install [options] <archive url/path> ...
-f option requires 1 argument
I also tried:
$ pip install -f torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0
Defaulting to user installation because normal site-packages is not writeable
Looking in links: torch==1.9.0+cu111
WARNING: Location 'torch==1.9.0+cu111' is ignored: it is either a non-existing path or lacks a specific scheme.
ERROR: Could not find a version that satisfies the requirement torchvision==0.10.0+cu111 (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.12.0, 0.13.0, 0.13.1)
ERROR: No matching distribution found for torchvision==0.10.0+cu111
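Both failures above seem to come from how `-f` (`--find-links`) takes exactly one argument. With `-f` last, there is nothing left for it to consume; with `-f` first, it consumes `torch==1.9.0+cu111` as the link location, so only the other two packages are treated as requirements, and the `+cu111` build of torchvision is not in the list of versions pip found. The sketch below is a hypothetical helper that only mimics this "option consumes the next token" rule. It is not pip's actual parser.

```python
from typing import List, Tuple


def split_find_links(args: List[str]) -> Tuple[List[str], List[str]]:
    """Separate `-f VALUE` pairs from requirement specifiers.

    Illustrative only: an option that takes one argument consumes
    the token that follows it, whatever that token is.
    """
    links: List[str] = []
    requirements: List[str] = []
    i = 0
    while i < len(args):
        if args[i] in ("-f", "--find-links"):
            if i + 1 >= len(args):
                raise ValueError("-f option requires 1 argument")
            links.append(args[i + 1])
            i += 2
        else:
            requirements.append(args[i])
            i += 1
    return links, requirements


pins = ["torch==1.9.0+cu111", "torchvision==0.10.0+cu111", "torchaudio==0.9.0"]

# Second attempt from the thread: `-f` comes first and swallows the torch pin.
links, requirements = split_find_links(["-f"] + pins)
print(links)         # ['torch==1.9.0+cu111']
print(requirements)  # ['torchvision==0.10.0+cu111', 'torchaudio==0.9.0']
```

Calling `split_find_links(pins + ["-f"])` raises the same kind of "requires 1 argument" error as the first attempt.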
@turian I am seeing the exact same problems right now.
I think the command is supposed to be:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
It is very slow so I have to wait.
@turian see below:
!pip install timm==0.5.4
!pip install ftfy
!pip install Ninja
!pip install setuptools==59.5.0
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
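Outside Colab, where `pip`, `pip3`, and `sudo pip` can each point at a different Python, one way to make sure the pins land in the interpreter that runs `train.py` is to invoke pip as `python -m pip`. The sketch below only builds the command; the helper name `pip_install_command` is made up for illustration, and the pins and URL are the ones quoted in this thread.

```python
import sys
from typing import List, Optional

# Link page and version pins quoted in the thread above.
FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"
PINS = ["torch==1.9.0+cu111", "torchvision==0.10.0+cu111", "torchaudio==0.9.0"]


def pip_install_command(requirements: List[str],
                        find_links: Optional[str] = None) -> List[str]:
    """Build a `python -m pip install ...` argument list for the current interpreter."""
    command = [sys.executable, "-m", "pip", "install", *requirements]
    if find_links is not None:
        # `-f` must be followed immediately by its single argument.
        command += ["-f", find_links]
    return command


command = pip_install_command(PINS, FIND_LINKS)
print(" ".join(command))
```

To actually run the install, the list can be passed to `subprocess.run(command, check=True)`.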
I'm trying to follow the README on colab.
I do:
But I get this error:
Check out this colab