Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Apache License 2.0
Update for stable diffusion v2.0; Difference between encoders of stable diffusion & openclip #100

Open Junyi42 opened 1 year ago

Junyi42 commented 1 year ago

Hey, I tried the most recent Stable Diffusion v2 and found that only the changes below are needed to make it run well.

In the model-loading code, from:

    # 1. Load the autoencoder model which will be used to decode the latents into image space. 
    self.vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae", use_auth_token=self.token).to(self.device)

    # 2. Load the tokenizer and text encoder to tokenize and encode the text. 
    self.tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    self.text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(self.device)

    # 3. The UNet model for generating the latents.
    self.unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet", use_auth_token=self.token).to(self.device)

change to:

      # 1. Load the autoencoder model which will be used to decode the latents into image space. 
      self.vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2-base", subfolder="vae", use_auth_token=self.token).to(self.device)

      # 2. Load the tokenizer and text encoder to tokenize and encode the text. 
      self.tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2-base", subfolder="tokenizer", use_auth_token=self.token)
      self.text_encoder = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2-base", subfolder="text_encoder", use_auth_token=self.token).to(self.device)

      # 3. The UNet model for generating the latents.
      self.unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-base", subfolder="unet", use_auth_token=self.token).to(self.device)
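
Rather than hard-coding one repo id, the two variants above could be selected from a version string. The following is a minimal sketch only: the helper `resolve_model_key`, the `MODEL_KEYS` table, and the version strings "1.5" and "2.0" are illustrative assumptions, not the project's actual code. Only the two repo ids are taken from the snippets above.

```python
# Hypothetical lookup table: version string -> Hugging Face repo id.
# The repo ids come from the snippets above; everything else is assumed.
MODEL_KEYS = {
    "1.5": "runwayml/stable-diffusion-v1-5",
    "2.0": "stabilityai/stable-diffusion-2-base",
}


def resolve_model_key(sd_version: str) -> str:
    """Return the repo id for a version string, or raise on unknown input."""
    try:
        return MODEL_KEYS[sd_version]
    except KeyError:
        raise ValueError(
            f"Unsupported Stable Diffusion version: {sd_version!r}"
        ) from None


# The returned key would then be passed to each from_pretrained(...) call.
model_key = resolve_model_key("2.0")
```

Failing early on an unknown version string keeps a typo from turning into a confusing download error later.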

Two points that really confuse me are:

Any help will be greatly appreciated!

ashawkey commented 1 year ago

@Junyi42 Hi, thanks for the effort!

flobotics commented 1 year ago

@ashawkey I tried the new version with stable-diffusion 2.0, but I get this error. The previous version was running, and I only did a "git pull". What am I doing wrong?

 python --text "a hamburger" --workspace trial2 -O
Namespace(text='a hamburger', negative='', O=True, O2=False, test=False, save_mesh=False, eval_interval=10, workspace='trial2', guidance='stable-diffusion', seed=0, iters=10000, lr=0.001, ckpt='latest', cuda_ray=True, max_steps=512, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, albedo=False, albedo_iters=1000, uniform_sphere_rate=0.5, bg_radius=1.4, density_thresh=10, fp16=True, backbone='grid', sd_version='2.0', w=64, h=64, jitter_pose=False, bound=1, dt_gamma=0, min_near=0.1, radius_range=[1.0, 1.5], fovy_range=[40, 70], dir_text=True, suppress_face=False, angle_overhead=30, angle_front=60, lambda_entropy=0.0001, lambda_opacity=0, lambda_orient=0.01, lambda_smooth=0, gui=False, W=800, H=800, radius=3, fovy=60, light_theta=60, light_phi=0, max_spp=1)
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(903480, 2) gridtype=tiled align_corners=False interpolation=linear
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
  (encoder_bg): FreqEncoder: input_dim=3 degree=4 output_dim=27
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=27, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=3, bias=True)
[INFO] try to load hugging face access token from the default place, make sure you have run `huggingface-cli login`.
[INFO] loading stable diffusion...
The config attributes {'dual_cross_attention': False, 'use_linear_projection': True} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "C:\Users\SuperUserName\git\stable-dreamfusion\", line 141, in <module>
    guidance = StableDiffusion(device, opt.sd_version)
  File "C:\Users\SuperUserName\git\stable-dreamfusion\nerf\", line 47, in __init__
    self.unet = UNet2DConditionModel.from_pretrained(model_key, subfolder="unet", use_auth_token=self.token).to(self.device)
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\", line 412, in from_pretrained
    model, unused_kwargs = cls.from_config(
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\", line 169, in from_config
    model = cls(**init_dict)
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\", line 406, in inner_init
    init(self, *args, **init_kwargs)
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\models\", line 135, in __init__
    down_block = get_down_block(
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\models\", line 65, in get_down_block
    return CrossAttnDownBlock2D(
  File "C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\models\", line 508, in __init__
    out_channels // attn_num_head_channels,
TypeError: unsupported operand type(s) for //: 'int' and 'list'
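
The `TypeError` at the bottom of this traceback is consistent with an installed diffusers release that predates Stable Diffusion 2 support: the 2.0 UNet config appears to supply its attention head setting as a per-block list, while the failing line divides by it as if it were a single integer. The warning about ignored config attributes just before the traceback points the same way. A minimal reproduction follows, in which the numbers and the helper `split_channels` are illustrative assumptions rather than diffusers code:

```python
from typing import Sequence, Union


def split_channels(out_channels: int,
                   head_setting: Union[int, Sequence[int]],
                   block: int) -> int:
    """Divide channels by the head setting, given one int or a per-block list."""
    if isinstance(head_setting, int):
        return out_channels // head_setting
    return out_channels // head_setting[block]


# Illustrative values only (assumed, not read from a real config file).
per_block = [5, 10, 20, 20]

try:
    320 // per_block  # same operand types as the failing line
except TypeError as err:
    message = str(err)

# Picking the entry for one block makes the division well-defined again.
first_block = split_channels(320, per_block, block=0)
```
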

flobotics commented 1 year ago

I ran "pip install --upgrade diffusers[torch]" inside the Anaconda prompt. It then complained about a missing tensorboard, which I installed with "pip install tensorboard". Now it returns:

 python --text "a hamburger" --workspace trial2 -O
Namespace(text='a hamburger', negative='', O=True, O2=False, test=False, save_mesh=False, eval_interval=10, workspace='trial2', guidance='stable-diffusion', seed=0, iters=10000, lr=0.001, ckpt='latest', cuda_ray=True, max_steps=512, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, albedo=False, albedo_iters=1000, uniform_sphere_rate=0.5, bg_radius=1.4, density_thresh=10, fp16=True, backbone='grid', sd_version='2.0', w=64, h=64, jitter_pose=False, bound=1, dt_gamma=0, min_near=0.1, radius_range=[1.0, 1.5], fovy_range=[40, 70], dir_text=True, suppress_face=False, angle_overhead=30, angle_front=60, lambda_entropy=0.0001, lambda_opacity=0, lambda_orient=0.01, lambda_smooth=0, gui=False, W=800, H=800, radius=3, fovy=60, light_theta=60, light_phi=0, max_spp=1)
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(903480, 2) gridtype=tiled align_corners=False interpolation=linear
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
  (encoder_bg): FreqEncoder: input_dim=3 degree=4 output_dim=27
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=27, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=3, bias=True)
[INFO] try to load hugging face access token from the default place, make sure you have run `huggingface-cli login`.
[INFO] loading stable diffusion...
C:\Users\SuperUserName\anaconda3\lib\site-packages\diffusers\utils\ FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddim.DDIMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  warnings.warn(warning + message, FutureWarning)
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 308/308 [00:00<00:00, 309kB/s]
C:\Users\SuperUserName\anaconda3\lib\site-packages\huggingface_hub\ UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\SuperUserName\.cache\huggingface\diffusers. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article:
[INFO] loaded stable diffusion!
[INFO] Trainer: df | 2022-12-03_17-13-08 | cuda | fp16 | trial2
[INFO] #parameters: 1815479
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
==> Start Training trial2 Epoch 1, lr=0.010000 ...
  0% 0/100 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\SuperUserName\git\stable-dreamfusion\ in <module>                                    │
│                                                                                                  │
│   157 │   │   │   valid_loader = NeRFDataset(opt, device=device, type='val', H=opt.H, W=opt.W,   │
│   158 │   │   │                                                                                  │
│   159 │   │   │   max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32)            │
│ ❱ 160 │   │   │   trainer.train(train_loader, valid_loader, max_epoch)                           │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\nerf\ in train                                 │
│                                                                                                  │
│   483 │   │   for epoch in range(self.epoch + 1, max_epochs + 1):                                │
│   484 │   │   │   self.epoch = epoch                                                             │
│   485 │   │   │                                                                                  │
│ ❱ 486 │   │   │   self.train_one_epoch(train_loader)                                             │
│   487 │   │   │                                                                                  │
│   488 │   │   │   if self.workspace is not None and self.local_rank == 0:                        │
│   489 │   │   │   │   self.save_checkpoint(full=True, best=False)                                │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\nerf\ in train_one_epoch                       │
│                                                                                                  │
│   695 │   │   │   # update grid every 16 steps                                                   │
│   696 │   │   │   if self.model.cuda_ray and self.global_step % self.opt.update_extra_interval   │
│   697 │   │   │   │   with torch.cuda.amp.autocast(enabled=self.fp16):                           │
│ ❱ 698 │   │   │   │   │   self.model.update_extra_state()                                        │
│   699 │   │   │                                                                                  │
│   700 │   │   │   self.local_step += 1                                                           │
│   701 │   │   │   self.global_step += 1                                                          │
│                                                                                                  │
│ C:\Users\SuperUserName\anaconda3\lib\site-packages\torch\autograd\ in decorate_context    │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\nerf\ in update_extra_state                 │
│                                                                                                  │
│   622 │   │   │   │   │   │   # add noise in [-hgs, hgs]                                         │
│   623 │   │   │   │   │   │   cas_xyzs += (torch.rand_like(cas_xyzs) * 2 - 1) * half_grid_size   │
│   624 │   │   │   │   │   │   # query density                                                    │
│ ❱ 625 │   │   │   │   │   │   sigmas = self.density(cas_xyzs)['sigma'].reshape(-1).detach()      │
│   626 │   │   │   │   │   │   # assign                                                           │
│   627 │   │   │   │   │   │   tmp_grid[cas, indices] = sigmas                                    │
│   628                                                                                            │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\nerf\ in density                        │
│                                                                                                  │
│   147 │   def density(self, x):                                                                  │
│   148 │   │   # x: [N, 3], in [-bound, bound]                                                    │
│   149 │   │                                                                                      │
│ ❱ 150 │   │   sigma, albedo = self.common_forward(x)                                             │
│   151 │   │                                                                                      │
│   152 │   │   return {                                                                           │
│   153 │   │   │   'sigma': sigma,                                                                │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\nerf\ in common_forward                  │
│                                                                                                  │
│    77 │   │   # x: [N, 3], in [-bound, bound]                                                    │
│    78 │   │                                                                                      │
│    79 │   │   # sigma                                                                            │
│ ❱  80 │   │   h = self.encoder(x, bound=self.bound)                                              │
│    81 │   │                                                                                      │
│    82 │   │   h = self.sigma_net(h)                                                              │
│    83                                                                                            │
│                                                                                                  │
│ C:\Users\SuperUserName\anaconda3\lib\site-packages\torch\nn\modules\ in _call_impl         │
│                                                                                                  │
│   1127 │   │   # this function, and just call forward.                                           │
│   1128 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1129 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1130 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1131 │   │   # Do not call functions when jit is used                                          │
│   1132 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1133 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\gridencoder\ in forward                         │
│                                                                                                  │
│   153 │   │   prefix_shape = list(inputs.shape[:-1])                                             │
│   154 │   │   inputs = inputs.view(-1, self.input_dim)                                           │
│   155 │   │                                                                                      │
│ ❱ 156 │   │   outputs = grid_encode(inputs, self.embeddings, self.offsets, self.per_level_scal   │
│   157 │   │   outputs = outputs.view(prefix_shape + [self.output_dim])                           │
│   158 │   │                                                                                      │
│   159 │   │   #print('outputs', outputs.shape, outputs.dtype, outputs.min().item(),   │
│                                                                                                  │
│ C:\Users\SuperUserName\anaconda3\lib\site-packages\torch\cuda\amp\ in decorate_fwd   │
│                                                                                                  │
│   107 │   def decorate_fwd(*args, **kwargs):                                                     │
│   108 │   │   if cast_inputs is None:                                                            │
│   109 │   │   │   args[0]._fwd_used_autocast = torch.is_autocast_enabled()                       │
│ ❱ 110 │   │   │   return fwd(*args, **kwargs)                                                    │
│   111 │   │   else:                                                                              │
│   112 │   │   │   autocast_context = torch.is_autocast_enabled()                                 │
│   113 │   │   │   args[0]._fwd_used_autocast = False                                             │
│                                                                                                  │
│ C:\Users\SuperUserName\git\stable-dreamfusion\gridencoder\ in forward                          │
│                                                                                                  │
│    51 │   │   else:                                                                              │
│    52 │   │   │   dy_dx = None                                                                   │
│    53 │   │                                                                                      │
│ ❱  54 │   │   _backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S   │
│    55 │   │                                                                                      │
│    56 │   │   # permute back to [B, L * C]                                                       │
│    57 │   │   outputs = outputs.permute(1, 0, 2).reshape(B, L * C)                               │
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
    1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: float, arg9: int, arg10: Optional[at::Tensor], arg11: int, arg12: bool) -> None

Invoked with: tensor([[0.0062, 0.0011, 0.0017],
        [0.0064, 0.0054, 0.0135],
        [0.0018, 0.0071, 0.0187],
        [0.9993, 0.9997, 0.9817],
        [0.9962, 0.9957, 0.9886],
        [0.9980, 0.9975, 0.9924]], device='cuda:0'), tensor([[-7.7486e-07,  5.3644e-05],
        [-8.2314e-05, -7.3612e-05],
        [-3.8505e-05,  2.6822e-05],
        [-6.2644e-05, -2.3842e-06],
        [-7.7724e-05, -8.1122e-05],
        [-1.8597e-05, -7.2241e-05]], device='cuda:0', dtype=torch.float16), tensor([     0,   4920,  18744,  51512, 117048, 182584, 248120, 313656, 379192,
        444728, 510264, 575800, 641336, 706872, 772408, 837944, 903480],
       device='cuda:0', dtype=torch.int32), tensor([[[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],


        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]]], device='cuda:0', dtype=torch.float16), 2097152, 3, 2, 16, 0.46666666666666684, 16, None, 1, False, 0
  0% 0/100 [00:00<?, ?it/s]

ashawkey commented 1 year ago

@flobotics Hi, you should rebuild gridencoder too: pip install ./gridencoder.
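
The error message above shows why a rebuild is needed: the compiled `grid_encode_forward` lists 13 parameters (`arg0` to `arg12`), while the updated Python code passes 14 values, ending in an extra `0`. A compiled extension keeps its old signature until it is reinstalled. The mismatch can be imitated in pure Python. In this sketch `old_binding` is a hypothetical stand-in with the same parameter count, not the real extension, and the tensors are replaced by placeholder strings:

```python
import inspect


def old_binding(arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
                arg8, arg9, arg10, arg11, arg12):
    """Hypothetical stand-in for the stale 13-parameter compiled function."""
    return None


# Values copied from the "Invoked with" line; tensors become placeholders.
call_args = ["inputs", "embeddings", "offsets", "outputs",
             2097152, 3, 2, 16, 0.46666666666666684, 16, None, 1, False, 0]

signature = inspect.signature(old_binding)
expected = len(signature.parameters)

try:
    signature.bind(*call_args)  # raises: one positional argument too many
    mismatch = False
except TypeError:
    mismatch = True

print(f"expected {expected} arguments, got {len(call_args)}")
```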

flobotics commented 1 year ago

@ashawkey Thanks, it works.

I don't know yet whether the results are better or faster :) (still interested in cloud-GPU usage :))

Good work!

Junyi42 commented 1 year ago

@Junyi42 Hi, thanks for the effort!

  • I'm trying 2.0-base too. Which prompts are you using that generate worse results compared to 1.5?
  • I think the submodule should work too, and for 2.0 this is the only choice.

Thanks for the reply!

  1. I tried "a doll", "a hotdog", and "a boy" with Stable Diffusion 2.0. All of them yield a very simple scene, while Stable Diffusion 1.5 gives plausible results. Note that all of these trials used the vanilla NeRF backbone, with --albedo and --lambda_entropy 1e-5 set to avoid empty scenes. These settings may affect the results, so I am trying the other backbone too (I'll update once I find something).
  2. Thanks, my confusion is resolved.

Thanks again for the wonderful work!