deforum / stable-diffusion


Error when trying to resume training using Colab #129

Closed dustyny closed 1 year ago

dustyny commented 1 year ago

I get the following message when I try to resume. The input frames directory has all the jpgs, and it works perfectly fine if I don't try to resume. I'm including my notebook in case I've made a mistake and missed something. [Deforum_Stable_Diffusion.ipynb.zip](https://github.com/deforum/stable-diffusion/files/9854978/Deforum_Stable_Diffusion.ipynb.zip)

```
Exporting Video Frames (1 every 1) frames to /content/drive/MyDrive/AI/StableDiffusion/2022-10/wtf_2nd_render/inputframes...
Frames already unpacked
Loading 10890 input frames from /content/drive/MyDrive/AI/StableDiffusion/2022-10/wtf_2nd_render/inputframes and saving video frames to /content/drive/MyDrive/AI/StableDiffusion/2022-10/wtf_2nd_render
Saving animation frames to /content/drive/MyDrive/AI/StableDiffusion/2022-10/wtf_2nd_render
Rendering animation frame 10161 of 10890
/usr/local/lib/python3.7/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
    550         render_animation(args, anim_args)
    551     elif anim_args.animation_mode == 'Video Input':
--> 552         render_input_video(args, anim_args)
    553     elif anim_args.animation_mode == 'Interpolation':
    554         render_interpolation(args, anim_args)

3 frames

<ipython-input> in render_input_video(args, anim_args)
    394     args.overlay_mask = True
    395 
--> 396     render_animation(args, anim_args)
    397 
    398 def render_interpolation(args, anim_args):

<ipython-input> in render_animation(args, anim_args)
    284         prev_img_cv2 = sample_to_cv2(prev_sample)
    285         depth = depth_model.predict(prev_img_cv2, anim_args) if depth_model else None
--> 286         prev_img = anim_frame_warp_3d(prev_img_cv2, depth, anim_args, keys, frame_idx)
    287 
    288         # apply color matching

<ipython-input> in anim_frame_warp_3d(prev_img_cv2, depth, anim_args, keys, frame_idx)
    207     ]
    208     rot_mat = p3d.euler_angles_to_matrix(torch.tensor(rotate_xyz, device=device), "XYZ").unsqueeze(0)
--> 209     result = transform_image_3d(prev_img_cv2, depth, rot_mat, translate_xyz, anim_args)
    210     torch.cuda.empty_cache()
    211     return result

<ipython-input> in transform_image_3d(prev_img_cv2, depth_tensor, rot_mat, translate, anim_args)
    448     # range of [-1,1] is important to torch grid_sample's padding handling
    449     y, x = torch.meshgrid(torch.linspace(-1.,1.,h,dtype=torch.float32,device=device), torch.linspace(-1.,1.,w,dtype=torch.float32,device=device))
--> 450     z = torch.as_tensor(depth_tensor, dtype=torch.float32, device=device)
    451     xyz_old_world = torch.stack((x.flatten(), y.flatten(), z.flatten()), dim=1)
    452 

TypeError: must be real number, not NoneType
```
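
For what it's worth, the failing call can be reproduced in isolation: `torch.as_tensor` cannot build a tensor from `None`, and in `render_animation` the depth is set to `None` whenever no depth model output is available (see frame 285 above). A minimal standalone sketch, not taken from the notebook:

```python
import torch

# Minimal reproduction of the failure mode in the traceback above: the depth
# passed into transform_image_3d is None, and torch.as_tensor cannot convert
# None into a float tensor.
depth_tensor = None  # what the resumed run appears to hand over

try:
    z = torch.as_tensor(depth_tensor, dtype=torch.float32)
except (TypeError, RuntimeError) as exc:
    # On the PyTorch build in this Colab the message is
    # "must be real number, not NoneType"; the wording (and exception type)
    # may differ across PyTorch versions.
    print(exc)

# Hypothetical guard, not the repo's actual fix: fall back to a flat depth map
# so the 3D warp could still run when no depth estimate is available.
height, width = 512, 512  # placeholder frame size
if depth_tensor is None:
    depth_tensor = torch.ones((height, width), dtype=torch.float32)
```

Which suggests the real question is why `depth_model` ends up unset (or its prediction skipped) only when resuming.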

johnnypeck commented 1 year ago

Hi @dustyny, did you get this resolved?

dustyny commented 1 year ago

Hi @johnnypeck, I did not, and unfortunately I don't have any other information on the issue, unless there are logs in Colab I can check.

When I was trying to troubleshoot, I ran it without the resume and it worked perfectly fine. I tried to figure out what broke, but I didn't know what was supposed to be passed in.

The way I had to work around it was to take the remaining images, turn them back into a video, and then rerun the notebook on that last segment of the video.
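
Roughly, that workaround amounts to something like the sketch below (the input path is the one from the log above; the output filename and fps are placeholders, not the exact values used):

```python
import glob

import cv2

# Sketch of the manual workaround described above: rebuild a video from the
# input frames that were never rendered, then point the notebook at that
# shorter segment.
frames = sorted(glob.glob(
    "/content/drive/MyDrive/AI/StableDiffusion/2022-10/wtf_2nd_render/inputframes/*.jpg"))
remaining = frames[10160:]  # the run above stopped around frame 10161 of 10890

first = cv2.imread(remaining[0])
height, width = first.shape[:2]
writer = cv2.VideoWriter("remaining_segment.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), 24, (width, height))
for path in remaining:
    writer.write(cv2.imread(path))
writer.release()
```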

johnnypeck commented 1 year ago

Did you try previous revisions of your colab? If you have autosave set, it makes them for you. I don't recommend using autosave in Colab, but still:

You can run diffs on all your revisions to find out if you wonked something up; I'd do that first. It's awkwardly easy to introduce strange crap into colabs when you're experimenting all over the place.

Did you start from a fresh, working Deforum and repeat your steps? Are you using the resume functionality correctly? It's different from how Disco Diffusion did it; it actually bothered me quite a bit at first.
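
For reference, resume in the Deforum notebook is driven by a couple of fields in `DeforumAnimArgs`; the names below are from memory of the 2022 notebook and may differ in your copy:

```python
# Assumed field names from the 2022 Deforum notebook's DeforumAnimArgs;
# double-check against your copy, they may have changed.
resume_from_timestring = True
resume_timestring = "20221024123456"  # placeholder: timestring of the run to resume
```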

Given that, do you have a link to a copy of your colab you'd be willing to share and the steps to reproduce the error?

There are logs all day in Colab, plus you have CLI access (the little icons on the bottom left).

Did your actual result match your desired result, even with the workaround? That actually sounds less like a workaround and more like my preferred style of doing things: small bits. I digress.

dustyny commented 1 year ago

Let me clarify: this was the unmodified notebook. I made a copy but didn't make any code changes until I got the error; at that point all I did was add some print statements to try to figure out what data was being held in the variables.

I've worked with Deforum for about a month now and had no problem using resume in the last version. Given that I didn't change anything and tried this multiple times, I suspect it's a bug.

I attached a copy of the notebook in a zip file if you want to take a look.

I did get the results I was looking for, I just wasted a bunch of time getting it sorted out.

johnnypeck commented 1 year ago

Awesome. That definitely clarifies things and will make solving it significantly easier. I definitely didn't want to look at your customized colab. No offense. lol. My favorite thing to do is delete code. So we don't need the zip. It's the current main branch. Cool.

@dustyny - "I did get the results I was looking for, I just wasted a bunch of time getting it sorted out.." Isn't that how it's supposed go? I feel your pain my friend. Herding cats over here... lol