divamgupta / diffusionbee-stable-diffusion-ui

Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
https://diffusionbee.com
GNU Affero General Public License v3.0

Feature Request: Animation #211

Open xirtus opened 2 years ago

xirtus commented 2 years ago

The limit of 14 creations at a time is too few; the limit should be closer to ±10,000 images. Animation is well documented at discodiffusion.com. Below is a basic concept of 3D and 2D video warping with RAFT; with the code below, one can create a sequence of animations from the following sequence prompt (from frame 0 to 48):

text_prompts = {
    0: [
        "a painting by of the exorcist girl on a bed by a priest in 1950s interior decor,",
        "A Kelley Jones art, Gil Kane art, Barry Windsor-Smith art, Steve Ditko, Joe Kubert art, Neal Adams, jack kirby art, Unreal Engine 3d",
    ],
    48: [
        "a painting of a teenage witch in a hood on a spiral staircase,",
        "Sam Keith art, Kelley Jones art, Gil Kane art, Barry Windsor-Smith art, Steve Ditko, Joe Kubert art, Neal Adams, jack kirby art,",
    ],
}

etc.

@markdown ####Animation Mode:

animation_mode = '3D' #@param ['None', '2D', '3D', 'Video Input'] {type:'string'}

@markdown For animation, you probably want to set cutn_batches to 1 to make it quicker.

@markdown ---

@markdown ####Video Input Settings:

if is_colab:
    video_init_path = "/content/drive/MyDrive/init.mp4" #@param {type: 'string'}
else:
    video_init_path = "init.mp4" #@param {type: 'string'}

extract_nth_frame = 2 #@param {type: 'number'}
persistent_frame_output_in_batch_folder = True #@param {type: 'boolean'}
video_init_seed_continuity = False #@param {type: 'boolean'}

@markdown #####Video Optical Flow Settings:

video_init_flow_warp = True #@param {type: 'boolean'}

Compute optical flow from the video frames and warp the previous frame with that flow:

video_init_flow_blend = 0.999 #@param {type: 'number'}  # 0 - take next frame, 1 - take prev warped frame
video_init_check_consistency = False  # Insert param here when ready
video_init_blend_mode = "optical flow" #@param ['None', 'linear', 'optical flow']
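To make the blend concrete, here is a minimal standalone sketch of the idea (my own illustration, not the notebook's function; the warped previous frame is assumed to already exist):

import numpy as np

# Minimal sketch of the flow blend: prev_warped is the previous stylized frame
# already warped by the RAFT flow field, next_frame is the raw next video frame,
# both float arrays of the same shape.
def blend_frames(next_frame, prev_warped, flow_blend=0.999):
    # flow_blend = 0 takes the next frame, 1 takes the previous warped frame
    return flow_blend * prev_warped + (1.0 - flow_blend) * next_frame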


if animation_mode == "Video Input": if persistent_frame_output_in_batch_folder or (not is_colab): #suggested by Chris the Wizard#8082 at discord videoFramesFolder = f'{batchFolder}/videoFrames' else: videoFramesFolder = f'/content/videoFrames' createPath(videoFramesFolder) print(f"Exporting Video Frames (1 every {extract_nth_frame})...") try: for f in pathlib.Path(f'{videoFramesFolder}').glob('*.jpg'): f.unlink() except: print('') vf = f'select=not(mod(n\,{extract_nth_frame}))' if os.path.exists(video_init_path): subprocess.run(['ffmpeg', '-i', f'{video_init_path}', '-vf', f'{vf}', '-vsync', 'vfr', '-q:v', '2', '-loglevel', 'error', '-stats', f'{videoFramesFolder}/%04d.jpg'], stdout=subprocess.PIPE).stdout.decode('utf-8') else: print(f'\nWARNING!\n\nVideo not found: {video_init_path}.\nPlease check your video path.\n')

!ffmpeg -i {video_init_path} -vf {vf} -vsync vfr -q:v 2 -loglevel error -stats {videoFramesFolder}/%04d.jpg

@markdown ---

@markdown ####2D Animation Settings:

@markdown zoom is a multiplier of dimensions, 1 is no zoom.

@markdown All rotations are provided in degrees.

key_frames = True #@param {type:"boolean"}
max_frames = 10000 #@param {type:"number"}

if animation_mode == "Video Input": max_frames = len(glob(f'{videoFramesFolder}/*.jpg'))

interp_spline = 'Linear' # Do not change, currently will not look good. param ['Linear','Quadratic','Cubic']{type:"string"}
angle = "0:(0)" #@param {type:"string"}
zoom = "0: (1), 10: (1.05)" #@param {type:"string"}
translation_x = "0:(0),70:(0),71:(3.5),106:(0),141:(0),142:(-3.5),179:(0),213:(0),214:(3.5),250:(0),285:(0),286:(-3.5),322:(0),1666:(0),1667:(6),1675:(3),1676:(0),1809:(0),1810:(-6),1818:(-3),1819:(0),1953:(0),1954:(6),1962:(3),1963:(0),3949:(0),3950:(-6),3956:(-3),3957:(0),4091:(0),4092:(6),4099:(3),4100:(0),4234:(0),4235:(-6),4242:(-3),4243:(0)" #@param {type:"string"}
translation_y = "0:(0),36:(0),37:(3.5),71:(0),105:(0),106:(-3.5),142:(0),178:(0),179:(3.5),214:(0),249:(0),250:(-3.5),286:(0),1523:(0),1524:(4),1533:(8),1534:(0),1882:(0),1882:(-4),1888:(-8),1889:(0),3806:(0),3807:(8),3814:(4),3815:(0),4161:(0),4162:(-8),4170:(-4),4171:(0)" #@param {type:"string"}
translation_z = "0:(0),18:(0),37:(2.5),257:(1),322:(3.5),1425:(4),1444:(5),1453:(-6.5),1462:(5),2033:(5),2130:(4),2569:(3.5),2602:(0),2674:(-2),2889:(-2.2),3137:(-2.2),3173:(-2),3602:(-2.5),3709:(-3.5),3741:(-5),3885:(-5),3886:(8),3895:(0),3904:(-5),4313:(-4),4351:(0)" #@param {type:"string"}
rotation_3d_x = "0:(0),321:(0),322:(0.007),892:(0.007),926:(-0.01),962:(0.01),998:(-0.01),1034:(0.01),1069:(-0.01),1105:(0.01),1140:(-0.01),1176:(0.01),1212:(-0.01),1247:(0.01),1282:(-0.01),1319:(0.01),1354:(-0.01),1389:(0.01),1425:(-0.01),1461:(0)" #@param {type:"string"}
rotation_3d_y = "0:(0),1461:(0),1462:(0.01),1532:(0.01),1533:(-0.01),1604:(-0.01),1605:(0.01),1675:(0.01),1676:(-0.01),1747:(-0.01),1748:(0.01),1818:(0.01),1819:(-0.01),1889:(-0.01),1889:(0.01),1960:(0.01),1961:(-0.01),2032:(-0.01),2033:(0)" #@param {type:"string"}
rotation_3d_z = "0:(0),1603:(0),1604:(0.03),1622:(0.03),1623:(0),3743:(0),3744:(-0.01),3814:(-0.01),3815:(0.01),3885:(0.01),3886:(0),3903:(0),3904:(-0.01),3957:(-0.01),3958:(0.01),4028:(0.01),4029:(-0.01),4098:(-0.01),4099:(0.01),4170:(0.01),4171:(-0.01),4241:(-0.01),4242:(0.01),4312:(0.01),4313:(0)" #@param {type:"string"}
midas_depth_model = "dpt_large" #@param {type:"string"}
midas_weight = 0.3 #@param {type:"number"}
near_plane = 200 #@param {type:"number"}
far_plane = 10000 #@param {type:"number"}
fov = 40 #@param {type:"number"}
padding_mode = 'border' #@param {type:"string"}
sampling_mode = 'bicubic' #@param {type:"string"}
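To see how a schedule string like the zoom value above becomes one value per frame, here is a trimmed standalone sketch of the same parse-and-interpolate idea implemented by the parse_key_frames / get_inbetweens helpers further down (the name expand_schedule is mine):

import re
import numpy as np
import pandas as pd

# Parse 'frame: (value)' pairs and linearly interpolate between the keyframes.
def expand_schedule(schedule, n_frames):
    keyed = {int(m.group(1)): float(m.group(2))
             for m in re.finditer(r'([0-9]+):\s*\(([^)]*)\)', schedule)}
    series = pd.Series(np.nan, index=range(n_frames))
    for frame, value in keyed.items():
        series[frame] = value
    return series.interpolate(method='linear', limit_direction='both')

print(expand_schedule("0: (1), 10: (1.05)", 11).tolist())
# [1.0, 1.005, 1.01, ..., 1.05]: one zoom multiplier per frame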

======= TURBO MODE

@markdown ---

@markdown ####Turbo Mode (3D anim only):

@markdown Starting after frame 10, it skips diffusion steps and just uses the depth map to warp images for the skipped frames.

@markdown Speeds up rendering by 2x-4x, and may improve image coherence between frames.

@markdown For different settings tuned for Turbo Mode, refer to the original Disco-Turbo Github: https://github.com/zippy731/disco-diffusion-turbo

turbo_mode = True #@param {type:"boolean"}
turbo_steps = "3" #@param ["2","3","4","5","6"] {type:"string"}
turbo_preroll = 10 # frames

Insist that Turbo be used only with 3D animation:

if turbo_mode and animation_mode != '3D':
    print('=====')
    print('Turbo mode only available with 3D animations. Disabling Turbo.')
    print('=====')
    turbo_mode = False
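For intuition, here is a minimal sketch of how a preroll-plus-skip schedule like this can decide which frames get a full diffusion pass (my own illustration; is_diffusion_frame is a hypothetical helper, not notebook code):

# Every frame up to turbo_preroll is fully diffused; afterwards only every
# turbo_steps-th frame is, and the frames in between are depth-warped.
def is_diffusion_frame(frame_num, turbo_mode=True, turbo_steps=3, turbo_preroll=10):
    if not turbo_mode or frame_num <= turbo_preroll:
        return True
    return (frame_num - turbo_preroll) % turbo_steps == 0

# With preroll 10 and turbo_steps 3, frames 0-10, 13, 16, 19, ... are diffused.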

@markdown ---

@markdown ####Coherency Settings:

@markdown frames_scale tries to guide the new frame toward looking like the old one. A good default is 1500.

frames_scale = 1500 #@param{type: 'integer'}

@markdown frames_skip_steps will blur the previous frame; higher values will flicker less but struggle to add enough new detail to zoom into.

frames_skip_steps = '60%' #@param ['40%', '50%', '60%', '70%', '80%'] {type: 'string'}
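As a rough illustration of how such a percentage could map onto the sampling schedule (my own reading; total_steps and the exact re-noising mechanics are assumptions, not notebook code):

# Interpret the skip percentage as the share of diffusion steps skipped when
# re-noising the previous frame; '60%' of a 250-step run skips 150 steps.
def skipped_steps(frames_skip_steps='60%', total_steps=250):
    fraction = float(frames_skip_steps.rstrip('%')) / 100.0
    return int(total_steps * fraction)

print(skipped_steps('60%', 250))  # 150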

@markdown ####Video Init Coherency Settings:

@markdown video_init_frames_scale tries to guide the new frame toward looking like the old one. A good default is 1500.

video_init_frames_scale = 15000 #@param{type: 'integer'}

@markdown video_init_frames_skip_steps will blur the previous frame; higher values will flicker less but struggle to add enough new detail to zoom into.

video_init_frames_skip_steps = '70%' #@param ['40%', '50%', '60%', '70%', '80%'] {type: 'string'}

======= VR MODE

@markdown ---

@markdown ####VR Mode (3D anim only):

@markdown Enables stereo rendering of left/right eye views (supporting Turbo), which use a different (fish-eye) camera projection matrix.

@markdown Note that the images you're prompting will work better if they have some inherent wide-angle aspect.

@markdown The generated images will need to be combined into left/right videos. These can then be stitched into the VR180 format.

@markdown Google made the VR180 Creator tool but subsequently stopped supporting it. It's available for download in a few places including https://www.patrickgrunwald.de/vr180-creator-download

@markdown The tool is not only good for stitching (videos and photos) but also for adding the correct metadata into existing videos, which is needed for services like YouTube to identify the format correctly.

@markdown Watching YouTube VR videos isn't necessarily the easiest depending on your headset. For instance, Oculus has a dedicated media studio and store which makes the files easier to access on a Quest: https://creator.oculus.com/manage/mediastudio/

@markdown

@markdown The command to get ffmpeg to concatenate your frames for each eye is of the form: ffmpeg -framerate 15 -i frame_%4d_l.png l.mp4 (repeat for r)
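A small Python equivalent of that step (my own sketch; the frame-name pattern comes from the command above):

import subprocess

# Run the per-eye ffmpeg concat shown above for both eyes, producing
# l.mp4 and r.mp4 ready for VR180 stitching.
def stitch_eye_videos(framerate=15):
    for eye in ('l', 'r'):
        subprocess.run(
            ['ffmpeg', '-framerate', str(framerate),
             '-i', f'frame_%4d_{eye}.png', f'{eye}.mp4'],
            check=True)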

vr_mode = False #@param {type:"boolean"}

@markdown vr_eye_angle is the y-axis rotation of the eyes towards the center.

vr_eye_angle = 0.5 #@param{type:"number"}

@markdown vr_ipd is the interpupillary distance (between the eyes).

vr_ipd = 5.0 #@param{type:"number"}
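A rough sketch of how these two parameters might position the stereo cameras (purely an assumption about the geometry, not the notebook's code):

# Hypothetical geometry: each eye shifts half the interpupillary distance
# along x and rotates about y toward the center by vr_eye_angle degrees.
def eye_pose(eye, vr_ipd=5.0, vr_eye_angle=0.5):
    sign = -1.0 if eye == 'left' else 1.0
    x_offset = sign * vr_ipd / 2.0
    y_rotation_deg = -sign * vr_eye_angle
    return x_offset, y_rotation_deg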

Insist that VR be used only with 3D animation:

if vr_mode and animation_mode != '3D':
    print('=====')
    print('VR mode only available with 3D animations. Disabling VR.')
    print('=====')
    vr_mode = False

def parse_key_frames(string, prompt_parser=None):
    """Given a string representing frame numbers paired with parameter values at that frame,
    return a dictionary with the frame numbers as keys and the parameter values as the values.

    Parameters
    ----------
    string: string
        Frame numbers paired with parameter values at that frame number, in the format
        'framenumber1: (parametervalues1), framenumber2: (parametervalues2), ...'
    prompt_parser: function or None, optional
        If provided, prompt_parser will be applied to each string of parameter values.

    Returns
    -------
    dict
        Frame numbers as keys, parameter values at that frame number as values

    Raises
    ------
    RuntimeError
        If the input string does not match the expected format.

    Examples
    --------
    >>> parse_key_frames("10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)")
    {10: 'Apple: 1| Orange: 0', 20: 'Apple: 0| Orange: 1| Peach: 1'}

    >>> parse_key_frames("10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)", prompt_parser=lambda x: x.lower())
    {10: 'apple: 1| orange: 0', 20: 'apple: 0| orange: 1| peach: 1'}
    """
    import re
    # Match 'frame: (value)' pairs; frame is an integer, value is everything
    # up to the next closing parenthesis.
    pattern = r'((?P<frame>[0-9]+):[\s]*[\(](?P<param>[\S\s]*?)[\)])'
    frames = dict()
    for match_object in re.finditer(pattern, string):
        frame = int(match_object.groupdict()['frame'])
        param = match_object.groupdict()['param']
        if prompt_parser:
            frames[frame] = prompt_parser(param)
        else:
            frames[frame] = param

    if frames == {} and len(string) != 0:
        raise RuntimeError('Key Frame string not correctly formatted')
    return frames

def get_inbetweens(key_frames, integer=False):
    """Given a dict with frame numbers as keys and a parameter value as values,
    return a pandas Series containing the value of the parameter at every frame from 0 to max_frames.
    Any values not provided in the input dict are calculated by linear interpolation between
    the values of the previous and next provided frames. If there is no previous provided frame,
    then the value is equal to the value of the next provided frame, or if there is no next
    provided frame, then the value is equal to the value of the previous provided frame.
    If no frames are provided, all frame values are NaN.

    Parameters
    ----------
    key_frames: dict
        A dict with integer frame numbers as keys and numerical values of a particular parameter as values.
    integer: Bool, optional
        If True, the values of the output series are converted to integers.
        Otherwise, the values are floats.

    Returns
    -------
    pd.Series
        A Series with length max_frames representing the parameter values for each frame.

    Examples
    --------
    >>> max_frames = 5
    >>> get_inbetweens({1: 5, 3: 6})
    0    5.0
    1    5.0
    2    5.5
    3    6.0
    4    6.0
    dtype: float64

    >>> get_inbetweens({1: 5, 3: 6}, integer=True)
    0    5
    1    5
    2    5
    3    6
    4    6
    dtype: int64
    """
    key_frame_series = pd.Series([np.nan for a in range(max_frames)])

    for i, value in key_frames.items():
        key_frame_series[i] = value
    key_frame_series = key_frame_series.astype(float)

    interp_method = interp_spline

    # Fall back to a simpler interpolation when there are too few keyframes
    # for the requested spline order.
    if interp_method == 'Cubic' and len(key_frames.items()) <= 3:
        interp_method = 'Quadratic'

    if interp_method == 'Quadratic' and len(key_frames.items()) <= 2:
        interp_method = 'Linear'

    key_frame_series[0] = key_frame_series[key_frame_series.first_valid_index()]
    key_frame_series[max_frames - 1] = key_frame_series[key_frame_series.last_valid_index()]
    # key_frame_series = key_frame_series.interpolate(method=intrp_method, order=1, limit_direction='both')
    key_frame_series = key_frame_series.interpolate(method=interp_method.lower(), limit_direction='both')
    if integer:
        return key_frame_series.astype(int)
    return key_frame_series

def split_prompts(prompts):
    prompt_series = pd.Series([np.nan for a in range(max_frames)])
    for i, prompt in prompts.items():
        prompt_series[i] = prompt

    prompt_series = prompt_series.astype(str)

    prompt_series = prompt_series.ffill().bfill()
    return prompt_series
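For instance, applied to the text_prompts dict at the top of this issue (assuming max_frames is in scope), frames 0-47 carry the frame-0 prompts and frames 48 onward carry the frame-48 prompts:

# Usage sketch: forward-fill prompts between the keyed frames.
prompt_series = split_prompts(text_prompts)
print(prompt_series[0])   # frame-0 prompts
print(prompt_series[48])  # frame-48 prompts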

if key_frames:
    try:
        angle_series = get_inbetweens(parse_key_frames(angle))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `angle` correctly for key frames.\n"
            "Attempting to interpret `angle` as "
            f'"0: ({angle})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        angle = f"0: ({angle})"
        angle_series = get_inbetweens(parse_key_frames(angle))

    try:
        zoom_series = get_inbetweens(parse_key_frames(zoom))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `zoom` correctly for key frames.\n"
            "Attempting to interpret `zoom` as "
            f'"0: ({zoom})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        zoom = f"0: ({zoom})"
        zoom_series = get_inbetweens(parse_key_frames(zoom))

    try:
        translation_x_series = get_inbetweens(parse_key_frames(translation_x))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `translation_x` correctly for key frames.\n"
            "Attempting to interpret `translation_x` as "
            f'"0: ({translation_x})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        translation_x = f"0: ({translation_x})"
        translation_x_series = get_inbetweens(parse_key_frames(translation_x))

    try:
        translation_y_series = get_inbetweens(parse_key_frames(translation_y))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `translation_y` correctly for key frames.\n"
            "Attempting to interpret `translation_y` as "
            f'"0: ({translation_y})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        translation_y = f"0: ({translation_y})"
        translation_y_series = get_inbetweens(parse_key_frames(translation_y))

    try:
        translation_z_series = get_inbetweens(parse_key_frames(translation_z))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `translation_z` correctly for key frames.\n"
            "Attempting to interpret `translation_z` as "
            f'"0: ({translation_z})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        translation_z = f"0: ({translation_z})"
        translation_z_series = get_inbetweens(parse_key_frames(translation_z))

    try:
        rotation_3d_x_series = get_inbetweens(parse_key_frames(rotation_3d_x))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `rotation_3d_x` correctly for key frames.\n"
            "Attempting to interpret `rotation_3d_x` as "
            f'"0: ({rotation_3d_x})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        rotation_3d_x = f"0: ({rotation_3d_x})"
        rotation_3d_x_series = get_inbetweens(parse_key_frames(rotation_3d_x))

    try:
        rotation_3d_y_series = get_inbetweens(parse_key_frames(rotation_3d_y))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `rotation_3d_y` correctly for key frames.\n"
            "Attempting to interpret `rotation_3d_y` as "
            f'"0: ({rotation_3d_y})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        rotation_3d_y = f"0: ({rotation_3d_y})"
        rotation_3d_y_series = get_inbetweens(parse_key_frames(rotation_3d_y))

    try:
        rotation_3d_z_series = get_inbetweens(parse_key_frames(rotation_3d_z))
    except RuntimeError as e:
        print(
            "WARNING: You have selected to use key frames, but you have not "
            "formatted `rotation_3d_z` correctly for key frames.\n"
            "Attempting to interpret `rotation_3d_z` as "
            f'"0: ({rotation_3d_z})"\n'
            "Please read the instructions to find out how to use key frames "
            "correctly.\n"
        )
        rotation_3d_z = f"0: ({rotation_3d_z})"
        rotation_3d_z_series = get_inbetweens(parse_key_frames(rotation_3d_z))

else:
    angle = float(angle)
    zoom = float(zoom)
    translation_x = float(translation_x)
    translation_y = float(translation_y)
    translation_z = float(translation_z)
    rotation_3d_x = float(rotation_3d_x)
    rotation_3d_y = float(rotation_3d_y)
    rotation_3d_z = float(rotation_3d_z)

Then the flow-consistency step:

@title Generate optical flow and consistency maps

@markdown Run once per init video

if animation_mode == "Video Input": import gc

force_flow_generation = False #@param {type:'boolean'}
in_path = videoFramesFolder
flo_folder = f'{in_path}/out_flo_fwd'

if not video_init_flow_warp:
    print('video_init_flow_warp not set, skipping')

if (animation_mode == 'Video Input') and (video_init_flow_warp):
    flows = glob(flo_folder+'/*.*')
    if (len(flows)>0) and not force_flow_generation:
        print(f'Skipping flow generation:\nFound {len(flows)} existing flow files in current working folder: {flo_folder}.\nIf you wish to generate new flow files, check force_flow_generation and run this cell again.')

    if (len(flows)==0) or force_flow_generation:
        frames = sorted(glob(in_path+'/*.*'));
        if len(frames)<2: 
            print(f'WARNING!\nCannot create flow maps: Found {len(frames)} frames extracted from your video input.\nPlease check your video path.')
        if len(frames)>=2:

            raft_model = torch.nn.DataParallel(RAFT(args2))
            raft_model.load_state_dict(torch.load(f'{root_path}/RAFT/models/raft-things.pth'))
            raft_model = raft_model.module.cuda().eval()

            for f in pathlib.Path(f'{flo_fwd_folder}').glob('*.*'):
                f.unlink()

            temp_flo = in_path+'/temp_flo'
            flo_fwd_folder = in_path+'/out_flo_fwd'

            createPath(flo_fwd_folder)
            createPath(temp_flo)

            # TBD Call out to a consistency checker?

            framecount = 0
            for frame1, frame2 in tqdm(zip(frames[:-1], frames[1:]), total=len(frames)-1):

                out_flow21_fn = f"{flo_fwd_folder}/{frame1.split('/')[-1]}"

                frame1 = load_img(frame1, width_height)
                frame2 = load_img(frame2, width_height)

                flow21 = get_flow(frame2, frame1, raft_model)
                np.save(out_flow21_fn, flow21)

                if video_init_check_consistency:
                    # TBD
                    pass

            del raft_model 
            gc.collect()
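Once the .npy flow files exist, warping the previous output frame is a remap. A hedged sketch of that step using OpenCV (my own, not the notebook's warp implementation):

import cv2
import numpy as np

# Apply a saved flow field to warp the previous stylized frame onto the
# current one, which is the core of video_init_flow_warp.
def warp_with_flow(prev_frame, flow):
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# e.g. flow21 = np.load(f'{flo_fwd_folder}/0001.jpg.npy')
#      warped = warp_with_flow(prev_frame, flow21)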
Leland commented 2 years ago

Think your pasted code got mangled by the Markdown parser, maybe just link to where the code may be found?

Or if it is your own, use e.g. https://gist.github.com/

xirtus commented 2 years ago

> Think your pasted code got mangled by the Markdown parser, maybe just link to where the code may be found?
>
> Or if it is your own, use e.g. https://gist.github.com/

Thank you, I've posted it here: https://gist.github.com/xirtus/de9b76a79ecdab9ad2328cc865277cfa

xirtus commented 2 years ago

Also, a few interesting projects for Stable Diffusion animation workflows that are working right now. The most interesting is the first, Deforum, which is well curated and appropriate:
https://github.com/HelixNGC7293/DeforumStableDiffusionLocal
https://github.com/amotile/stable-diffusion-studio
https://github.com/thomsan/Deforum_Stable_Diffusion/blob/main/Deforum_Stable_Diffusion.ipynb

xirtus commented 2 years ago

Just realized my gist wasn't public, so I made it public.