drawthingsai / draw-things-community

The community repository for the Draw Things app.
https://drawthings.ai
GNU General Public License v3.0

Trouble running on Macbook #2

Open davidw0311 opened 2 months ago

davidw0311 commented 2 months ago

Hi, I am wondering if a minimal working example could be provided for running text-to-image? I have been trying to execute the code following a similar approach as in swift-diffusion on my Mac mini M2, but I get the error

Assertion failed: (backend != CCV_NNC_NO_BACKEND), function ccv_nnc_cmd_exec, file ccv_nnc_cmd.c, line 682.

after compiling the textModel and running:

let c: DynamicGraph.AnyTensor = textModel!(
    inputs: tokensTensorGPU, positionTensorGPU, casualAttentionMaskGPU)[0].as(
        of: UseFloatingPoint.self
    ).reshaped(.CHW(2, 77, 768))

Thanks!

liuliu commented 2 months ago

This looks alright; probably change to NHWC. See the code: https://github.com/drawthingsai/draw-things-community/blob/main/Libraries/SwiftDiffusion/Sources/TextEncoder.swift#L831

Also, if there is NO_BACKEND, check whether build --config=mps (which defines enable_mps=true) is set in your .bazelrc (it should already be set by .bazelrc.darwin). As long as you run ./Scripts/install.sh, these are all set for you.
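
For example, a minimal sketch of that change (assuming s4nnc's reshaped(format:shape:) overload; variable names as in your snippet):

// Keep the same 2 x 77 x 768 shape, but mark the tensor as NHWC instead of CHW.
let c: DynamicGraph.AnyTensor = textModel!(
    inputs: tokensTensorGPU, positionTensorGPU, casualAttentionMaskGPU)[0].as(
        of: UseFloatingPoint.self
    ).reshaped(format: .NHWC, shape: [2, 77, 768])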

davidw0311 commented 2 months ago

Hello, here's my attempt to run a text-to-image generation example following the code from swift-diffusion:

import Swift
import Foundation
import SwiftUI
import CoreGraphics
import PNG
import NNC
import Diffusion
import Accelerate

public typealias UseFloatingPoint = Float16

let textEncoderPath = "/Users/davidw/models/sd-v1.4.ckpt"
let vocabPath = "/Users/davidw/models/vocab.json"
let mergesPath = "/Users/davidw/models/merges.txt"

DynamicGraph.setSeed(123456)
DynamicGraph.logLevel = .verbose
DynamicGraph.memoryEfficient = true
let graph = DynamicGraph()
graph.workspaceSize = 1_024 * 1_024 * 1_024

let prompt = "a cute golden retriever in the style of van gogh"
let maxLength = 77
let tokenizer = CLIPTokenizer(vocabulary: vocabPath, merges: mergesPath)

let tokens = tokenizer.tokenize(text: prompt, truncation: true, maxLength: maxLength).1
let uncondTokens = tokenizer.tokenize(text: "", truncation: true, maxLength: maxLength).1

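// Pack the unconditional (empty-prompt) and conditional tokens into one batch of
// 2 * maxLength, with matching position indices for both sequences.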
let positionTensor = graph.variable(.CPU, .C(2 * maxLength), of: Int32.self)
let tokensTensor = graph.variable(.CPU, .C(2 * maxLength), of: Int32.self)
for i in 0..<maxLength {
    tokensTensor[i] = uncondTokens[i]
    tokensTensor[i + maxLength] = tokens[i]
    positionTensor[i] = Int32(i)
    positionTensor[i + maxLength] = Int32(i)
}

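// Build the CLIP text encoder for the SD v1 checkpoint.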
let textModel = TextEncoder<UseFloatingPoint>(
    filePaths: [textEncoderPath],
    version: .v1,
    usesFlashAttention: true,
    injectEmbeddings: false,
    externalOnDemand: false,
    maxLength: maxLength,
    clipSkip: 0,
    lora: []
)

let (encoding, _) = textModel.encode(
    tokens: [tokensTensor],
    positions: [positionTensor],
    mask: [],
    injectedEmbeddings: [],
    image: [],
    lengthsOfUncond: [maxLength],
    lengthsOfCond: [maxLength],
    textModels: []
)

print(encoding)

When I execute this, I get the error:

CCV_NNC_DATA_TRANSFER_FORWARD: [1] -> [1]
|-> 1. 0x151e059c0 (0x151e05a40:0) [154] 49406 49407 49407 ..
|<- 1. 0x6000024c93b0 (0x151e07ba0:0) [154] 49406 49407 49407 ..
CCV_NNC_DATA_TRANSFER_FORWARD: [1] -> [1]
|-> 1. 0x151e06300 (0x151e06380:0) [154] 0 1 2 ..
|<- 1. 0x6000024c8d20 (0x151e08230:0) [154] 0 1 2 ..
Swift/ContiguousArrayBuffer.swift:600: Fatal error: Index out of range

I am wondering where I am going wrong with my implementation. Thank you!

liuliu commented 2 months ago

I think it is this thing:

let (encoding, _) = textModel.encode(
    tokens: [tokensTensor],
    positions: [positionTensor],
    mask: [],
    injectedEmbeddings: [],
    image: [],
    lengthsOfUncond: [maxLength],
    lengthsOfCond: [maxLength],
    textModels: [nil]
)

The textModels array is expected to contain 1 value (SDXL: 2 values), even if that value is nil.

davidw0311 commented 2 months ago

Thank you! That does solve the error, but now I am getting

CCV_NNC_GEMM_FORWARD [8]: [3] -> [1] (2)
Wait: (2, 2)
|-> 1. 0x1301c80e0 (0x12ce45bc0:0) [154x768] 1.156250 -0.122498 0.021973 ..
|-> 2. 0x1301d4160 (0x12ce14c50:0) [768x768] -0.079773 -0.075256 -0.007603 ..
|-> 3. 0x1301d41d0 (0x12ce14dc0:0) [768] 0.000000 0.000000 0.000000 ..
|<- 1. 0x1301c8310 (0x12ce46650:0) [154x768] -0.433350 -0.454834 -0.808105 ..
Emit: (2, 4)
CCV_NNC_SCALED_DOT_PRODUCT_ATTENTION_FORWARD [9]: [6] -> [3] (0)
Wait: (0, 3), (0, 4)
|-> 1. 0x1301de960 (0x12ce46370:0) [2x77x12x64] 0.651855 0.182739 0.683105 ..
|-> 2. 0x1301de9d0 (0x12ce464e0:0) [2x77x12x64] -1.309570 -0.754883 -1.266602 ..
|-> 3. 0x1301dea40 (0x12ce46650:0) [2x77x12x64] -0.433350 -0.454834 -0.808105 ..
|-> 4. 0x1301d3d00 (0x11ce05f80:0) [2x1x77x77] 0.000000 -65504.000000 -65504.000000 ..
|-> 5. 0x1301d4240 (0x12ce14f30:0) [768x768] 0.061432 -0.039032 -0.016922 ..
|-> 6. 0x1301d42b0 (0x12ce150a0:0) [768] 0.000000 0.000000 0.000000 ..
Assertion failed: (backend != CCV_NNC_NO_BACKEND), function ccv_nnc_cmd_exec, file ccv_nnc_cmd.c, line 682.

This is my .bazelrc.darwin:

common:mps --define=enable_mps=true

common --disk_cache=.cache

build --cxxopt='-std=c++17'
build --config=mps

build --features=swift.use_global_module_cache
build --strategy=SwiftCompile=worker
build --features=swift.enable_batch_mode

common:release --define=enable_mps=true
common:release --swiftcopt=-whole-module-optimization
common:release --compilation_mode=opt
common:release --apple_generate_dsym

try-import %workspace%/.bazelrc.local

and my .bazelrc:

try-import %workspace%/.bazelrc.darwin

liuliu commented 2 months ago

Try this:

let positionTensor = graph.variable(.CPU, format: .NHWC, shape: [2 * maxLength], of: Int32.self)
let tokensTensor = graph.variable(.CPU, format: .NHWC, shape: [2 * maxLength], of: Int32.self)

SDPA (scaled dot product attention) only accepts NHWC-format tensors, and the format is inherited from the input tensors.

davidw0311 commented 2 months ago

Thanks! That solves the error :)

davidw0311 commented 2 months ago

Adapting code from swift-diffusion, I am trying to generate an image using:

let (c, _) = textModel.encode(
    tokens: [tokensTensor],
    positions: [positionTensor],
    mask: [],
    injectedEmbeddings: [],
    image: [],
    lengthsOfUncond: [maxLength],
    lengthsOfCond: [maxLength],
    textModels: [nil]
)

let startWidth = 64
let startHeight = 64

let generationSteps = 25

let unconditionalGuidanceScale: Float = 7.5
let scaleFactor: Float = 0.18215
let model = DiffusionModel(linearStart: 0.00085, linearEnd: 0.012, timesteps: 1_000, steps: generationSteps)
let alphasCumprod = model.alphasCumprod
let sigmasForTimesteps = DiffusionModel.sigmas(from: alphasCumprod)

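// Per-timestep terms used by the UniPC updates below:
// alpha_t = sqrt(alphasCumprod[t]), sigma_t = sqrt(1 - alphasCumprod[t]),
// lambda_t = log(alpha_t) - log(sigma_t).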
let alphas = alphasCumprod.map { $0.squareRoot() }
let sigmas = alphasCumprod.map { (1 - $0).squareRoot() }
let lambdas = zip(alphas, sigmas).map { log($0) - log($1) }

let (unet, _) = UNet(
    batchSize: 2, embeddingLength: (77, 77), startWidth: startWidth, startHeight: startHeight,
    usesFlashAttention: .scale1, injectControls: false, injectT2IAdapters: false,
    injectIPAdapterLengths: [0]
)

let (decoder, _, _) = Decoder(
    channels: [128, 256, 512, 512], numRepeat: 2, batchSize: 1, startWidth: startWidth,
    startHeight: startHeight, usesFlashAttention: true, paddingFinalConvLayer: false)

var timestepList = [Int]()
var outputList = [DynamicGraph.Tensor<UseFloatingPoint>]()
let startTime = Date()
var lastSample: DynamicGraph.Tensor<UseFloatingPoint>? = nil

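// Sample the initial latent x_T ~ N(0, I) in NHWC layout (1 x 64 x 64 x 4).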
let x_T = graph.variable(.GPU(0), .NHWC(1, startHeight, startWidth, 4), of: UseFloatingPoint.self)
x_T.randn(std: 1, mean: 0)
var x = x_T
var xIn = graph.variable(.GPU(0), .NHWC(2, startHeight, startWidth, 4), of: UseFloatingPoint.self)

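// Reverse-diffusion sampling loop over the selected timesteps.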
for i in 0..<model.steps {
    let timestep = model.timesteps - model.timesteps / model.steps * i - 1
    let ts = timeEmbedding(timestep: Float(timestep), batchSize: 2, embeddingSize: 320, maxPeriod: 10_000)
        .toGPU(0)
    let t = graph.variable(Tensor<UseFloatingPoint>(from: ts))

    xIn[0..<1, 0..<startHeight, 0..<startWidth, 0..<4] = x  
    xIn[1..<2, 0..<startHeight, 0..<startWidth, 0..<4] = x  

    var et = unet(inputs: xIn, t, c[0])[0].as(of: UseFloatingPoint.self)

    var etUncond = graph.variable(
        .GPU(0), .NHWC(1, startHeight, startWidth, 4), of: UseFloatingPoint.self)
    var etCond = graph.variable(
        .GPU(0), .NHWC(1, startHeight, startWidth, 4), of: UseFloatingPoint.self)

    etUncond[0..<1, 0..<startHeight, 0..<startWidth, 0..<4] =
    et[0..<1, 0..<startHeight, 0..<startWidth, 0..<4]
    etCond[0..<1, 0..<startHeight, 0..<startWidth, 0..<4] =
    et[1..<2, 0..<startHeight, 0..<startWidth, 0..<4]
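    // Classifier-free guidance: et_uncond + scale * (et_cond - et_uncond).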
    et = etUncond + unconditionalGuidanceScale * (etCond - etUncond)

    // UniPC sampler.
    let mt = Functional.add(
        left: x, right: et, leftScalar: 1.0 / alphas[timestep],
        rightScalar: -sigmas[timestep] / alphas[timestep])
    let useCorrector = lastSample != nil
    if useCorrector, let lastSample = lastSample {
        x = uniCBhUpdate(
            mt: mt, timestep: timestep, lastSample: lastSample, timestepList: timestepList,
            outputList: outputList, lambdas: lambdas, alphas: alphas, sigmas: sigmas)
    }
    if timestepList.count < 2 {
        timestepList.append(timestep)
    } else {
        timestepList[0] = timestepList[1]
        timestepList[1] = timestep
    }
    if outputList.count < 2 {
        outputList.append(mt)
    } else {
        outputList[0] = outputList[1]
        outputList[1] = mt
    }
    let prevTimestep = max(0, model.timesteps - model.timesteps / model.steps * (i + 1) - 1)
    lastSample = x
    x = uniPBhUpdate(
        mt: mt, prevTimestep: prevTimestep, sample: x, timestepList: timestepList,
        outputList: outputList, lambdas: lambdas, alphas: alphas, sigmas: sigmas)
}

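// Undo the 0.18215 latent scaling, then decode the latent back to image space.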
let z = 1.0 / scaleFactor * x
let img = DynamicGraph.Tensor<Float>(from: decoder(inputs: z)[0].as(of: UseFloatingPoint.self))
    .toCPU()

I'm not sure if this is the correct way, but I am getting

CCV_NNC_RANDOM_NORMAL_FORWARD: [0] -> [1]
|<- 1. 0x600000039490 (0x135e23a70:0) [1x64x64x4] -1.528320 -0.254395 -0.121826 ..
CCV_NNC_FORMAT_TRANSFORM_FORWARD: [1] -> [1]
|-> 1. 0x600000039490 (0x135e23a70:0) [1x64x64x4] -1.528320 -0.254395 -0.121826 ..
|<- 1. 0x60000002f480 (0x135e68770:0) [1x64x64x4] -1.528320 -0.254395 -0.121826 ..
CCV_NNC_FORMAT_TRANSFORM_FORWARD: [1] -> [1]
|-> 1. 0x600000039490 (0x135e23a70:0) [1x64x64x4] -1.528320 -0.254395 -0.121826 ..
|<- 1. 0x600001f8c000 (0x135e68770:0) [1x64x64x4] -1.528320 -0.254395 -0.121826 ..
Assertion failed: (input_size == model->input_size || model->input_size == 0), function ccv_cnnp_model_compile, file ccv_cnnp_model.c, line 573.

at the line

var et = unet(inputs: xIn, t, c[0])[0].as(of: UseFloatingPoint.self)

How can I make this work?

Thanks!

davidw0311 commented 2 months ago

I wonder if this is the more correct implementation. I am trying to initialize a UNetFromNNC object and compile it:

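// UNetFromNNC wraps loading and compiling the UNet from a checkpoint file.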
var unet = UNetFromNNC<UseFloatingPoint>()

let x_T = graph.variable(.GPU(0), .NHWC(2, startHeight, startWidth, 4), of: UseFloatingPoint.self)
x_T.randn(std: 1, mean: 0)
let timestep = timeEmbedding(timestep: 0, batchSize: 2, embeddingSize: 320, maxPeriod: 10_000).toGPU(0)
let timestepTensor = graph.variable(Tensor<UseFloatingPoint>(from: timestep))

unet.compileModel(
    filePath: unetPath, externalOnDemand: true, version: .v1, upcastAttention: true,
    usesFlashAttention: true, injectControls: false, injectT2IAdapters: false,
    injectIPAdapterLengths: [0], lora: [],
    is8BitModel: false, canRunLoRASeparately: false, 
    inputs: x_T, timestepTensor, c, 
    tokenLengthUncond: 77, tokenLengthCond: 77,
    extraProjection: nil,
    injectedControls: [],
    injectedT2IAdapters: [],
    injectedIPAdapters: []
)

but I run into the error:

CCV_NNC_LAYER_NORM_FORWARD [161]: [3] -> [3] (0)
|-> 1. 0x128025780 (0x141826580:0) [154x768] -5.328125 -1.828125 -4.781250 ..
|-> 2. 0x1280312c0 (0x140718c60:0) [1x768] 0.259766 0.989258 0.238281 ..
|-> 3. 0x128031330 (0x140718dd0:0) [1x768] 0.000000 0.000000 0.000000 ..
|<- 1. 0x12802bd70 (0x1407072f0:0) [154x768] -0.263672 -0.271240 -0.214355 ..
|<- 2. 0x1280257f0 (0x1418266f0:0) [154x1] -0.531738 ..
|<- 3. 0x128025860 (0x141826860:0) [154x1] 0.211670 ..
Graph Stream 0 End
|<- 1. 0x600000a64540 (0x1407072f0:0) [154x768] -0.263672 -0.271240 -0.214355 ..
CCV_NNC_RANDOM_NORMAL_FORWARD: [0] -> [1]
|<- 1. 0x6000009e6530 (0x140605f90:0) [2x64x64x4] -1.528320 -0.254395 -0.121826 ..
Assertion failed: (input_size == model->input_size || model->input_size == 0), function ccv_cnnp_model_compile, file ccv_cnnp_model.c, line 573.

liuliu commented 2 months ago

    injectIPAdapterLengths: [0], lora: [],

Should be

    injectIPAdapterLengths: [], lora: [],

Otherwise it will be treated as having an IPAdapter tensor injected as input: https://github.com/drawthingsai/draw-things-community/blob/c0b21b67ffb16212bdf44e4159f26cc251cbdbd7/Libraries/SwiftDiffusion/Sources/Models/UNet.swift#L208

Also, you can see how we use it in this file: https://github.com/drawthingsai/draw-things-community/blob/c0b21b67ffb16212bdf44e4159f26cc251cbdbd7/Libraries/ImageGenerator/Sources/ImageGenerator.swift
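
For reference, here is the same compileModel call with just that argument changed (a sketch; everything else kept as in your snippet):

unet.compileModel(
    filePath: unetPath, externalOnDemand: true, version: .v1, upcastAttention: true,
    usesFlashAttention: true, injectControls: false, injectT2IAdapters: false,
    injectIPAdapterLengths: [], lora: [],  // empty: no IPAdapter tensors injected
    is8BitModel: false, canRunLoRASeparately: false,
    inputs: x_T, timestepTensor, c,
    tokenLengthUncond: 77, tokenLengthCond: 77,
    extraProjection: nil,
    injectedControls: [],
    injectedT2IAdapters: [],
    injectedIPAdapters: []
)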

davidw0311 commented 2 months ago

Thanks so much! Would you be able to provide a script for, e.g., loading the Stable Diffusion 1.5 checkpoint and performing text-to-image generation?

liuliu commented 2 months ago

Hi! It will probably be low on the list of things to do. It is easier if you can use the ImageGenerator class. First set up the ModelZoo path correctly (i.e. the models the app requires, downloaded from https://static.libnnc.org/modelname, or in the app's Models container directory), following this line: https://github.com/drawthingsai/draw-things-community/blob/main/Apps/ModelConverter/Converter.swift#L34

Then you can simply call ImageGenerator to do text2img:

let imageGenerator = ImageGenerator(
    queue: queue, configurations: configurations, workspace: workspace, tokenizerV1: tokenizerV1,
    tokenizerV2: tokenizerV2, tokenizerXL: tokenizerXL, tokenizerKandinsky: tokenizerKandinsky,
    poseDrawer: DefaultPoseDrawer())
let (tensors, scale) = imageGenerator.generate(
    nil, scaleFactor: 1, mask: nil,
    depth: nil,
    hints: [:], custom: nil, shuffles: [], text: prompt,
    negativeText: negativePrompt,
    configuration: configuration
) { signpost, signposts, tensor in
    return true
}
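
(Here queue, configurations, workspace, the tokenizers, prompt/negativePrompt, and configuration are assumed to be set up beforehand, as in the ImageGenerator sources; the trailing closure is a feedback callback, and returning true lets generation continue.)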