In the cross-attention mechanism, the query q is derived from x (with sequence length seq_len), while the keys k and values v are derived from the conditional embeddings cond (with n_cond entries). When seq_len and n_cond differ, the shapes of q, k, and v must still be compatible for the attention computation: q and k need only share the same feature dimension, and the resulting attention matrix has shape (seq_len, n_cond), so the two lengths never have to be equal.
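The shape compatibility described above can be sketched in plain NumPy (a minimal single-head sketch; the function name `cross_attention` and the projection matrices `w_q`, `w_k`, `w_v` are illustrative, not from the original code). The key point is that the scores matrix `q @ k.T` has shape (seq_len, n_cond), so seq_len and n_cond are free to differ as long as q and k share the feature dimension d_k:

```python
import numpy as np

def cross_attention(x, cond, w_q, w_k, w_v):
    """Single-head cross-attention: q from x, k and v from cond."""
    q = x @ w_q        # (seq_len, d_k)
    k = cond @ w_k     # (n_cond, d_k)
    v = cond @ w_v     # (n_cond, d_k)
    # Scores: (seq_len, n_cond) -- seq_len and n_cond need not match.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax over the n_cond axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (seq_len, d_k): output keeps x's sequence length

seq_len, n_cond, d_model, d_k = 7, 3, 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
cond = rng.standard_normal((n_cond, d_model))
w_q = rng.standard_normal((d_model, d_k))
w_k = rng.standard_normal((d_model, d_k))
w_v = rng.standard_normal((d_model, d_k))

out = cross_attention(x, cond, w_q, w_k, w_v)
print(out.shape)  # (7, 4): driven by seq_len, independent of n_cond
```

Note that the output inherits x's sequence length; cond only determines how many entries each query position attends over.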