Open aki819 opened 11 months ago
Can this model perform cross-attention between embedding matrices from different modalities, the way a transformer does?