When running tensor parallelism (TP) with the Calico model, the embedding/head runs into issues because the model uses tied heads, and WordEmbedding may need changes to support TP with tied heads. This PR removes the TP wrapping from the embedding for Calico; the TP wrapping will be addressed in a later PR.