Thank you for developing and maintaining this tool!
I have a question about the protein-level embeddings (mean-pooled final hidden layer). If two proteins are structurally similar or orthologous, should we expect their embeddings to be close in L2 or cosine distance? For instance, I notice that orthologous proteins with >50% sequence identity and similar structures (see the attached example of conserved ribosomal proteins) do not cluster when I run t-SNE or PCA on the embedding space. How can I explain this observation?
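A minimal sketch of the comparison described above, for reference. It assumes the final-layer hidden states for one protein are already available as a NumPy array of shape (sequence length, hidden size), plus a mask that is 1 for residue positions and 0 for padding or special tokens. The function names `mean_pool`, `cosine_distance` and `l2_distance` are hypothetical helpers, not part of the tool's API, and this is not necessarily how the attached example was computed.

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average per-residue vectors over masked-in positions only.

    Assumption: hidden has shape (L, D) and mask has shape (L,), with 1 for
    real residues and 0 for padding/special tokens that should not be averaged.
    """
    weights = mask.astype(hidden.dtype)[:, None]  # (L, 1) so it broadcasts over D
    return (hidden * weights).sum(axis=0) / weights.sum()

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two pooled embeddings."""
    return float(np.linalg.norm(a - b))

# Toy example: the third position is masked out, so it does not affect the mean.
hidden = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]])
mask = np.array([1, 1, 0])
pooled = mean_pool(hidden, mask)
print(pooled)                                     # [2. 3.]
print(cosine_distance(pooled, pooled))            # ~0.0 (up to rounding)
print(l2_distance(np.zeros(2), np.array([3.0, 4.0])))  # 5.0
```

Computing these pairwise distances directly in the full embedding space gives a check that does not depend on how a 2-D t-SNE or PCA projection arranges the points.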