SalvatoreRa closed this 1 year ago.
Taking the mean is what we recommend indeed. The `extract.py` script in this repo also provides that option.
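For a single, unpadded sequence, the recommended mean can be sketched as below. This is a hedged illustration, not the repo's code. It assumes the per-token tensor has a beginning-of-sequence token at position 0 followed by the residues, with anything after the last residue (such as an end-of-sequence token or padding) ignored. This matches my understanding of what the mean option in `extract.py` does, but verify against the script itself. `mean_residue_embedding` is a hypothetical helper name, and the tensor is random stand-in data.

```python
import torch


def mean_residue_embedding(token_reprs: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Average per-token representations over residue positions only.

    token_reprs: [T, D] representations for one sequence. Position 0 is
    assumed to be the beginning-of-sequence token and positions 1..seq_len
    the residues; later positions (e.g. <eos>, padding) are ignored.
    Returns a [D] per-protein vector.
    """
    return token_reprs[1 : seq_len + 1].mean(dim=0)


# Stand-in data: a 10-residue protein plus <cls> and <eos>, 1280 dims per token.
seq_len, dim = 10, 1280
token_reprs = torch.randn(seq_len + 2, dim)

protein_vec = mean_residue_embedding(token_reprs, seq_len)
print(protein_vec.shape)  # torch.Size([1280])
```

Skipping the special-token positions keeps their vectors from shifting the average, which matters more for short proteins.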
@SalvatoreRa and @tomsercu: with `x.mean(axis=1)`, will this be a per-protein embedding with `torch.Size([1, 1280])`?
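On the shape question: if `x` is the output for a single sequence with shape `[1, T, 1280]` (batch, token positions, hidden size of a 1280-dimensional model), averaging over axis 1 collapses the token axis and leaves `[1, 1280]`. A quick check with a random stand-in tensor follows (T = 12 is arbitrary). Note that T counts every position the model returns, so a plain mean also averages in any special or padding tokens.

```python
import torch

# Stand-in for a model output: batch of 1, 12 token positions, 1280-dim states.
x = torch.randn(1, 12, 1280)

per_protein = x.mean(dim=1)  # same reduction as x.mean(axis=1)
print(per_protein.shape)  # torch.Size([1, 1280])

# Drop the batch dimension if a flat vector is needed downstream.
flat = per_protein.squeeze(0)
print(flat.shape)  # torch.Size([1280])
```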
Thank you for sharing this fantastic work. I have started to experiment with it, and it is great.
I used the Hugging Face version to extract a representation for each protein; the idea is to use this representation in another application. Here I am using two sequences, and it seems to return a 320-dimensional vector for each amino acid (I guess). To get a single vector for each sequence, I took the mean. Would you advise doing it differently? Should I use a different output, for example the embedding layer?
Here is the example code:
Thank you very much
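With two sequences in one Hugging Face batch, the shorter one is padded, so a plain mean over the token axis would also average padding positions. Below is a hedged sketch of mask-aware mean pooling. It is not the poster's code, and it rests on several assumptions not confirmed in this thread:

- The checkpoint is `facebook/esm2_t6_8M_UR50D`, a 320-dimensional ESM-2 model that matches the vectors described.
- The tokenizer adds one `<cls>` at the start and one `<eos>` at the end of each sequence.
- Padding is on the right.

`residue_mask`, `masked_mean` and `embed_sequences` are hypothetical helper names. For the embedding-layer question, `output_hidden_states=True` returns every layer, and in Hugging Face models index 0 is the embedding output. The same pooling can therefore be applied to any layer to compare them.

```python
import torch


def residue_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Keep residue tokens; drop <cls>, <eos> and padding.

    Assumes right padding, <cls> at position 0 and <eos> as the last
    non-padding token of every row.
    """
    mask = attention_mask.clone()
    mask[:, 0] = 0  # drop <cls>
    lengths = attention_mask.sum(dim=1)  # real tokens per row, incl. specials
    mask[torch.arange(mask.size(0)), lengths - 1] = 0  # drop <eos>
    return mask


def masked_mean(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states [B, T, D] over positions where mask [B, T] is 1."""
    weights = mask.unsqueeze(-1).to(hidden.dtype)  # [B, T, 1]
    summed = (hidden * weights).sum(dim=1)  # [B, D]
    counts = weights.sum(dim=1).clamp(min=1)  # [B, 1], avoids division by zero
    return summed / counts


def embed_sequences(sequences, model_name="facebook/esm2_t6_8M_UR50D", layer=-1):
    """Return one vector per sequence from the chosen hidden layer.

    layer=-1 is the last layer; layer=0 is the embedding-layer output.
    Downloads the checkpoint on first use.
    """
    # Imported here so the pooling helpers above work without transformers.
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name).eval()
    inputs = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    hidden = outputs.hidden_states[layer]  # [B, T, D]
    return masked_mean(hidden, residue_mask(inputs["attention_mask"]))


# Offline demo with stand-in tensors: proteins of 3 and 5 residues give
# 5 and 7 tokens with <cls>/<eos>; the first row is right-padded to 7.
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1, 1, 1]])
hidden = torch.randn(2, 7, 320)
vectors = masked_mean(hidden, residue_mask(attention_mask))
print(vectors.shape)  # torch.Size([2, 320])
```

Building the mask from `attention_mask` keeps the pooling independent of the model, so the same helpers work on any layer's output.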