agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.

Feature order #149

Closed · abelavit closed 4 months ago

abelavit commented 7 months ago

Hello,

I am curious about the order of the features we get from the embeddings of the pre-trained transformer model. If we get features F1, F2, ..., F1024 (dimension 1x1024) from ProtT5 for each amino acid residue and we change the feature order, e.g. F24, F439, ..., F304 (still dimension 1x1024), will it result in a loss of information? If the order is important, would models like LSTMs be more suitable for building a prediction model than algorithms like Random Forests, which do not look at feature order?

Thank you.

mheinzinger commented 6 months ago

Nope, the order of features does not matter. You should be able to extract embeddings for a dataset, shuffle the dimensions of the embeddings (consistently across proteins, e.g. if F1 is moved to position F512, the same mapping must be applied to every protein), train a predictor, and get identical performance (or near-identical, depending on how you handle RNG in dataset sampling, weight initialization, etc.).
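To illustrate (a minimal sketch with stand-in NumPy arrays, not the actual ProtT5 extraction code): applying one fixed permutation to the feature axis of every embedding only reorders the columns, and since the permutation is invertible, no information is lost.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for per-residue ProtT5 embeddings: 100 residues x 1024 features.
embeddings = rng.standard_normal((100, 1024))

# Draw a single permutation of the 1024 feature indices once...
perm = rng.permutation(1024)

# ...and apply the SAME permutation to every row, so the shuffle is
# consistent across residues/proteins. A predictor trained on `shuffled`
# should match one trained on `embeddings` (up to RNG effects), because
# only the column order changes.
shuffled = embeddings[:, perm]

# The permutation is invertible, so the original order can be recovered.
assert np.array_equal(shuffled[:, np.argsort(perm)], embeddings)
```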

abelavit commented 6 months ago

Thank you so much.