Nope, the order of features does not matter. You should be able to extract embeddings for a dataset, shuffle the embedding dimensions (consistently across proteins, e.g. if F1 is moved to position F512, that same mapping must apply to every protein), train a predictor, and get identical performance (or near-identical, depending on how you handle RNG for dataset sampling, weight initialization, etc.).
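A minimal sketch of that check, assuming synthetic stand-in data rather than real ProtT5 embeddings (the array `X`, the labels `y`, and the `evaluate` helper below are hypothetical, just to illustrate applying one fixed permutation to every protein's feature vector):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-protein embeddings: n proteins x 1024 dims.
X = rng.normal(size=(500, 1024))
y = rng.integers(0, 2, size=500)   # hypothetical binary labels

# One fixed permutation of the 1024 feature indices, reused for every protein.
perm = rng.permutation(X.shape[1])
X_shuffled = X[:, perm]            # same column reordering for all rows

def evaluate(features, labels):
    """Train a Random Forest with fixed seeds and report held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, random_state=42
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

print("original order:", evaluate(X, y))
print("permuted order:", evaluate(X_shuffled, y))
# The two scores come out near-identical (exact equality depends on how the
# RNG interacts with the column order), because Random Forests attach no
# meaning to the position or adjacency of features.
```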
Thank you so much.
Hello,
I am curious about the order of features we get from the embeddings of a pre-trained transformer model. If we get features F1, F2, ..., F1024 (dimension 1x1024) from ProtT5 for each amino acid residue and we change the feature order, e.g. to F24, F439, ..., F304 (still dimension 1x1024), will it result in a loss of information? If the order is important, would models like LSTMs be more suitable for building a prediction model than algorithms like Random Forests, which do not look at feature order?
Thank you.