In the variant prediction example notebook esm/examples/sup_variant_prediction.ipynb, the top 60 principal components are selected and used to reduce the dimensionality of the training set. Because this is done on the whole training set before running CV, information from each held-out fold leaks into the cross-validation sets (see https://github.com/facebookresearch/esm/discussions/140).
This PR moves the selection of the principal components inside the CV step, so the components are recomputed from the training portion of each fold only.
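
For illustration, here is a minimal sketch of the general pattern, not the notebook's actual code. It assumes scikit-learn is available and uses random placeholder data in place of the ESM embeddings. The regressor, the fold count, and the array shapes are all hypothetical choices. Wrapping PCA and the model in a `Pipeline` means each CV split fits PCA on its own training portion only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

# Placeholder data standing in for embeddings and target values (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)

# PCA is a pipeline step, so it is refit on the training part of every fold
# instead of once on the full training set.
pipe = Pipeline([
    ("pca", PCA(n_components=60)),
    ("model", KNeighborsRegressor(n_neighbors=5)),  # hypothetical model choice
])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
print("mean R^2 across folds:", scores.mean())
```

The same pipeline object can be passed to `GridSearchCV` when hyperparameters are tuned, in which case the PCA step is likewise refit inside every split.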