Open Hjorvik opened 4 years ago
Hi @Hjorvik, apologies for the slow reply. IIRC this is already supported to some extent, e.g., if you run a PCA on one array gn1:
coords1, model = allel.pca(gn1)
...then you can use the model to transform a different array gn2, e.g.:
coords2 = model.transform(gn2)
Would this suffice, or do you also need to be able to persist the model somehow so you can run the initial PCA and then do the projection in different sessions?
(Adding link to the allel.stats.decomposition source code if anyone is wondering how this works.)
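For anyone who wants the idea without reading the source, here is a rough NumPy-only sketch of the fit-then-project pattern. It is not scikit-allel's actual code: the helper names fit_pca and project are made up for illustration, the input is assumed to be an (n_variants, n_samples) array of diploid alternate-allele counts with no missing calls, and the scaling is a Patterson-style one chosen as an assumption.

```python
import numpy as np

def fit_pca(gn, n_components=2):
    """Fit a PCA on reference samples (hypothetical helper, not allel's API).

    gn: (n_variants, n_samples) alt-allele counts, diploid, no missing calls.
    Every variant must be segregating in gn, otherwise the scale is zero.
    """
    gn = np.asarray(gn, dtype=float)
    mean = gn.mean(axis=1, keepdims=True)      # per-variant mean count
    p = mean / 2                               # per-variant allele frequency
    std = np.sqrt(p * (1 - p))                 # Patterson-style scale (assumed)
    x = ((gn - mean) / std).T                  # samples as rows
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    components = vt[:n_components]             # principal axes
    coords = x @ components.T                  # reference sample coordinates
    return coords, (mean, std, components)

def project(model, gn):
    """Project new samples using the reference mean, scale and axes."""
    mean, std, components = model
    x = ((np.asarray(gn, dtype=float) - mean) / std).T
    return x @ components.T

# Toy data: 200 variants, 20 reference samples plus 5 samples to project.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=(200, 1))
gn_all = rng.binomial(2, freqs, size=(200, 25))
gn1, gn2 = gn_all[:, :20], gn_all[:, 20:]

# Keep only variants that segregate in the reference panel.
seg = gn1.min(axis=1) != gn1.max(axis=1)
gn1, gn2 = gn1[seg], gn2[seg]

coords1, model = fit_pca(gn1)
coords2 = project(model, gn2)
```

The point to notice is that the projected samples are centred and scaled with the reference panel's statistics, never their own, so they cannot influence the axes. This sketch does not handle missing genotypes, which is the case the least-squares projection in SmartPCA is meant to address.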
Hi! I'm starting to use scikit-allel, and I'm really enjoying it. However, there is a feature that I believe is missing: the possibility to extract the PCs from some of the samples and then project the rest of the samples into that space. This is especially useful when you are working with data that have a lot of missing variants (for example, aDNA data), and it would be a nice addition to the toolkit. A popular example of this is SmartPCA: https://github.com/chrchang/eigensoft/blob/master/POPGEN/lsqproject.pdf