Closed by patrick-miller 7 years ago
In https://github.com/cognoma/machine-learning/pull/67, @joshlevy89 ended up exploring different ways of applying transformations on a subset of features (e.g. PCA on expression features only). See https://github.com/cognoma/machine-learning/pull/67#discussion_r86557384. It looks like we ended up removing all solutions from the PR. However, there were at least 2 options we discussed:
@joshlevy89 do you remember what the issues were / have any advice? Perhaps some of the problems we ran into have been fixed.
Great, thanks for the pointer, I missed the comment on my first pass. I have opened a PR (#100) that uses FeatureUnion, and it seems to work correctly.
It's been a while since I've worked with these modules, so I don't remember specifics beyond what I wrote in the PR. I think the key points from that were: 1) DataFrameMapper was more concise but was an extra dep and was relatively new (leading to some weird behaviors); 2) FeatureUnion worked in many situations, but note from that PR: "It should be possible to apply Imputer to only covariates. But for some reason this is not working with FeatureUnion. It's as if FunctionTransformer were working for SelectKBest but not Imputer"
It's possible things have changed since then or that it works great for your use-case. Hope that helps.
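For reference, here is a rough sketch of what covariate-only imputation with FeatureUnion could look like. It is written against a newer scikit-learn, where `SimpleImputer` replaces the `Imputer` class quoted above, and it does not try to reproduce the failure from that PR. The column layout (covariates first, expression after) and the helper names `select_covariates` / `select_expression` are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Assumption: the first N_COVARIATES columns are covariates,
# everything after that is gene expression.
N_COVARIATES = 2

def select_covariates(X):
    """Hypothetical helper: keep only the covariate columns."""
    return X[:, :N_COVARIATES]

def select_expression(X):
    """Hypothetical helper: keep only the expression columns."""
    return X[:, N_COVARIATES:]

union = FeatureUnion([
    # Branch 1: select covariates, then impute missing values (median).
    ("covariates", Pipeline([
        # validate=False so the selector does not reject NaN inputs.
        ("select", FunctionTransformer(select_covariates, validate=False)),
        ("impute", SimpleImputer(strategy="median")),
    ])),
    # Branch 2: expression columns pass through unchanged.
    ("expression", FunctionTransformer(select_expression, validate=False)),
])

# Toy data: two covariate columns with gaps, two expression columns.
X = np.array([
    [1.0, np.nan, 0.5, 0.1],
    [3.0, 4.0, 0.7, 0.2],
    [np.nan, 8.0, 0.9, 0.3],
])

X_out = union.fit_transform(X)
print(X_out)
```

The missing covariate values are filled with each column's median, and the expression columns are concatenated back on untouched.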
We have two sources of features: the covariates and the gene expression matrix. When pre-processing this data, we generally want to perform dimensionality reduction only on the expression matrix.
This can prove cumbersome when trying to implement PCA in a pipeline. Scikit-learn does not provide this sort of functionality out of the box, but I believe it is possible to continue using pipelines. Here is an example.
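A minimal sketch of one way this could be wired up, using FeatureUnion with FunctionTransformer column selectors so that PCA only sees the expression columns. The data is random, and the column layout, the sizes, the `select_*` helpers and the choice of `LogisticRegression` are all assumptions for illustration, not the project's actual code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Assumption: the first N_COVARIATES columns are covariates,
# the remaining N_GENES columns are gene expression values.
N_COVARIATES = 3
N_GENES = 50
N_COMPONENTS = 5

def select_covariates(X):
    """Hypothetical helper: keep only the covariate columns."""
    return X[:, :N_COVARIATES]

def select_expression(X):
    """Hypothetical helper: keep only the expression columns."""
    return X[:, N_COVARIATES:]

features = FeatureUnion([
    # Covariates are passed through as-is.
    ("covariates", FunctionTransformer(select_covariates)),
    # Expression is scaled and reduced with PCA.
    ("expression", Pipeline([
        ("select", FunctionTransformer(select_expression)),
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=N_COMPONENTS)),
    ])),
])

pipeline = Pipeline([
    ("features", features),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Random stand-in data: 40 samples, alternating binary labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(40, N_COVARIATES + N_GENES))
y = np.arange(40) % 2

pipeline.fit(X, y)

# The classifier sees N_COVARIATES + N_COMPONENTS columns.
X_transformed = pipeline.named_steps["features"].transform(X)
print(X_transformed.shape)
```

Because the selection happens inside the pipeline, the PCA is refit on each training fold during cross-validation, and the covariates never go through the dimensionality reduction.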