Transformers and transformer pipelines

alex-pirozhenko / sklearn-pmml

A library that allows serialization of SciKit-Learn estimators into PMML

MIT License


Transformers and transformer pipelines #17

Open jnothman opened 8 years ago

jnothman commented 8 years ago

A PMML representation of basic scikit-learn transformers, transformer pipelines, and FeatureUnions is needed for this library to be more broadly useful.

NeverNude commented 8 years ago

Definitely agree. What transformers were you thinking of and where would be a good place to start?

jnothman commented 8 years ago

I'd start with a feature selector, since it should be fairly trivial to just ignore some features in PMML, although correctly interpreting the Pipeline object may involve some work. After that, something involving projections, such as PCA or random projection, although I'm not sure these are often used with forests.
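
To make the "just ignore some features" idea concrete, here is a rough sketch using only the standard library. It assumes the selector's choice is available as a boolean mask (the kind of array a scikit-learn selector returns from `get_support()`), and that dropping a feature amounts to leaving it out of the model's `MiningSchema`. The helper name `mining_schema` is hypothetical and not part of sklearn-pmml, and `usageType="target"` assumes a recent PMML version.

```python
import xml.etree.ElementTree as ET


def mining_schema(feature_names, support_mask, target_name):
    """Build a PMML MiningSchema that lists only the selected features.

    Hypothetical helper, not part of sklearn-pmml. `support_mask` is a
    sequence of booleans, one per entry in `feature_names`.
    """
    if len(feature_names) != len(support_mask):
        raise ValueError("feature_names and support_mask must have equal length")
    schema = ET.Element("MiningSchema")
    for name, keep in zip(feature_names, support_mask):
        if keep:
            # Only selected features become active inputs of the model.
            ET.SubElement(schema, "MiningField", name=name, usageType="active")
    # The target field is always kept.
    ET.SubElement(schema, "MiningField", name=target_name, usageType="target")
    return schema


schema = mining_schema(["a", "b", "c"], [True, False, True], "y")
print(ET.tostring(schema, encoding="unicode"))
```

The harder part mentioned above, walking the pipeline object to find the selector and the final estimator, is not covered by this sketch.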

alex-pirozhenko commented 8 years ago

I think the most natural way to implement pipelines in PMML would be to use nested models. That would allow transformers to define their own LocalTransformations, replace the schemas, and pass the adjusted context to the nested model.
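
As an illustration of the nested-model idea, here is a minimal sketch, again standard library only. It assumes a standardizing transformer (subtract the mean, divide by the standard deviation) wrapped around an inner model inside a `MiningModel` with a single always-true `Segment`. All function names are hypothetical, the inner `TreeModel` is a stub with only its `MiningSchema` filled in, and the output has not been validated against the PMML schema.

```python
import xml.etree.ElementTree as ET


def standardize_field(name, mean, std):
    """Return (derived_name, DerivedField) computing (name - mean) / std."""
    derived_name = f"scaled({name})"
    field = ET.Element("DerivedField", name=derived_name,
                       optype="continuous", dataType="double")
    divide = ET.SubElement(field, "Apply", function="/")
    subtract = ET.SubElement(divide, "Apply", function="-")
    ET.SubElement(subtract, "FieldRef", field=name)
    ET.SubElement(subtract, "Constant", dataType="double").text = repr(float(mean))
    ET.SubElement(divide, "Constant", dataType="double").text = repr(float(std))
    return derived_name, field


def stub_tree_model(input_names):
    """Placeholder inner model: only its MiningSchema is filled in."""
    model = ET.Element("TreeModel", functionName="classification")
    schema = ET.SubElement(model, "MiningSchema")
    for name in input_names:
        ET.SubElement(schema, "MiningField", name=name)
    return model


def wrap_with_scaler(input_names, means, stds, build_inner):
    """Wrap an inner model so that it sees the scaled fields as its inputs."""
    outer = ET.Element("MiningModel", functionName="classification")
    schema = ET.SubElement(outer, "MiningSchema")
    local = ET.SubElement(outer, "LocalTransformations")
    derived_names = []
    for name, mean, std in zip(input_names, means, stds):
        # The outer schema still refers to the raw input fields.
        ET.SubElement(schema, "MiningField", name=name)
        derived_name, field = standardize_field(name, mean, std)
        local.append(field)
        derived_names.append(derived_name)
    segmentation = ET.SubElement(outer, "Segmentation",
                                 multipleModelMethod="selectFirst")
    segment = ET.SubElement(segmentation, "Segment")
    ET.SubElement(segment, "True")  # predicate that always matches
    # The adjusted context (the derived field names) goes to the nested model.
    segment.append(build_inner(derived_names))
    return outer


pipeline = wrap_with_scaler(["x1", "x2"], [0.5, 10.0], [2.0, 4.0], stub_tree_model)
print(ET.tostring(pipeline, encoding="unicode"))
```

Passing the inner model as a callback that receives the derived field names is one way to model "pass the adjusted context"; a real converter would likely carry more than names (data types, value ranges, the target field).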