jnothman opened this issue 8 years ago.
Definitely agree. Which transformers were you thinking of, and where would be a good place to start?
I'd start with a feature selector: ignoring some features in PMML should be fairly trivial, although correctly interpreting the Pipeline object may take some work. Beyond that, something involving projections, such as PCA or random projection, though I'm not sure these are often used with forests.
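For the feature-selector case, "ignoring" features could amount to leaving them out of the model's MiningSchema. Below is a minimal stdlib-only sketch. It assumes the boolean mask comes from a fitted selector's `get_support()` and that the input feature names are known. `mining_schema_for_selection` is a hypothetical helper, not part of any existing converter.

```python
import xml.etree.ElementTree as ET


def mining_schema_for_selection(feature_names, support):
    """Build a PMML MiningSchema listing only the selected features.

    `support` is a boolean mask, e.g. what a fitted scikit-learn selector's
    get_support() returns. Unselected features are simply omitted, so the
    model that uses this schema never references them.
    """
    if len(feature_names) != len(support):
        raise ValueError("feature_names and support must have the same length")
    schema = ET.Element("MiningSchema")
    for name, keep in zip(feature_names, support):
        if keep:
            ET.SubElement(schema, "MiningField", name=name)
    return schema


# Usage: keep x0 and x2 out of four inputs.
schema = mining_schema_for_selection(["x0", "x1", "x2", "x3"], [True, False, True, False])
print([f.get("name") for f in schema])  # ['x0', 'x2']
```

The target field and any usageType details are left out here. A real converter would also need to carry the target through.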
I think the most natural way to implement pipelines in PMML would be to use nested models. That would let each transformer define its own LocalTransformations, replace the schemas, and pass the adjusted context on to the nested model.
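Here is a rough sketch of the nested-model idea for a linear projection such as PCA without whitening. Each output is z_k = sum_i (x_i - mean_i) * W[k][i], where `mean` and `components` would correspond to a fitted PCA's `mean_` and `components_`. The outer MiningModel owns the LocalTransformations, and its single segment's nested model lists only the derived fields in its own MiningSchema. This assumes, per my reading of PMML 4.x scoping, that a segment's model may reference fields derived by the enclosing model. All helper names are hypothetical, and the nested TreeModel's body (its Node tree) is omitted, so this is a skeleton rather than valid, scorable PMML.

```python
import xml.etree.ElementTree as ET


def _binary(function, left, right):
    # PMML <Apply> with a built-in binary function such as "+", "-", "*".
    node = ET.Element("Apply", function=function)
    node.extend([left, right])
    return node


def _const(value):
    c = ET.Element("Constant", dataType="double")
    c.text = repr(float(value))
    return c


def _field(name):
    return ET.Element("FieldRef", field=name)


def projection_transformations(input_names, mean, components, prefix="pc"):
    """Express z_k = sum_i (x_i - mean_i) * components[k][i] as DerivedFields."""
    lt = ET.Element("LocalTransformations")
    out_names = []
    for k, row in enumerate(components):
        terms = [
            _binary("*", _binary("-", _field(x), _const(m)), _const(w))
            for x, m, w in zip(input_names, mean, row)
        ]
        # Fold the per-feature terms into a chain of binary "+" applications.
        expr = terms[0]
        for term in terms[1:]:
            expr = _binary("+", expr, term)
        name = f"{prefix}{k}"
        derived = ET.SubElement(
            lt, "DerivedField", name=name, optype="continuous", dataType="double"
        )
        derived.append(expr)
        out_names.append(name)
    return lt, out_names


def wrap_nested(local_transformations, input_names, derived_names, nested_tag="TreeModel"):
    """Outer model sees raw inputs; the nested model sees only derived fields."""
    outer = ET.Element("MiningModel", functionName="classification")
    schema = ET.SubElement(outer, "MiningSchema")
    for name in input_names:
        ET.SubElement(schema, "MiningField", name=name)
    outer.append(local_transformations)  # after MiningSchema, before Segmentation
    segmentation = ET.SubElement(outer, "Segmentation", multipleModelMethod="selectFirst")
    segment = ET.SubElement(segmentation, "Segment", id="1")
    ET.SubElement(segment, "True")  # predicate: always use this segment
    nested = ET.SubElement(segment, nested_tag, functionName="classification")
    nested_schema = ET.SubElement(nested, "MiningSchema")
    for name in derived_names:
        ET.SubElement(nested_schema, "MiningField", name=name)
    return outer


lt, names = projection_transformations(["x0", "x1"], [1.0, 2.0], [[0.6, 0.8], [-0.8, 0.6]])
model = wrap_nested(lt, ["x0", "x1"], names)
print(names)  # ['pc0', 'pc1']
```

Random projection fits the same helper with `mean` set to zeros.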
A PMML representation of basic scikit-learn transformers, and of Pipelines and FeatureUnions built from them, is needed for this to be more broadly useful.
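One way to get Pipeline and FeatureUnion support more or less for free is to have every converter map input field names to output field names, plus the DerivedFields it defines. Composition is then just chaining (Pipeline) and concatenating (FeatureUnion). The toy sketch below illustrates that design under those assumptions. The converters `select`, `project`, `pipeline` and `feature_union` are hypothetical, and they only track field names rather than emitting PMML.

```python
from typing import Callable, List, Tuple

# A converter takes input field names and returns
# (names of DerivedFields it defines, names of its output fields).
Converter = Callable[[List[str]], Tuple[List[str], List[str]]]


def select(indices) -> Converter:
    # Feature selection defines no new fields; it just narrows the inputs.
    def convert(inputs):
        return [], [inputs[i] for i in indices]
    return convert


def project(n_components, prefix) -> Converter:
    # A projection defines one new field per component.
    def convert(inputs):
        names = [f"{prefix}{k}" for k in range(n_components)]
        return names, names
    return convert


def pipeline(*steps: Converter) -> Converter:
    # Each step's outputs become the next step's inputs.
    def convert(inputs):
        derived, current = [], list(inputs)
        for step in steps:
            new_derived, current = step(current)
            derived.extend(new_derived)
        return derived, current
    return convert


def feature_union(*parts: Converter) -> Converter:
    # Every part sees the same inputs; outputs are concatenated in order.
    def convert(inputs):
        derived, outputs = [], []
        for part in parts:
            new_derived, part_out = part(inputs)
            derived.extend(new_derived)
            outputs.extend(part_out)
        return derived, outputs
    return convert


conv = pipeline(select([0, 2, 3]), feature_union(select([0]), project(2, "pca_")))
derived, outputs = conv(["x0", "x1", "x2", "x3"])
print(derived)  # ['pca_0', 'pca_1']
print(outputs)  # ['x0', 'pca_0', 'pca_1']
```

A real implementation would also need to guarantee unique derived-field names across the parts of a FeatureUnion. Here that is left to the `prefix` argument.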