Closed zeahmed closed 5 years ago
That is intentional, since we did not seem to have a scenario that required it, and it is a fairly large API to reveal its public surface (on the order of the same size of DataViewSchema
).
The scenario you've outlined, creating a new estimator from scratch, is not something we decided to support for v1. (Even if this issue would be solved, you'd face another problem once you had the job of creating an ITransformer
implementation (assuming you also wanted to implement one of those from scratch), is that while the interface ICanSaveModel
is a public type, and ModelSaveContext
is a public class, no methods on that class are accessible. (This done deliberately.)
This lack of extensibility at the IEstimator
and ITransformer
level is deliberate since these mechanisms (of which SchemaShape
is only one of several, there's as mentioned model saving, model loading, which is a whole other kettle of fish) is not something we had time to refine prior to v1.
You could I guess build an IEstimator
that returns some existing ITransformer
instance rather than write one of those from scratch, but then in such a case as that you wouldn't need to create novel transform estimator instances anyway.
I wonder if the custom transformer, which is our current "extensibility" story, works for whatever you have in mind?
No, the custom transformer does not work for our story. We are specifically looking to extend ML.NET for third-party customers to easily incorporate a new piece of functionality into their pipelines. We want to be able to tell people to just place the new Estimator/Transformer into their pipeline and have it work.
@jeremytieman, With our new type register system, the custom transformer can handle aribitrary C# classes now. Please take a look at this example.
Since we intentionally hide SchemaShape.Column
, I think this issue can be closed. Please feel free to reopen it if you still have concerns.
From this discussion, I don't understand how I can use GetOutputSchema
?
Let's say I want to convert all int
columns to single
, for example.
If I'm not in the first estimator, I'm required to remember all the transformations I concatenated, to remember which columns are currently int
.
Or I can get the outputSchema
and choose the relevant columns from it.
I suggest at least make the following method public:
IEstimator
is a public interface that allows to create new estimators in ML.NET using NuGet only. The interface is public which defines two methodsFit
andGetOutputSchema
.https://github.com/dotnet/machinelearning/blob/1724da898d8cff88543fcf9e7b356ef0989b7bf7/src/Microsoft.ML.Core/Data/IEstimator.cs#L301
The
GetOutputSchema
method takesSchemaShape
as input. The only way to createSchemaShape
is through its constructor that takesSchemaShape.Column
as input.https://github.com/dotnet/machinelearning/blob/1724da898d8cff88543fcf9e7b356ef0989b7bf7/src/Microsoft.ML.Core/Data/IEstimator.cs#L129
However, the only constructor of
SchemaShape.Column
is internal which blocks external user to create new estimators.https://github.com/dotnet/machinelearning/blob/1724da898d8cff88543fcf9e7b356ef0989b7bf7/src/Microsoft.ML.Core/Data/IEstimator.cs#L62
How can I create a new estimator using NuGet only?