Closed sfilipi closed 5 years ago
My surface reactions.
Regarding the namespaces that are reserved mostly for what we think of as classical "training" algorithms, we tended to put them in Microsoft.ML.Trainers, from what I see. (E.g.: Microsoft.ML.Trainers.FastTree.)
Microsoft.ML.UniversalModelFormat.Onnx should become Microsoft.ML.Model.Onnx, I think.
Since the Microsoft.ML.StaticPipe stuff now resides in a separate nuget, it might be appropriate to move it to the Microsoft.ML namespace. However, this is of lower priority, since I expect the associated nuget will not be stable for v1. (Might still be nice, since that should be easy to do, I think.)
I am not sure that I see the point in the seemingly elaborate hierarchy of Transforms namespaces (e.g., categorical, conversions, feature-selection, normalizers). The negative thing I see is that the inevitable result is that the "odd-ball" transformers, which are in fact less commonly used, wind up being uncategorizable and so get put in the regular Transforms namespace. But I do not feel strongly about this.
My expectation is that anything with EntryPoints in the name except for the thing for running/executing entry-points would stay. E.g., Microsoft.ML.EntryPoints would be public, but anything else would be internal. So that renders what I am about to say less urgent, but I would still do it: I do not, for example, see the point in there being a Microsoft.ML.Ensemble.EntryPoints namespace. (Elsewhere, we do not put the static classes where entry-point definitions reside in a separate namespace, any more than we have the command-line invocation logic live in a separate namespace, etc.)
Microsoft.ML.Trainers.FastTree.Internal should just become the regular namespace. If something really is internal, the classes and types should just be declared internal. That has nothing to do with the namespace.
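To make the point about access modifiers concrete, here is a minimal sketch. The two type names are invented for illustration and are not actual ML.NET types; the namespace is simply reused as a label. It shows that what an assembly exports is decided by `public` versus `internal`, not by whether the namespace name contains "Internal".

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the namespace under discussion. The type names
// below are assumptions made for this sketch, not real ML.NET types.
namespace Microsoft.ML.Trainers.FastTree
{
    // Intended to be part of the public surface.
    public sealed class PublicTrainerSketch { }

    // Hidden from consumers by the access modifier alone, even though it
    // lives in the same "regular" namespace as the public type.
    internal sealed class TreeGutsSketch { }
}

public static class Program
{
    // Lists the names of the types this assembly actually exports from the
    // namespace above. Assembly.GetExportedTypes returns only public types.
    public static string[] ExportedTypeNames() =>
        typeof(Program).Assembly.GetExportedTypes()
            .Where(t => t.Namespace == "Microsoft.ML.Trainers.FastTree")
            .Select(t => t.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();

    public static void Main()
    {
        // Only the public type shows up; the internal one is not exported.
        Console.WriteLine(string.Join(", ", ExportedTypeNames()));
    }
}
```

Under these assumptions, only `PublicTrainerSketch` is listed, so a separate `.Internal` namespace adds nothing that the `internal` keyword does not already provide.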
I do not understand why we have "Microsoft.ML.TimeSeries" and "Microsoft.ML.TimeSeriesProcessing". Anyway, it seems like these belong in one of either the "Transforms" or "Trainers" style namespace, probably Transforms. (So Microsoft.ML.Transforms.TimeSeries.)
Microsoft.ML.SamplesUtils contains a single class. I feel like these things, if we want to keep them, should be extensions on DataOperationsCatalog. I also don't understand what the return value being a string means. This whole API is problematic, not just the namespace choice.
To be consistent with trainers, all the stuff in Microsoft.ML.Ensemble should be moved to Microsoft.ML.Trainers.Ensemble. This includes all those strange subnamespaces that seem to exist purely for the sake of communicating, "hey, look at me, the types in me have a different type hierarchy than other types elsewhere, isn't that super cool?" Which is a fairly silly use for namespaces. 😛
Microsoft.ML.Tools: from what I see, this should not contain (and as far as I can tell doesn't contain) public classes that ship as part of any nuget.
Microsoft.ML.Numeric seems to contain purely stuff used for LBFGS training. I guess I don't see the harm in it having its own namespace, but it should all be rendered internal, since the guts of our optimizer are hardly part of our public surface.
Microsoft.ML.Learners should be moved to some appropriate choice of "trainers." Probably the same with so-called "InternalLearn".
Microsoft.ML.Internal.Internallearn.ResultProcessor. Goofy. Nothing in result processor lives in any public surface shipping as part of a nuget, so might as well give it the namespace Microsoft.ML.ResultProcessor.
Microsoft.ML.Calibrator: whatever happened to the idea of making calibration another data transform step?
Note that while namespace changes are fine, I feel like the first order of business really must be to choose what we put in the Microsoft.ML root namespace itself. There are lots of things that definitely belong in this category and are already in there (MLContext and extension methods), but also lots of things that are not yet there and probably should be (e.g., ITrainerEstimator, IEstimator, ITransformer, and friends).
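As a rough sketch of why the root namespace choice matters to consumers: if the core abstractions live in the root, a single using directive brings them into scope, and more specific namespaces become opt-in. All the types below are hypothetical stand-ins defined only for this example; they are not the real ML.NET interfaces and do not reflect their members.

```csharp
using System;

namespace Microsoft.ML
{
    // Hypothetical stand-ins for the core abstractions (assumed shapes,
    // not the real ITransformer / IEstimator definitions).
    public interface ITransformerSketch
    {
        string Describe();
    }

    public interface IEstimatorSketch
    {
        ITransformerSketch Fit();
    }
}

namespace Microsoft.ML.Transforms
{
    // Types declared in the enclosing Microsoft.ML namespace are in scope
    // here without any using directive.
    public sealed class NoOpTransformerSketch : ITransformerSketch
    {
        public string Describe() => "no-op";
    }

    public sealed class NoOpEstimatorSketch : IEstimatorSketch
    {
        public ITransformerSketch Fit() => new NoOpTransformerSketch();
    }
}

namespace ConsumerApp
{
    using Microsoft.ML;             // core abstractions: one directive
    using Microsoft.ML.Transforms;  // a specific component: opt-in

    public static class Program
    {
        // Code written against the abstractions needs only the root namespace;
        // the sub-namespace is needed just to construct the concrete type.
        public static string Run()
        {
            IEstimatorSketch estimator = new NoOpEstimatorSketch();
            ITransformerSketch transformer = estimator.Fit();
            return transformer.Describe();
        }

        public static void Main() => Console.WriteLine(Run());
    }
}
```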
We have both Microsoft.ML.Trainers and Microsoft.ML.Learners namespaces. I think in the blogs/announcements we refer to them mostly as Learners. Drop Microsoft.ML.Trainers in favor of Microsoft.ML.Learners?
Converging on Microsoft.ML.Trainers instead, since that seems to be the namespace the majority of trainers live in.
cc @CESARDELATORRE @JRAlexander @luisquintanilla
I'd prefer Microsoft.ML.Algorithms.
It would match https://dotnet.microsoft.com/learn/machinelearning-ai/what-is-mldotnet. "Choose Algorithm"
I'd prefer Microsoft.ML.Algorithms.
I would not. We have trainers, transforms, predictors... If some random note written by someone happened to use such a generic term as "algorithm," and that note apparently has the ability to affect how people think about such things, then the note is what must be changed.
Regarding Microsoft.ML.Data.IO, I think everything here should be hidden; it can probably be moved into Microsoft.ML.Data if we care. I personally might keep it where it is though, since it's very, very specific to binary loaders/savers.
Ok, then what about Microsoft.ML.LearningAlgorithms? Again it would match the terms we are already using on the marketing site... and are industry standard...
Choose Algorithm
Choose the learning algorithm that will provide the highest accuracy for your scenario. ML.NET offers the following types of learners:
- Linear (e.g. SymSGD, SDCA)
- Boosted Trees (e.g. FastTree, LightGBM)
- K-Means
- SVM
- Averaged Perceptron
FYI -
- ML.Trainers and ML.Training namespaces. These should be consolidated. -- #2713 has been logged to track this.
- Microsoft.ML.Internal.Calibration and Microsoft.ML.Internal.Internallearn, which should be hidden/moved/renamed. -- #2714 has been logged to track this.

namespace Microsoft.ML.Data.Evaluators.Metrics {
    public sealed class AnomalyDetectionMetrics {
        public double Auc { get; }
        public double DrAtK { get; }
    }
}
namespace Microsoft.ML.Model {
    public interface ICanSaveModel {
        void Save(ModelSaveContext ctx);
    }
    public sealed class ModelSaveContext : IDisposable {
        public void Dispose();
    }
}
These aren't enough types (IMO) for their own namespace.
The table below lists all the namespaces that are in ML.NET. They all also display in the docs site. Some of them, like Microsoft.ML.Ensemble, need to continue to exist in the code but are not ready to be exposed to users, and can be hidden in the docs site.
Some others, like Microsoft.ML.Internal.Internallearn, can potentially be merged into other namespaces. Let's annotate in the list below the namespaces that need to be hidden in the docs site, and the ones that need to be gone altogether: