What namespaces to keep, what to drop, and what to hide

sfilipi commented 5 years ago

In the table below there are all the namespaces that are in ML.Net. They also all display in the docs site. Some of them, like Microsoft.ML.Ensemble needs to continue to exist in the code, but it is not ready to be exposed to the users, and can be hidden in the docs site.

Some others, like Microsoft.ML.Internal.Internallearn can potentially be merged into other namespaces. Let's annotate in the list below the namespaces that need to be hidden in the docs site, and the ones that need to be gone altogether:

Namespace	Drop it from code	Hide it from docs	Ships In Nuget
Microsoft.ML			Microsoft.ML
Microsoft.ML.Calibrator			Microsoft.ML
Microsoft.ML.Core.Data	drop
Microsoft.ML.Data			Microsoft.ML
Microsoft.ML.Data.IO	drop?
Microsoft.ML.Data.IO.Zlib	drop
Microsoft.ML.Ensemble	Microsoft.ML.Trainers.Ensemble	hide	None
Microsoft.ML.Ensemble.EntryPoints			Microsoft.ML
Microsoft.ML.Ensemble.OutputCombiners	Microsoft.ML.Trainers.Ensemble	hide	None
Microsoft.ML.Ensemble.Selector	Microsoft.ML.Trainers.Ensemble	hide	None
Microsoft.ML.Ensemble.Selector.DiversityMeasure	Microsoft.ML.Trainers.Ensemble	hide	None
Microsoft.ML.EntryPoints			Microsoft.ML
Microsoft.ML.EntryPoints.JsonUtils	drop?
Microsoft.ML.FactorizationMachine	Microsoft.ML.Trainers.FactorizationMachine		Microsoft.ML
Microsoft.ML.ImageAnalytics			Microsoft.ML.ImageAnalytics
Microsoft.ML.ImageAnalytics.EntryPoints	Microsoft.ML.ImageAnalytics
Microsoft.ML.Internal.Calibration	Microsoft.Ml.Calibrator	hide	Microsoft.ML
Microsoft.ML.Internal.CpuMath		hide	Microsoft.ML
Microsoft.ML.Internal.Internallearn	drop?
Microsoft.ML.Internal.Internallearn.ResultProcessor	drop?		None
Microsoft.ML.Learners	Microsoft.ML.Trainers		Microsoft.ML
Microsoft.ML.LightGBM	Microsoft.ML.Trainers.LightGBM		Microsoft.ML.LightGBM
Microsoft.ML.Model			Microsoft.ML.Onnx , Microsoft.ML
Microsoft.ML.Model.Onnx			Microsoft.ML.Onnx
Microsoft.ML.Numeric	drop?
Microsoft.ML.SamplesUtils			Microsoft.ML
Microsoft.ML.StaticPipe			Microsoft.ML.StaticPipe
Microsoft.ML.StaticPipe.Runtime	drop?
Microsoft.ML.Sweeper		hide	Microsoft.ML
Microsoft.ML.Sweeper.Algorithms		hide	Microsoft.ML
Microsoft.ML.TimeSeries			Microsoft.ML.TimeSeries
Microsoft.ML.TimeSeriesProcessing	drop?
Microsoft.ML.Tools		hide?
Microsoft.ML.Trainers			Microsoft.ML
Microsoft.ML.Trainers.FastTree			Microsoft.ML
Microsoft.ML.Trainers.FastTree.Internal	Microsoft.ML.Trainers.FastTree
Microsoft.ML.Trainers.HalLearners		Microsoft.ML.Trainers.HalLearners
Microsoft.ML.Trainers.KMeans			Microsoft.ML
Microsoft.ML.Trainers.Online			Microsoft.ML
Microsoft.ML.Trainers.PCA			Microsoft.ML
Microsoft.ML.Trainers.Recommender
Microsoft.ML.Trainers.SymSgd	Microsoft.ML.Trainers.HalLearners
Microsoft.ML.Training	drop
Microsoft.ML.Transforms
Microsoft.ML.Transforms.Categorical
Microsoft.ML.Transforms.Conversions
Microsoft.ML.Transforms.FeatureSelection
Microsoft.ML.Transforms.Normalizers
Microsoft.ML.Transforms.Projections
Microsoft.ML.Transforms.TensorFlow
Microsoft.ML.Transforms.Text
Microsoft.ML.UniversalModelFormat.Onnx	Microsoft.ML.Model.Onnx

TomFinley commented 5 years ago

My surface reactions.

Regarding the namespaces that are reserved mostly for what we think of as classical "training" aalgorithms, we tended to put them in Microsoft.ML.Trainers, from what I see. (E.g.: Microsoft.ML.Trainers.FastTree).

Microsoft.ML.UniversalModelFormat.Onnx should become Microsoft.ML.Model.Onnx I think.

Since Microsoft.ML.StaticPipe stuff now resides in a separate nuget, it might be appropriate to move it to the Microsoft.ML namespace. However this is of lower priority since that associated nuget will not be stable for v1 I expect. (Might still be nice, since that should be easy to do I think.)

I am not sure that I see the point in the seemingly elaborate hierarchy of Transforms namespaces (e.g., categorical, conversions, feature-selection, normalizers), but I do not feel strongly about it. The negative thing I see about this is that the inevitable result is that the "odd-ball" transformers that are in fact less commonly used wind up being uncategorizable and so put in the regular Transforms namespace. But I do not feel strongly about this.

My expectation is that anything with EntryPoints in the name except for the thing for running/executing entry-points would stay. E.g., Microsoft.ML.EntryPoints would be public, but anything else would be internal. So that renders what I am about to say less urgent, but I would still do it: I do not, for example, see the point in there being a Microsoft.ML.Ensemble.EntryPoints namespace. (We do not elsewhere have the static classes where entry-point definitions reside, reside in a separate namespace, any more than we have the command-line invocation logic live in a separate namespace, etc. etc.)

Microsoft.ML.Trainers.FastTree.Internal should just become the regular namespace. If something really is internal, the classes and types should just be internal. Nothing to do with the namespace.

I do not understand why we have "Microsoft.ML.TimeSeries" and "Microsoft.ML.TimeSeriesProcessing". Anyway, it seems like these belong in one of either the "Transforms" or "Trainers" style namespace, probably Transforms. (So Microsoft.ML.Transforms.TimeSeries.)

Microsoft.ML.SamplesUtils contains a single class. I feel like these things, if we want to keep them, should be extensions on DataOperationsCatalog. I also don't understand what return value being a string means. This whole API is just problematic, not just namespace choice.

To be consistent with trainers, all the stuff in Microsoft.ML.Ensemble should be moved to Microsoft.ML.Trainers.Ensemble. This includes all those strange subnamespaces that seem to exist purely for the sake of communicating, "hey, look at me, the types in me have a different type hierarchy than other types elsewhere, isn't that super cool?" Which is a fairly silly use for namespaces. 😛

Microsoft.ML.Tools, from what I see this should not be (and as far as I can tell doesn't) contain public classes that ship as part of any nuget.

Microsoft.ML.Numeric seems to contain purely stuff used for LBFGS training. I guess i don't see the harm in having it have its own namespace maybe, but it should all be rendered internal, since the guts of our optimizer is hardly part of our public surface.

Microsoft.ML.Learners, should be moved to some appropriate choice of "trainers." Probably the same with so-called "InternalLearn".

Microsoft.ML.Internal.Internallearn.ResultProcessor. Goofy. Nothing in result processor lives in any public surface shipping as part of a nuget, so might as well give it the namespace Microsoft.ML.ResultProcessor.

Microsoft.ML.Calibrator, whatever happened to the idea of making calibration another data transform step?

TomFinley commented 5 years ago

Note that while namespace changes are fine, I feel like the first order of business really must be to choose what we put in Microsoft.ML root namespace itself. There are lots of things that are definitely in this category and already in there (MLContext and extension methods), but lots of things also that are not yet there that probably should be (e.g., ITrainerEstimator, IEstimator, ITransformer, and friends).

sfilipi commented 5 years ago

We have both Microsoft.ML.Trainers and using Microsoft.ML.Learners namespaces. I think in the blogs/announcements we refer to them mostly as Learners. Drop the Microsoft.ML.Trainers in favor of Microsoft.ML.Learners?

sfilipi commented 5 years ago

      We have both Microsoft.ML.Trainers and using Microsoft.ML.Learners namespaces. I think in the blogs/announcements we refer to them mostly as Learners.  Drop the Microsoft.ML.Trainers in favor of Microsoft.ML.Learners?

Converging on Microsoft.ML.Trainers instead, since that seems to be the namespace the majority of trainers live in.

cc @CESARDELATORRE @JRAlexander @luisquintanilla

JRAlexander commented 5 years ago

I'd prefer Microsoft.ML.Algorithms.

JRAlexander commented 5 years ago

It would match https://dotnet.microsoft.com/learn/machinelearning-ai/what-is-mldotnet. "Choose Algorithm"

TomFinley commented 5 years ago

I'd prefer Microsoft.ML.Algorithms.

I would not. We have trainers, transforms, predictors... if some random note written by someone happened to use such a generic term as "algorithm," and it is apparently having the ability to affect how people think about such things, that is what must be changed.

TomFinley commented 5 years ago

Regarding Microsoft.ML.Data.IO, I think everything here should be hidden, it can probably be moved into Microsoft.ML.Data if we care. I personally might keep it where it is though, since it's very, very specific to binary loaders/savers.

JRAlexander commented 5 years ago

Ok, then what about Microsoft.ML.LearningAlgorithms? Again it would match the terms we are already using on the marketing site... and are industry standard...

Choose Algorithm Choose the learning algorithm that will provide the highest accuracy for your scenario. ML.NET offers the following types of learners:

Linear (e.g. SymSGD, SDCA) Boosted Trees (e.g. FastTree, LightGBM) K-Means SVM Averaged Perceptron

eerhardt commented 5 years ago

FYI -

We still have public types in ML.Trainers and ML.Training namespaces. These should be consolidated. -- #2713 has been logged to track this.
We still have Microsoft.ML.Internal.Calibration and Microsoft.ML.Internal.Internallearn, which should be hidden/moved/renamed. -- #2714 has been logged to track this.
we also have 1 type in:

namespace Microsoft.ML.Data.Evaluators.Metrics {
    public sealed class AnomalyDetectionMetrics {
        public double Auc { get; }
        public double DrAtK { get; }
    }
}

we also have 2 public types in:

namespace Microsoft.ML.Model {
public interface ICanSaveModel {
    void Save(ModelSaveContext ctx);
}
public sealed class ModelSaveContext : IDisposable {
    public void Dispose();
}
}

This isn't enough types (IMO) for it's own namespace.

dotnet / machinelearning

What namespaces to keep, what to drop, and what to hide #2326