shmoradims opened 6 years ago
One topic of note: we should be pushing a full retrain after your CV/TrainTest.
The full retraining on the entire dataset gives you a better model. The CV/TrainTest run gives you your pipeline's metrics (accuracy, AUC, NDCG, etc.); the model retrained on 100% of the dataset is the one to launch in production.
Hence we should form our examples as:
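For context, here is a minimal sketch of that flow, assuming the v1.0-style API (`SentimentData`, the file path, and the trainer choice are placeholders, not the actual sample code):

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

IDataView fullData = mlContext.Data.LoadFromTextFile<SentimentData>(
    "sentiment-data.tsv", hasHeader: true);   // hypothetical path

var pipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", nameof(SentimentData.Text))
    .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

// CV (or a TrainTest run) is only used to read the pipeline's metrics.
var cvResults = mlContext.BinaryClassification.CrossValidate(fullData, pipeline, numberOfFolds: 5);
var averageAuc = cvResults.Average(r => r.Metrics.AreaUnderRocCurve);

// The model that ships is a final Fit() on 100% of the data, not a CV fold's model.
ITransformer finalModel = pipeline.Fit(fullData);
mlContext.Model.Save(finalModel, fullData.Schema, "finalModel.zip");

// Placeholder input schema used above, for illustration only.
public class SentimentData
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string Text { get; set; }
}
```

The CV results are only read for their metrics; the saved model comes from the final `Fit` on all of the data.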
@justinormont - Good point. You mean that in cases where we have two datasets (a training dataset and a test dataset) we should also build the full dataset (merging both) and show how you must do a final train on 100% of the data once you are happy with the metrics, right?
But since that is the last step of the process, done after the iterations spent improving the metrics, do you think we should implement that code in the samples or only explain the process in the guidance?
We could certainly implement it in the sample, and while you are still iterating you would simply comment out the code for the final train?
Thoughts?
Cesar, I know what Justin is talking about. I noticed this issue too, where CV models are used as the final model, instead of doing step 8 above. I know how to fix it.
Sure, I understand the issue. I'm just saying that if you are still iterating, the sample code shouldn't run the full-dataset training at the end of the process; that's why you might want to comment out that last step until you need it. Let's chat about it offline. 👍
@justinormont, I have the following suggestions:
1) For cases where we have only one dataset, we can stick to your 9-step plan above. Do CV for evaluation and tuning, then train the final model on the one dataset, which is the full dataset.
2) I suggest we do not push for full retraining when the dataset already comes as separate train and test sets, as most Kaggle and public datasets do. In competitions, people combine the train and test sets because there is another, private test set for the final evaluation. So for our samples I think we should skip step #8. Mixing the train and test sets should be reserved for data scientists who know what they're doing, and for cases where there is a second test set for the final evaluation. I suggest keeping the samples' ML at the 100 level to match our audience, or we risk confusing them.
@shmoradims - I agree. Another variation between #1 and #2, when you have a single dataset, is to split the original full dataset 80%-20% in memory and then train with the 80% DataView and test with the 20% DataView. This is also simple to do and takes significantly less time than CV.
Yes, instead of CV, we can split the data to 80% train, 20% test ourselves, and not mix it back again.
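A hedged sketch of that variant, reusing the `mlContext`, `fullData`, and `pipeline` names from the sketch further up (the same `Evaluate` call applies when the test set comes as a separate file instead of from the split):

```csharp
// Hypothetical: hold out 20% in memory instead of running CV.
var split = mlContext.Data.TrainTestSplit(fullData, testFraction: 0.2);

// Train on the 80% DataView only.
ITransformer model = pipeline.Fit(split.TrainSet);

// Evaluate on the held-out 20% DataView; it is never merged back into the training data.
var metrics = mlContext.BinaryClassification.Evaluate(model.Transform(split.TestSet));
Console.WriteLine($"Accuracy: {metrics.Accuracy}, AUC: {metrics.AreaUnderRocCurve}");
```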
I removed the "Titanic" sample review because there are licensing issues with its dataset, as I was told by LCA, so we're removing this sample, which at the end of the day was not very practical or enterprise-oriented.
@shmoradims - is it possible to add F# to the mix as well? Both are similar, but having an OK on the F# versions too will surely help. F# does have a slightly different style from a coding point of view, so the reviewer can validate that as well. cc/ @CESARDELATORRE @dsyme
@shmoradims - Cesar told me to ask you if you can do a quick review of the current samples already migrated to 0.11. From a data science and algorithms point of view they should be pretty similar to what we had in 0.8.
However, a few metrics are not good enough, such as in Sentiment Analysis (it should be higher) and Iris classification (it is actually 1, which looks like overfitting; it should be lower). Could you review them when possible, please?
Hi, I noticed that Normalization has been removed from the samples… perhaps one should explain the reasoning for this.
@PeterPann23 - What specific sample had normalization removed?
Have a look at the v1.0.0-preview-All-Samples and search for the API... I do not find much direct use. Bike-sharing mentions it in comments only.
@PeterPann23 - Normalization can be applied depending on each specific sample. Sometimes it makes sense, sometimes it doesn't; that's why I'm asking which specific sample you think should have normalization for certain columns.
I guess if it is not needed, one should definitely say so in the samples. I was under the assumption it was always needed so that the runtime data matches the static file.
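For reference, a hedged sketch of what adding normalization to one of the samples could look like with the v1.0-style API (the column names here are placeholders); it mostly matters for trainers that are sensitive to feature scale, such as the linear/SDCA ones, and much less for tree-based trainers:

```csharp
// Hypothetical pipeline: normalize the concatenated numeric features before a linear trainer.
var normalizedPipeline = mlContext.Transforms
    .Concatenate("Features", "NumColumnA", "NumColumnB")        // placeholder numeric columns
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))   // min-max scale the Features column
    .Append(mlContext.Regression.Trainers.Sdca());
```

Because the normalizer is fitted as part of the pipeline and saved with the model, the same scaling is applied automatically at prediction time, so prediction input does not need to be pre-normalized to match the training file.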
Status

| Sample | DS Review |
| --- | --- |
| BinaryClassification_CreditCardFraudDetection | |
| BinaryClassification_SentimentAnalysis | |
| Clustering_CustomerSegmentation | |
| Clustering_Iris | |
| MulticlassClassification_Iris | |
| Regression_BikeSharingDemand | Ok. |
| Regression_TaxiFarePrediction | |
| MatrixFactorization_MovieRecommendation | MF using MFTrainer. Evaluation done as regressions. |
| MulticlassClassification-GitHubLabeler | |
| Regression-SalesForecast (eShopDashboardML) | |
| AnomalyDetection-Sales | |