dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

ML.Net Ranking Project #648

Open gangasahu opened 5 years ago

gangasahu commented 5 years ago

In the ML.Net Ranking sample project, the consumption of the model is not clear. For example: after the model is trained and saved, it has to be used with a new query to rank the web search results. The example provided is not clear. Suppose we want to use the web query "testing tool" - how does this have to be passed to the model so that the model returns a list of URL_Ids with proper scores that can be ranked? An even bigger challenge: what happens when the query "?????" has not been queried before, or there is no groupId in the test data - how will it be ranked? Calling Prediction might return null.

So a concrete example would be helpful.

Maybe another sample, such as hotel selection using ML.Net, would be helpful.

Have been working on this for quite some time. Having trouble figuring out how to consume the model.

Appreciate any help / samples /tips.

Email : ganga.sahu@rrd.com RR Donnelley

nicolehaugen commented 5 years ago

Hi @gangasahu -

I read through your questions and wanted to provide you with the below information. Please let me know if this still doesn't answer your questions.

Using your example where a new query "testing tool" is entered by a user, two things need to happen:

1.) Your app that is consuming the model is first responsible for determining the query results themselves and must group these results with an identifier, known as the group id -- the key here is that this is the responsibility of the app that is consuming the model. For example, your app would need to provide the query results similar to this (note: 'Etc.' represents the additional feature columns that were used in training the model and must also be included):

| Group Id | Query Result | Etc. |
|----------|--------------|------|
| 100      | Test Tool A  |      |
| 100      | Test Tool B  |      |
| 100      | Test Tool C  |      |

2.) Once the query results are determined and are grouped according to a common group id, the model can then be used to rank these results for the consuming app. Continuing with the "testing tool" example, the data shown in the above table would be passed to the model to have it rank all those results that have the same group id. Here are the lines of code from the sample that would do this step:

```csharp
// Load the model to perform predictions with it.
DataViewSchema predictionPipelineSchema;
ITransformer predictionPipeline = mlContext.Model.Load(modelPath, out predictionPipelineSchema);

// Predict rankings passing in the data that has the query results grouped by an id.
IDataView predictions = predictionPipeline.Transform(data);
```
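
For illustration only, here is a minimal sketch of what the consuming-app side could look like. The `SearchResultInput` class, its property names, and the feature values are all hypothetical - in practice the properties must match the column names and types the model was trained with (this assumes the usual `using Microsoft.ML;` and `using System.Linq;`):

```csharp
// Hypothetical input type - its properties must match the columns used in training.
public class SearchResultInput
{
    public uint GroupId { get; set; }
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
}

// The consuming app builds the candidate results for the new query ("testing tool")
// and tags them all with the same group id.
var candidates = new[]
{
    new SearchResultInput { GroupId = 100, Feature1 = 0.2f, Feature2 = 1.5f },  // Test Tool A
    new SearchResultInput { GroupId = 100, Feature1 = 0.7f, Feature2 = 0.3f },  // Test Tool B
    new SearchResultInput { GroupId = 100, Feature1 = 0.1f, Feature2 = 2.0f },  // Test Tool C
};

// Wrap the in-memory results in an IDataView and score them with the loaded model.
IDataView data = mlContext.Data.LoadFromEnumerable(candidates);
IDataView predictions = predictionPipeline.Transform(data);

// Each scored row gets a "Score" column; sorting by score (descending) gives the ranking.
float[] scores = predictions.GetColumn<float>("Score").ToArray();
var ranked = candidates
    .Zip(scores, (candidate, score) => new { candidate, score })
    .OrderByDescending(x => x.score)
    .ToList();
```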

Thanks, Nicole

gangasahu commented 5 years ago

Thanks Nicole for your response and explanation. It is really helpful and now I understand better.

However, I have another question. For the query "Testing tool", the application will build the list ("Test Tool A", "Test Tool B", ...) from the historical data, assign the group id of 100, and then pass it to the model to rank.

But when the query to rank ("Testing Tool") is not in the historical data, how will that list be prepared to pass to the model? In that case, the list passed to the model will be empty and nothing can be ranked.

My use case for ranking is: I will have a query that is a combination of source city and destination city (e.g. "ChicagoNewyork"). What happens if this query string is not in the historical / training data? Queries like "ChicagoHouston", "ChicagoDallas", "BostonNewyork", "MemphisNewyork", etc. are in the historical / training data, however. How do I prepare the query list data for the model to rank?

As additional input besides the label column, the input to the model should include the feature vector for the new query, right? What about the case when we do not know all the features of the feature vector for the query?

Basically, the question is: how to rank for a query that is not in the historical / training data?

Appreciate your input on the above questions.

nicolehaugen commented 5 years ago

Hi, @gangasahu -

I should have clarified that in the example I gave in my previous response, I am referring to the case where you have a new query that does not exist in the historical data that is used for training.

When you train the model initially with historical training data, you include the following as parameters:

1.) The column that stores the group id (you can also think of this as a query or search id)
2.) The label column which indicates the rank of the query results within the group
3.) The feature column(s) that are influential in determining the relevance/rank for a given query result

With this in mind, let's take the example that you mentioned involving Source\Destination city. Let's assume we have some training data for flights that looks like this:

| Group Id | Query Result | Source City | Destination City | Label | Departure Time | Arrival Time |
|----------|--------------|-------------|------------------|-------|----------------|--------------|
| 100      | Flight A     | Chicago     | Houston          | 4     | 6:00 AM        | 1:00 PM      |
| 100      | Flight B     | Chicago     | Houston          | 2     | 8:00 AM        | 2:00 PM      |
| 100      | Flight C     | Chicago     | Houston          | 0     | 4:00 AM        | 10:00 PM     |
| 101      | Flight F     | Dallas      | New York         | 4     | 5:00 AM        | 1:00 PM      |
| 101      | Flight G     | Dallas      | New York         | 3     | 10:00 AM       | 4:00 PM      |
| 101      | Flight H     | Dallas      | New York         | 0     | 3:00 AM        | 10:00 PM     |
| 101      | Flight I     | Dallas      | New York         | 0     | 2:00 AM        | 11:00 PM     |

Note that with the above data, suppose that we decide that the feature columns are the arrival and departure time columns. This is because in our example, we find that these column values are most important in determining the ideal rank of the results.

Once the model is trained, let's now assume that a user enters a new query of "Boston to New York". Here's what needs to happen:

1.) The app that is consuming the model first needs to get the results for this query and group these results according to a group id. So, for example, our app might query a database to get all flights from Boston to New York for a given date specified by the user.
2.) Once the app has these results, it passes this data to the model to get the predicted rankings. The data passed to the model may look something like this:

| Group Id | Query Result | Source City | Destination City | Departure Time | Arrival Time |
|----------|--------------|-------------|------------------|----------------|--------------|
| 103      | Flight Q     | Boston      | New York         | 6:00 AM        | 1:00 PM      |
| 103      | Flight R     | Boston      | New York         | 8:00 AM        | 2:00 PM      |
| 103      | Flight S     | Boston      | New York         | 4:00 AM        | 10:00 PM     |

When this data is passed to the model, the model will look at the feature columns that exist in this data (e.g. Departure\Arrival Time) and then use these as the basis for predicting the ranks for each result.

The key here is that the predicted rankings are based on the feature columns that you decide to select when training your model. These same feature columns must also be present in the data used when making a prediction with the model.
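
To tie this back to code, here is a rough sketch of what a training pipeline for the flight example might look like. The column names, and the assumption that the departure/arrival times have already been encoded as numeric (float) columns, are illustrative rather than taken from the sample; `trainingData` and `bostonToNewYorkData` are assumed to be IDataViews with those columns:

```csharp
// Convert the group id to a key type, combine the chosen feature columns into "Features",
// and train a LightGBM ranking model using the Label / GroupId / Features columns.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("GroupId")
    .Append(mlContext.Transforms.Concatenate("Features", "DepartureTime", "ArrivalTime"))
    .Append(mlContext.Ranking.Trainers.LightGbm(
        labelColumnName: "Label",
        featureColumnName: "Features",
        rowGroupColumnName: "GroupId"));

ITransformer model = pipeline.Fit(trainingData);

// At prediction time, the data for the new "Boston to New York" group must contain the same
// feature columns (DepartureTime, ArrivalTime) plus GroupId; the Label column is only needed
// for training and evaluation.
IDataView rankedBostonFlights = model.Transform(bostonToNewYorkData);
```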

Let me know if you still have questions. ~Nicole

gangasahu commented 5 years ago

Thanks Nicole for taking the time to explain everything that goes on behind the scenes.

A few questions:

Whenever you have time, I would appreciate your input on the above questions.

I am still working on the model to prepare ML data for our use case, very similar to this. I will update you once it is ready.

Thank you.

nicolehaugen commented 5 years ago

Hi, @gangasahu -

Here are answers to your questions above:

1.) Your model's ranking ability is going to depend on the quality of the data that you train it with. Let's assume that you decide to train the model with the following feature columns: Source City, Destination City, Departure Time, Arrival Time. If your query results (i.e. the data you provide to the model to rank) are for a Source City and Destination City that do not exist in the training data, the model will still return rankings based on all of the feature column values that it was trained with. However, you may find that the NDCG (see bullet 3 below) value is low and that the model doesn't rank these results as desired. As a result, you may need to consider expanding your training dataset to include these additional cities.

2.) I'm unsure of the question that you're asking here - the Label column exists in your training data to specify the actual rank for a set of query results. The weight (also referred to as custom gains) lets you say how much each label value is worth. For example, suppose you label your data with the following values: terrible = 0, bad = 1, ok = 2, good = 3, and perfect = 4, and you decide that you want greater emphasis on perfect results being ranked higher in your search query. You could then specify custom gains of {0, 1, 2, 10, 100}, which makes a perfect result 10x more important than a good result and 50x more important than an ok result (see the trainer options sketch later in this thread for how this is set).

3.) It does not matter if the Scores are sometimes negative - you should sort the scores in descending order as you mentioned. To measure the ranking ability, you should rely on the NDCG value that is returned, based on the number of results that you are looking to evaluate. For example, NDCG@10 measures the likelihood that the top 10 results are ranked correctly - this score ranges from 0.0 to 1.0, and the closer to 1.0, the better the ranking ability of your model. To continue increasing this score, you would need to experiment with things such as your training data, the feature columns you select, and the trainer's hyper parameters.
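
For reference, a minimal sketch of how these NDCG values can be read with ML.NET's ranking evaluator. The `model` and `testData` variables are assumed to exist, and the column names are the defaults:

```csharp
// Score a held-out test set and compute the ranking metrics.
IDataView testPredictions = model.Transform(testData);
RankingMetrics metrics = mlContext.Ranking.Evaluate(
    testPredictions,
    labelColumnName: "Label",
    rowGroupColumnName: "GroupId");

// NormalizedDiscountedCumulativeGains[i] is NDCG@(i + 1); values closer to 1.0 are better.
for (int i = 0; i < metrics.NormalizedDiscountedCumulativeGains.Count; i++)
{
    Console.WriteLine($"NDCG@{i + 1}: {metrics.NormalizedDiscountedCumulativeGains[i]:F4}");
}
```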

Hope these answers help - your questions have been helpful to me in that I recognize there are areas in the sample where more detail should be provided. Let me know if you still have questions.

~Nicole

gangasahu commented 4 years ago

Hi Nicole,

Thanks for all your valuable input on my questions. My project is now at the stage of using real data, and the ML.Net ranking project is getting ready to be trained, validated, and tested with that data. While going through this process, the following questions / issues have come up. I hope you can shed some light on them:

1) The Group Id column value changes after consuming the model and re-ranking the results. I thought the group id is an id identifying the list of results that has to be ranked together. I do not understand why the Group Id is changing.

2) After ranking, the Label column is not in order from highest to lowest, even though the model learns from the features. Maybe the model is using the feature vector to rank, and the resulting order need not follow the labels.

3) For the same data, the scores come out differently on different runs. Of course, I am retraining, validating and testing with the same data. As a result, the ranking comes out in a different sequence on different runs.

4) After training the model with more than 70,000 records, the model zip file size is 11 KB, which is very small. How is that possible? When I was working in Azure ML Studio, the iLearner blob file was comparatively very big.

5) How do I try different hyper parameters in a ranking problem, unlike a prediction problem where the hyper parameters are decision tree parameters? Are the hyper parameters here more like using different sets of training and validation data and seeing which one gives better scores?

Here are the model outputs before ranking and after ranking. The last column in red is the column that gets ranked after passing through the model. See that the group id changes after passing through the model.

[Before Ranking image]

[After Ranking image]

CESARDELATORRE commented 4 years ago

Adding @ebarsoumMS and @justinormont from the team to help on the Ranking Model questions above, as well.

gangasahu commented 4 years ago

Hello Nicole, @ebarsoumMS or @justinormont,

Any update on the questions I have asked? I am in the middle of a project and eagerly waiting for your answers.

Thank you.

yaeldekel commented 4 years ago

Hi @gangasahu, I can try to answer your questions.

1, 2. The group ID column and the label column are only used at training time (and also for evaluation), but the ranking model does not use the group ID column or the label column for scoring. It only uses the features column to calculate a numeric score for each example it is given. It is the user's responsibility to keep track of which scored examples come from the same query and to sort the examples based on the scores returned by the model.

  3. I'm not sure I understand the question. If you train a new model using different data, then it is expected that the scores will not be the same (you might even get slightly different models if you train on the same data, because LightGBM uses some randomness in training).
  4. The size of the model does not depend on the number of examples it is trained on, but 11 KB does seem a bit small - how many trees did you train? And how many features did you train on?
  5. The place where you can do hyper parameter sweeping is when defining the LightGBM trainer here. The signature of this API is:
    public static LightGbmRankingTrainer LightGbm(this RankingCatalog.RankingTrainers catalog,
            string labelColumnName = DefaultColumnNames.Label,
            string featureColumnName = DefaultColumnNames.Features,
            string rowGroupColumnName = DefaultColumnNames.GroupId,
            string exampleWeightColumnName = null,
            int? numberOfLeaves = null,
            int? minimumExampleCountPerLeaf = null,
            double? learningRate = null,
            int numberOfIterations = Defaults.NumberOfIterations)

    You can also change some other advanced hyper parameters using this API:

    public static LightGbmRankingTrainer LightGbm(this RankingCatalog.RankingTrainers catalog,
            LightGbmRankingTrainer.Options options)
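
To make the hyper parameter side concrete, here is a hedged usage sketch of both overloads. The specific values are illustrative starting points only, and the `CustomGains` option (relevant to the earlier discussion of custom gains) is named to the best of my recollection, so please verify it against the `LightGbmRankingTrainer.Options` documentation:

```csharp
// Requires: using Microsoft.ML.Trainers.LightGbm;

// Overload 1: pass individual hyper parameters directly.
var trainer = mlContext.Ranking.Trainers.LightGbm(
    labelColumnName: "Label",
    featureColumnName: "Features",
    rowGroupColumnName: "GroupId",
    numberOfLeaves: 30,
    minimumExampleCountPerLeaf: 10,
    learningRate: 0.1,
    numberOfIterations: 200);

// Overload 2: use the Options object for advanced settings, e.g. custom gains per label value.
var trainerWithOptions = mlContext.Ranking.Trainers.LightGbm(
    new LightGbmRankingTrainer.Options
    {
        LabelColumnName = "Label",
        FeatureColumnName = "Features",
        RowGroupColumnName = "GroupId",
        NumberOfIterations = 200,
        CustomGains = new[] { 0, 1, 2, 10, 100 }  // assumed option name; check the docs
    });
```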

Hope this helps, let me know if you have more questions.

gangasahu commented 4 years ago

Hello @yaeldekel,

Thank you very much for your answers. They really clarify some of my questions. I was busy with other aspects of operationalizing the model, hence the late response. Here are some of my other questions:

Please advise.

Thank you very much.

justinormont commented 4 years ago

> Is there any AutoML available for ranking projects based on the LightGbm algorithm? Otherwise, I have to train the model in many iterations with different hyperparameter combinations.

> Like for regression and classification ML projects, we have Mean Absolute Error (MAE) and Accuracy to determine the best model across hyperparameter settings - the model with the lowest MAE and highest accuracy is the best model. I do not know how to do that for ranking projects. I think AutoML for ranking projects would have helped in this respect.

The ranking task is not currently available in AutoML. If you file an issue asking for it, it will bring it to people's attention. /cc @JakeRadMSFT

> Why is the model trained on the (train + validation + test) dataset? I understand that for ML, the model should learn using the training dataset, then the validation dataset is used to tune the model with proper hyperparameter settings, and then the test dataset is used for an unbiased evaluation of the model to determine its accuracy. But in the example, the final model (saved as a ZIP file) sees all the data, which should not be the case to my knowledge. Maybe I am missing something or not understanding. Please explain.

The earlier CV/TrainTest modes are for getting metrics, which estimate how well the model will do in production. The last step is training the model to deploy to production; this final model is trained on all available data. More info: https://github.com/dotnet/machinelearning-samples/pull/549#discussion_r301207345
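
To illustrate that pattern, here is a rough sketch under stated assumptions (the `allData` and `pipeline` variables are assumed to exist, and 0.2 is an arbitrary test fraction): a held-out split is used only to estimate metrics, and the model that actually ships is retrained on all of the data.

```csharp
// Split on GroupId so that all results for a given query end up on the same side of the split.
var split = mlContext.Data.TrainTestSplit(allData, testFraction: 0.2, samplingKeyColumnName: "GroupId");

// Step 1: estimate how well the pipeline will do in production.
ITransformer evaluationModel = pipeline.Fit(split.TrainSet);
RankingMetrics metrics = mlContext.Ranking.Evaluate(evaluationModel.Transform(split.TestSet));

// Step 2: once the metrics look acceptable, train the final model on all available data and save it.
ITransformer finalModel = pipeline.Fit(allData);
mlContext.Model.Save(finalModel, allData.Schema, "rankingModel.zip");
```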