Closed JulioAlbinatiCortez closed 4 years ago
The way to pass arrays to the ML.NET command line is to repeat the option multiple times, once per element, the same way you are passing in col to the TextLoader. So for the CustomGains, you would do it like this:
tr=LightGBMRanking{iter=500 gains=0 gains=82 gains=189 gains=435 gains=1000}
I have tested and confirmed that it is working, so I will close this for now. If you have any further issues please feel free to reopen the ticket. Thanks!
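For anyone generating these commands from a script, the repeated-option pattern described above can be sketched as a small Python helper. This is only an illustration: `repeat_option` and `lightgbm_ranking_trainer` are hypothetical names, not part of ML.NET, and the sketch assumes the only requirement is one `gains=` token per array element, as in the working command above.

```python
def repeat_option(name, values):
    """Render one `name=value` token per array element, separated by spaces."""
    return " ".join(f"{name}={v}" for v in values)


def lightgbm_ranking_trainer(iterations, gains):
    """Build the tr=LightGBMRanking{...} argument with repeated gains options.

    Hypothetical helper: it only formats a string and does not call ML.NET.
    """
    return "tr=LightGBMRanking{" + f"iter={iterations} " + repeat_option("gains", gains) + "}"


# Split the comma-separated string from the failing command into elements.
gains = [int(g) for g in "0,82,189,435,1000".split(",")]
trainer_arg = lightgbm_ranking_trainer(500, gains)
print(trainer_arg)
```

The printed string can then be used in place of the `tr=...` argument that used `customGains="0,82,189,435,1000"`.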
It worked! Thanks for the support :)
System information
Issue
What did you do? I am using the command line interface to train a LightGbmRanking model using a pre-defined set of custom gains. Full command is:
dotnet /mlnet/MML.dll TrainTest tr=LightGBMRanking{iter=500 customGains="0,82,189,435,1000"} loader=TextLoader{col=SessionGuid:TX:0 col=Features:R4:5-47 col=Label:R4:225} xf=HashTransform{col=GroupId:SessionGuid} data=inputs/train.tsv test=inputs/test.tsv out=outputs/model.zip dout=outputs/pred.tsv
maml.exe TrainTest test=inputs/test.tsv tr=LightGBMRanking{iter=500 customGains="0,82,189,435,1000"} dout=outputs/pred.tsv loader=TextLoader{col=SessionGuid:TX:0 col=Features:R4:5-47 col=Label:R4:225} data=inputs/train.tsv out=outputs/model.zip xf=HashTransform{col=GroupId:SessionGuid}
What happened? Command fails suggesting my custom gains are invalid. Full output:
'0,82,189,435,1000' is not a valid value for the 'customGains' command line option
Usage For 'LightGBMRanking':
customGains=
An array of gains associated to each relevance label. Default value:'0, 3,
7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095' (short form gains)
sigmoid=
Parameter for the sigmoid function. Default value:'0.5'
evaluationMetric=[None|Default|MeanAveragedPrecision|NormalizedDiscountedCumulativeGain]
Evaluation metrics. Default value:'NormalizedDiscountedCumulativeGain'
(short form em)
numberOfIterations=
Number of iterations. Default value:'100' (short form iter)
learningRate=
Shrinkage rate for trees, used to prevent over-fitting. Range: (0,1].
(short form lr)
numberOfLeaves=
Maximum leaves for trees. (short form nl)
minimumExampleCountPerLeaf=
Minimum number of instances needed in a child. (short form mil)
maximumBinCountPerFeature=
Maximum number of bucket bin for features. Default value:'255' (short form
mb)
booster={}
Which booster to use, can be gbtree, gblinear or dart. gbtree and dart use
tree based model while gblinear uses linear function. Default value:'gbdt'
verbose=[+|-]
Verbose Default value:'-' (short form v)
silent=[+|-]
Printing running messages. Default value:'+'
numberOfThreads=
Number of parallel threads used to run LightGBM. (short form nt)
earlyStoppingRound=
Rounds of early stopping, 0 will disable it. Default value:'0' (short form
es)
useCategoricalSplit=[+|-]
Enable categorical split or not. (short form cat)
handleMissingValue=[+|-]
Enable special handling of missing value or not. Default value:'+' (short
form hmv)
useZeroAsMissingValue=[+|-]
Enable usage of zero (0) as missing value. Default value:'-' (short form
uzam)
minimumExampleCountPerGroup=
Minimum number of instances per categorical group. Default value:'100'
(short form mdpg)
maximumCategoricalSplitPointCount=
Max number of categorical thresholds. Default value:'32' (short form
maxcat)
categoricalSmoothing=
Lapalace smooth term in categorical feature spilt. Avoid the bias of small
categories. Default value:'10'
l2CategoricalRegularization=
L2 Regularization for categorical split. Default value:'10'
seed=
Sets the random seed for LightGBM to use.
parallelTrainer={}
Parallel LightGBM Learning Algorithm Default value:'Single' (short form
parag)
@
Read response file for more options
Error log has been saved to '/tmp/TLC/Error_20200826_065457be780834-d247-476d-bb18-e336a332d1eb.log'. Please refer to https://aka.ms/MLNetIssue if you need assistance.
What did you expect? I expected the custom gains to be accepted as a list of integers, as the usage text suggests. Beyond that, it is not clear what input pattern is expected here.