dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

CLI --ignore-cols ignored on MacOS #2254

Open christopherfowers opened 2 years ago

christopherfowers commented 2 years ago

System Information (please complete the following information):

Describe the bug When training a model from a CSV file with the CLI in a terminal, specifying columns to ignore with --ignore-cols 1, 2, 3 still produces an output model and sample project that use those columns as inputs for classification predictions.

To Reproduce Steps to reproduce the behavior:

  1. Create a multi-column CSV for text classification and call it data.csv. Include columns you don't want used in the predictions at all. (Fill it with enough meaningful data to train a classification model.)
  2. Open a terminal and navigate to the folder containing the CSV.
  3. Run mlnet classification --dataset "data.csv" --has-header true --train-time 10 --label-col 8 --ignore-cols 1, 2, 3, 4, 5, 6, 7, 9 (substitute the label column and ignore columns appropriate for your data; both are 0-indexed).
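
For step 1, here is a minimal sketch of one way to generate a suitable data.csv. The column names and sample texts are made up for illustration and are not from the original report. Only the layout follows the command in step 3: the label sits at index 8, index 0 is the only feature left after ignoring, and every other column is one that --ignore-cols is supposed to drop.

```python
import csv
import random
import sys

# Hypothetical 10-column layout (0-indexed): text at 0, label at 8,
# everything else is a column the repro command asks the CLI to ignore.
HEADER = ["text", "id", "source", "author", "created",
          "updated", "region", "channel", "label", "notes"]

# Tiny synthetic vocabulary so the label is learnable from the text column.
SAMPLES = {
    "billing": ["invoice was charged twice", "refund has not arrived",
                "card was declined at checkout"],
    "shipping": ["package arrived damaged", "tracking number does not work",
                 "delivery is two weeks late"],
    "account": ["cannot reset my password", "email address needs updating",
                "locked out after too many attempts"],
}


def build_rows(n, seed=0):
    """Return n synthetic rows matching HEADER, reproducible via seed."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = rng.choice(sorted(SAMPLES))
        text = rng.choice(SAMPLES[label])
        rows.append([text, i, "web", f"user{rng.randint(1, 50)}",
                     "2022-01-01", "2022-01-02", rng.choice(["eu", "us"]),
                     "email", label, "n/a"])
    return rows


def write_csv(path, rows):
    """Write HEADER plus rows to path as a comma-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


# Usage: python make_data.py data.csv
if __name__ == "__main__" and len(sys.argv) == 2:
    write_csv(sys.argv[1], build_rows(200))
```

If the generated model input still lists fields such as id or region after training on this file, the ignored columns were not dropped.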

Expected behavior Generated model and sample project should not use columns listed in the --ignore-cols flag arguments.

Actual behavior Each of the ignored columns is still used.

christopherfowers commented 2 years ago

I would like to add that using column names instead of indices does not change the outcome. Removing the header line from the CSV and switching --has-header to false only changes the names applied to the generated model input. Those values are still expected in the input and still used by the model for predictions.
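
Until the flag behaves as expected, one possible stopgap (an assumption on my part, not something confirmed in this thread) is to remove the unwanted columns from the CSV itself before handing it to the CLI, so the trainer never sees them. A rough sketch using only the Python standard library follows; the function name drop_columns and the output file name are hypothetical.

```python
import csv
import sys


def drop_columns(src_path, dst_path, ignore_indices):
    """Copy src_path to dst_path, omitting the 0-indexed columns listed."""
    ignore = set(ignore_indices)
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # csv.reader handles quoted fields that contain commas.
        for row in csv.reader(src):
            writer.writerow([v for i, v in enumerate(row) if i not in ignore])


# Usage: python trim.py data.csv data_trimmed.csv
# The index list mirrors the --ignore-cols values from the repro steps.
if __name__ == "__main__" and len(sys.argv) == 3:
    drop_columns(sys.argv[1], sys.argv[2], [1, 2, 3, 4, 5, 6, 7, 9])
```

With the indices from the repro steps, only the original columns 0 and 8 remain, so the label moves to index 1 in the trimmed file and --label-col would need to change to match.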

luisquintanilla commented 2 years ago

Hi @christopherfowers thanks for this issue. Since it's related to ML.NET tools, I'm moving it to the dotnet/machinelearning-modelbuilder repo.

Aanchm commented 1 year ago

I am having the same issue as @christopherfowers while trying to train a regression model through the CLI on Windows. Please can you advise on the solution to this problem? Thanks :)