dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

'Ignore' not respected in schema #1640

Closed daholste closed 5 years ago

daholste commented 5 years ago

System information

Issue

Contents of automl_.graph.json:

{
  "Inputs": {
    "file_train": "D:\\SplitDatasets\\ExcitementFG2_train.csv",
    "file_test": "D:\\SplitDatasets\\ExcitementFG2_valid.csv"
  },
  "Nodes": [
    {
      "Inputs": {
        "CustomSchema": "sep=, col=Label:R4:0 col=Features:R4:1-13 col=Cat:TX:14 col=Cat01:TX:15 col=Ignore:TX:16,25 col=Cat02:TX:17 col=Cat03:TX:18 col=Cat04:TX:19 col=Cat05:TX:20 col=Cat06:TX:21 col=Cat07:TX:22 col=Cat08:TX:23 col=Cat09:TX:24 col=Cat10:TX:26 col=Cat11:TX:27 col=Cat12:TX:28 col=Cat13:TX:29 col=Cat14:TX:30 col=Cat15:TX:31 col=Cat16:TX:32 col=Cat17:TX:33 col=Cat18:TX:34 col=Cat19:TX:35 col=Cat20:TX:36 col=Cat21:TX:37 col=Cat22:TX:38 col=Cat23:TX:39",
        "InputFile": "$file_train"
      },
      "Name": "Data.CustomTextLoader",
      "Outputs": {
        "Data": "$data_train"
      }
    },
    {
      "Inputs": {
        "CustomSchema": "sep=, col=Label:R4:0 col=Features:R4:1-13 col=Cat:TX:14 col=Cat01:TX:15 col=Ignore:TX:16,25 col=Cat02:TX:17 col=Cat03:TX:18 col=Cat04:TX:19 col=Cat05:TX:20 col=Cat06:TX:21 col=Cat07:TX:22 col=Cat08:TX:23 col=Cat09:TX:24 col=Cat10:TX:26 col=Cat11:TX:27 col=Cat12:TX:28 col=Cat13:TX:29 col=Cat14:TX:30 col=Cat15:TX:31 col=Cat16:TX:32 col=Cat17:TX:33 col=Cat18:TX:34 col=Cat19:TX:35 col=Cat20:TX:36 col=Cat21:TX:37 col=Cat22:TX:38 col=Cat23:TX:39",
        "InputFile": "$file_test"
      },
      "Name": "Data.CustomTextLoader",
      "Outputs": {
        "Data": "$data_test"
      }
    },
    {
      "Inputs": {
        "BatchSize": 3,
        "StateArguments": {
          "Name": "AutoMlState",
          "Settings": {
            "Engine": {
              "Name": "Rocket",
              "Settings": {}
            },
            "Metric": "Accuracy",
            "TerminatorArgs": {
              "Name": "IterationLimited",
              "Settings": {
                "FinalHistoryLength": 100
              }
            },
            "TrainerKind": "SignatureBinaryClassifierTrainer"
          }
        },
        "TestingData": "$data_test",
        "TrainingData": "$data_train"
      },
      "Name": "Models.PipelineSweeper",
      "Outputs": {
        "Results": "$output_data",
        "State": "$xyz"
      }
    }
  ],
  "Outputs": {
    "output_data": "C:\\Benchmarking\\01-ResultsOut.csv"
  }
}

najeeb-kazmi commented 5 years ago

You can simply remove col=Ignore:TX:16,25 from your schema. That way those columns are never loaded in the first place.
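For illustration only, here is a minimal Python sketch of that edit applied to the schema string. The helper name drop_column is hypothetical and not part of ML.NET; it assumes the schema is a space-separated list of tokens where column entries look like col=Name:Type:Indices, as in the graph above, and it uses a shortened copy of the schema.

```python
def drop_column(schema: str, name: str) -> str:
    """Return the schema string without any col= entry whose name matches.

    Assumes space-separated tokens of the form col=Name:Type:Indices;
    non-column tokens such as sep=, are kept unchanged.
    """
    kept = []
    for token in schema.split():
        if token.startswith("col="):
            col_name = token[len("col="):].split(":", 1)[0]
            if col_name == name:
                continue  # skip the entry we want to remove
        kept.append(token)
    return " ".join(kept)


# Shortened copy of the schema from the graph above (assumption: same syntax).
schema = (
    "sep=, col=Label:R4:0 col=Features:R4:1-13 col=Cat:TX:14 "
    "col=Cat01:TX:15 col=Ignore:TX:16,25 col=Cat02:TX:17"
)
trimmed = drop_column(schema, "Ignore")
print(trimmed)
```

The resulting string keeps every other entry in its original order, so source column 16 is simply never mentioned.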

daholste commented 5 years ago

I did try that, but got an exception when trying to load column 17 (the column right after the first ignored one, i.e. without first loading or mentioning column 16).
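To make the situation concrete, the following Python sketch lists which source column indices a schema string never references. It is not ML.NET code and does not reproduce or explain the exception; the function names are hypothetical, and it assumes column entries of the form col=Name:Type:Indices where Indices is a comma-separated list of single indices or lo-hi ranges, as in the graph above.

```python
def source_indices(schema: str) -> set:
    """Collect every source column index referenced by col= entries."""
    indices = set()
    for token in schema.split():
        if not token.startswith("col="):
            continue  # e.g. the sep=, token
        _name, _type, ranges = token[len("col="):].split(":", 2)
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            indices.update(range(int(lo), int(hi or lo) + 1))
    return indices


def unreferenced_indices(schema: str) -> list:
    """Return indices below the highest referenced one that no entry mentions."""
    used = source_indices(schema)
    return sorted(set(range(max(used) + 1)) - used)


# The schema from the graph above with the Ignore entry removed,
# shortened to the first 27 source columns.
schema_without_ignore = (
    "sep=, col=Label:R4:0 col=Features:R4:1-13 col=Cat:TX:14 col=Cat01:TX:15 "
    "col=Cat02:TX:17 col=Cat03:TX:18 col=Cat04:TX:19 col=Cat05:TX:20 "
    "col=Cat06:TX:21 col=Cat07:TX:22 col=Cat08:TX:23 col=Cat09:TX:24 "
    "col=Cat10:TX:26"
)
gaps = unreferenced_indices(schema_without_ignore)
print(gaps)  # [16, 25]
```

With the Ignore entry dropped, indices 16 and 25 are the only gaps, and column 17 is the first entry that follows a gap.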

rogancarr commented 5 years ago

Closing: Microsoft.ML.PipelineInference has been removed from the repository.