dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Error after removing one or more training images and attempting to re-train #2046

Closed mlaboss closed 1 year ago

mlaboss commented 2 years ago

Model Builder Version: 16.13.1.2210302
Visual Studio Version: 2022 (17.1.0)

Bug description

After removing some training images from an already-trained image classification model, attempting to re-train the model results in a "filePath cannot be null or empty" error.

Steps to Reproduce

  1. Create an image classification model, feed it some training data and Train it.
  2. Delete one or more images from the training data
  3. Go back to the Train section of the model builder and hit "Train again"

Expected Experience

Re-training the model works as normal.

Actual Experience

A Model Builder Error dialog pops up. (Screenshot 2022-02-24 100440)

Log file attached: LiquidIn24WellPlateModel-OKTSP1.txt

Additional Context

It seems like the model builder is looking for the removed file and having a problem when it can't find it.

I am able to work around the problem by creating a brand new folder with a different name than the old folder, putting the training images in there, and pointing the model builder at that folder instead.
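The workaround above can be scripted. The following is only a minimal sketch in Python, not part of Model Builder: the function name `copy_training_images` and the list of image extensions are assumptions made for illustration. It copies the remaining images into a brand new folder, keeping the per-category subfolders, so that the new folder can then be selected in Model Builder.

```python
import shutil
from pathlib import Path

# Extensions treated as training images; an assumption, adjust as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def copy_training_images(old_root, new_root):
    """Copy images from old_root into a new folder new_root, keeping subfolders.

    Returns the number of files copied. new_root must not already exist,
    so the result is a folder Model Builder has not been pointed at before.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    if new_root.exists():
        raise FileExistsError(f"{new_root} already exists; pick a new name")
    copied = 0
    for src in sorted(old_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in IMAGE_EXTENSIONS:
            dst = new_root / src.relative_to(old_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied += 1
    return copied
```

The new folder should sit outside the old one, and it still has to be selected by hand on the Data step afterwards.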

joesatriani10 commented 2 years ago

I've been getting exactly the same error since last week, but at this point I can't train at all, even if I create a new project. Creating a new project used to be my solution when removing data, if you want to try it.

You can also check the .mbconfig file in your project folder (default: MLModel1.mbconfig). In the projects that worked before, it looked like this:

{
  "TrainingTime": 2147482,
  "Scenario": "ImageClassification",
  "DataSource": {
    "Type": "Folder",
    "Version": 1,
    "FolderPath": "C:\\Users\\areal\\source\\repos\\CLAIM"
  },
  "Environment": {
    "Type": "LocalGPU",
    "Version": 1
  },
  "RunHistory": {
    "Version": 1,
    "Type": "Result",
    "Trials": [
      {
        "Version": 0,
        "Type": "Trial",
        "TrainerName": "DNN + ResNet50",
        "Score": 0.99864406779661019,
        "RuntimeInSeconds": 448.02398681640625
      }
    ],
    "Pipeline": {
      "parameter": {
        "0": {
          "OutputColumnName": "Label",
          "InputColumnName": "Label"
        },
        "1": {
          "LabelColumnName": "Label",
          "ScoreColumnName": "Score",
          "FeatureColumnName": "ImageSource"
        },
        "2": {
          "OutputColumnName": "PredictedLabel",
          "InputColumnName": "PredictedLabel"
        }
      },
      "estimators": [
        "MapValueToKey",
        "ImageClassificationMulti",
        "MapKeyToValue"
      ]
    },
    "MetricName": "MicroAccuracy"
  },
  "Type": "TrainingConfig",
  "Version": 2
}

But now I'm always getting the following, no matter what I do (even on new projects):

{
  "Type": "TrainingConfig",
  "Version": 2
}

P.S.: I'm using 2 categories, each with 40K .png images.
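The difference between the two .mbconfig files above can be checked programmatically. This is a rough sketch in Python, not a Model Builder feature: the function name `missing_mbconfig_keys` is hypothetical, and the list of expected keys is taken only from the working config quoted above, not from any documented schema.

```python
import json

# Top-level keys present in the working config quoted above (not an official schema).
EXPECTED_KEYS = ("Scenario", "DataSource", "Environment", "RunHistory")


def missing_mbconfig_keys(text):
    """Return the expected top-level keys that are absent from an .mbconfig JSON string."""
    config = json.loads(text)
    return [key for key in EXPECTED_KEYS if key not in config]
```

For the stripped-down config, which contains only "Type" and "Version", this reports all four keys as missing; for the working config it returns an empty list.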

beccamc commented 2 years ago

@JakeRadMSFT Can you take a look at this?

JakeRadMSFT commented 2 years ago

@mlaboss - if you just re-select the folder, or select a different folder and then re-select the original folder, does that resolve the issue?

Another potential workaround:

Does closing Model Builder and re-opening Model Builder resolve this issue?

I'm going to try these workarounds and test out a potential fix.

JakeRadMSFT commented 2 years ago

We should try to find a better approach for handling folder/file changes.

JakeRadMSFT commented 2 years ago

Related to: #2081

mlaboss commented 2 years ago

> @mlaboss - if you just re-select the folder, or select a different folder and then re-select the original folder, does that resolve the issue?
>
> Another potential workaround:
>
> Does closing Model Builder and re-opening Model Builder resolve this issue?
>
> I'm going to try these workarounds and test out a potential fix.

Selecting a different folder, then re-selecting the original folder: Does not resolve the issue, I still get the error. I did get a "Changing the folder path will reset training progress, and you will have to re-start training. Would you like to continue?" prompt, and I clicked "Yes".

Closing Model Builder and re-opening: Does not resolve the issue either.

gsuberland commented 2 years ago

Also hitting this issue. I've been battling it for 5 or 6 hours now and I can't find a way to work around it. I've cleared temp files (both user and system), reinstalled VS and the extensions, switched between VS2019 and VS2022, rebooted, deleted the models and started over, moved the directory, tried rolling back extension versions, etc. and no matter what I do it appears to be stuck with this exact same exception every time I try to train a model. This is a sev0 blocker IMO.

beccamc commented 2 years ago

@gsuberland Did you do the exact same thing as the other user? Removed some files after training? Or are you just getting a similar message?

gsuberland commented 2 years ago

@beccamc The problem arose after I moved the training data to a different directory. The displayed error and stack trace are identical to what was reported with this issue, but I now suspect that it's a different underlying bug. I've opened up a new issue at #2102 in relation.

The error message is definitely a UX bug, since it doesn't actually tell you what the problem was beyond "something went wrong in the training process", and that isn't immediately clear from the exception message or the stack trace.

truebigsand commented 2 years ago

Same as @gsuberland. The problem arises every time I edit the training data. It can be fixed by renaming the training data folder and reselecting it on the "Data" page.
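The rename step can be done from a script as well. This is just a sketch in Python, and `rename_training_folder` is a hypothetical helper name, not anything provided by Model Builder. The renamed folder still has to be reselected on the "Data" page by hand.

```python
from pathlib import Path


def rename_training_folder(folder, new_name):
    """Rename folder to new_name within the same parent and return the new path."""
    folder = Path(folder)
    target = folder.with_name(new_name)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    return folder.rename(target)  # returns the new Path on Python 3.8+
```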

luisquintanilla commented 2 years ago

@JakeRadMSFT to take a look and get this into the May release.

beccamc commented 2 years ago

Action plan:

beccamc commented 1 year ago

I have confirmed that Jake's fix of the reload button works. This will ship in the Fall Release.


beccamc commented 1 year ago

This fix has shipped! Please install the latest version of Model Builder 16.14.0 to see the change.

When you update image data you can now hit the "refresh" button to manually regenerate the appropriate information for training. Let us know how it goes!