dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Support custom data sources #2174

Open vpenades opened 2 years ago

vpenades commented 2 years ago

I want to use image classification, but my images live in heterogeneous locations and some of them need custom pre-processing, so the current image data sources are very limiting.

There are also other scenarios I'm facing in which the images fed to ML need some sort of preprocessing. It's a big waste to have to write them to temporary files just to satisfy the current folder-based configuration:

"DataSource": {
    "Type": "Folder",
    "Version": 1,
    "FolderPath": "c:\\Images"
},

But I would like to do something like this:

"DataSource": {
    "Type": "Class",
    "Version": 1,
    "ClassName": "CustomImageImporter"
},

class CustomImageImporter : IImageClassifierImporter
{
    public IEnumerable<(string Label, Byte[] Image)> EnumerateLabeledInputImages()
    {
        yield return ("cat", LoadBytes("cat.png"));
        yield return ("dog", LoadBytes("dog.png"));
    }
}

The only alternative I have right now is to skip AutoML and write the whole thing in ML.

beccamc commented 2 years ago

@LittleLittleCloud Can you provide the code sample for streaming images? A notebook or code first approach here is probably the way to go.

vpenades commented 2 years ago

@beccamc Ah, sorry, I forgot about this issue... I was able to get custom image loading working with help from a similar issue, but I agree it's anything but trivial, so complete examples would be desirable.

I am using a custom loader because I have a wrapped structure, so I can do things like loading the same image and flipping it horizontally to double the number of input samples, or running a detector to crop sections of an image (for example, for face analysis).
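
For readers looking for a starting point, here is a minimal sketch of the kind of loader described above. It is not the code actually used in this thread. It assumes the images are already available as in-memory byte arrays, and the names `LabeledImage`, `ImageAugmenter`, `WithAugmentation`, and the `transform` delegate are hypothetical. The actual flip or crop is left to whatever imaging library is in use. The sketch only shows how each source image can lazily yield an extra augmented sample without touching temporary files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type: one in-memory image plus its label.
public sealed class LabeledImage
{
    public string Label { get; set; } = "";
    public byte[] Image { get; set; } = Array.Empty<byte>();
}

public static class ImageAugmenter
{
    // Yields every source image followed by one transformed copy.
    // Evaluation is lazy, so images are produced only as the consumer
    // iterates, and nothing is written to disk.
    // `transform` is an assumption: plug in a horizontal flip, a crop
    // around a detected region, or any other byte[] -> byte[] step.
    public static IEnumerable<LabeledImage> WithAugmentation(
        IEnumerable<(string Label, byte[] Image)> source,
        Func<byte[], byte[]> transform)
    {
        foreach (var (label, image) in source)
        {
            yield return new LabeledImage { Label = label, Image = image };
            yield return new LabeledImage { Label = label, Image = transform(image) };
        }
    }
}
```

In a code-first ML.NET pipeline, a sequence like this can be turned into an `IDataView` with `MLContext.Data.LoadFromEnumerable`, which is one way to feed in-memory images to training instead of a folder path.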

I think this issue could be closed, though.