dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Support lazy loading MLImage when training. #6474

Open vpenades opened 1 year ago

vpenades commented 1 year ago

Is your feature request related to a problem? Please describe.

When training with images, a large collection of images has to be fed into a dataset column. This is typically done by setting the file path of each image, which is then loaded lazily when needed.

But this assumes the image is stored on the hard drive and not somewhere else, that it requires no custom transformation, and that it exists at all rather than being procedurally generated.

For such scenarios, the suggested solution is to use custom transformers. I've tried them and found them needlessly complex and hard to understand, most probably due to the lack of documentation and proper examples dealing with images.

Describe the solution you'd like

I think a simpler solution would be to support Lazy<MLImage> (or some kind of factory interface, or a Func) as a dataset column. That would greatly simplify development and would not require the use of custom transformers.

Describe alternatives you've considered

Using custom transformers, which is ugly.

Additional context

Another point of concern is that MLImage is a disposable object. I have no idea at which point ML disposes of the images already used for training (if it ever does), or who is responsible for disposing of them: ML or the developer. Having column data types that must be disposed certainly makes things a lot more complicated than they should be; I believe MLImage should have been made non-disposable in the first place.

So maybe by supporting this kind of late image creation/loading and disposing, the memory management of images would be easier to understand.

Furthermore, allowing MLImage to be loaded lazily also lets the developer preprocess the image in any way they see fit, using any image processing library available rather than only the image transformations provided by ML. In this case, it could be useful to have an interface or a function that receives the expected input image size and format.

michaelgsharp commented 1 year ago

@luisquintanilla I think something like this would be good to look into. It has come up before, and while there are workarounds, as @vpenades said, either with file paths or with custom transformers, the custom transformers can be kinda ugly. I think the hard part would be deciding, if we did use a factory/Func, whether it would be different for each row or whether a single factory/Func would need to work for all rows. Anyway, something to consider.

@vpenades Have you tried using a URI for the image filepath? I thought we supported that (though I haven't checked in a while so I could be thinking of something else).

michaelgsharp commented 1 year ago

@tarekgh are you familiar with more of the internal image lifetime and when it gets disposed?

tarekgh commented 1 year ago

are you familiar with more of the internal image lifetime and when it gets disposed?

The internal image lifetime runs from the creation of the image object until the object is disposed with MLImage.Dispose() or collected by the GC.

vpenades commented 1 year ago

@michaelgsharp Among other things, something I want to do is this: train a model with images rendered at runtime from a 3D model.

If I had a Lazy<MLImage> column, for each row I would do something like this:

row.ImageSource = new Lazy<MLImage>( () => RenderImage(512, 512, camera) );

or as a function:

row.ImageSource = (width, height) => RenderImage(width,height, camera);

Also, I want to load images directly from Zip archives, so the images are pulled from the zip on demand.
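The zip case can be sketched with the standard library alone. This is a minimal sketch, not an ML.NET API: `ZipImageSource`, `CreateLoaders` and `ReadEntry` are hypothetical names, and it assumes the archive stays on disk and unchanged while loaders are in use. Each entry gets a `Func<byte[]>` that decompresses the encoded image only when it is invoked.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of ML.NET: one deferred loader per zip entry.
public static class ZipImageSource
{
    // Returns (entry name, loader) pairs. Nothing is decompressed until a loader is invoked.
    public static List<(string Name, Func<byte[]> Load)> CreateLoaders(string zipPath)
    {
        var loaders = new List<(string Name, Func<byte[]> Load)>();
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Capture the name rather than the entry: the archive is closed when this method returns.
                string name = entry.FullName;
                if (name.EndsWith("/")) continue; // skip directory entries
                loaders.Add((name, () => ReadEntry(zipPath, name)));
            }
        }
        return loaders;
    }

    // Reopens the archive for every read, so callers never share a ZipArchive instance.
    private static byte[] ReadEntry(string zipPath, string entryName)
    {
        using ZipArchive archive = ZipFile.OpenRead(zipPath);
        ZipArchiveEntry entry = archive.GetEntry(entryName)
            ?? throw new FileNotFoundException($"Entry '{entryName}' not found in '{zipPath}'.");
        using Stream source = entry.Open();
        using var buffer = new MemoryStream();
        source.CopyTo(buffer);
        return buffer.ToArray();
    }
}
```

The returned bytes are still encoded (PNG, JPEG and so on); decoding them, for example by wrapping them in a MemoryStream for MLImage.CreateFromStream, is left to the caller. Reopening the archive per read trades some speed for simplicity.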

@tarekgh, yes, MLImage is disposed using .Dispose(), but when is it called within ML during training?

If a developer fills a column with images, I would say the developer is the one responsible for disposing of them. But if the column is filled with file paths (or lambdas), then it's ML itself that loads the images into memory, and I would expect ML to dispose of them.

Images should not be left for the GC to clean up, because SkiaSharp can potentially use unmanaged memory to store the image, and leaving it undisposed would mean a memory leak.

That's why I think that MLImage, instead of being a thin layer over SkiaSharp, should copy the pixels to a managed array and get rid of the SkiaSharp object at the first opportunity. So the MLImage would not need to be disposed, and the GC would handle the managed array.

tarekgh commented 1 year ago

That's why I think that MLImage, instead of being a thin layer over SkiaSharp, should copy the pixels to a managed array and get rid of the SkiaSharp object at the first opportunity. So the MLImage would not need to be disposed, and the GC would handle the managed array.

This would incur a performance hit from the extra allocation and copying. It would also complicate image handling dramatically unless we pin the array buffer, which would put a lot of pressure on the GC too. I don't think we should do that.

We should find the places where ML is no longer going to use the image and dispose of it there.
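The ownership rule discussed here (whoever materializes the image disposes of it) can be sketched as follows. This is an illustration under stated assumptions, not how ML.NET behaves internally: `FakeImage` is a stand-in for a disposable image such as MLImage that only counts live instances, and `OwnershipSketch.ConsumeAll` is a hypothetical consumer loop.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a disposable image such as MLImage; it only counts live instances.
public sealed class FakeImage : IDisposable
{
    public static int Live;      // instances created but not yet disposed
    public static int PeakLive;  // highest value Live has reached
    private bool _disposed;

    public FakeImage()
    {
        Live++;
        PeakLive = Math.Max(PeakLive, Live);
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        Live--;
    }
}

public static class OwnershipSketch
{
    // The consumer invoked the factory, so the consumer disposes of the result.
    public static int ConsumeAll(IEnumerable<Func<FakeImage>> factories, Action<FakeImage> use)
    {
        int processed = 0;
        foreach (Func<FakeImage> create in factories)
        {
            using FakeImage image = create(); // disposed at the end of each iteration
            use(image);
            processed++;
        }
        return processed;
    }
}
```

With this shape, at most one image is alive at a time regardless of how many rows there are, and nothing is left for the GC to release.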

luisquintanilla commented 1 year ago

But this assumes the image is stored on the hard drive and not somewhere else, that it requires no custom transformation, and that it exists at all rather than being procedurally generated.

My understanding is you can also load images from Streams.

Here is a sample which does that. While this sample loads from a directory stored on disk you could imagine the byte[] coming from anywhere else.

I think that might help address at least some of the issues mentioned on this thread. That being said, are there scenarios I've missed where Lazy<MLImage> would be a good alternative / solution to address those problems?

vpenades commented 1 year ago

@luisquintanilla The problem is not loading a single image, but creating a training dataset with 200,000 rows in which one of the columns holds images. Certainly you can't load 200,000 images into memory at the same time.

So far, ML has resolved this by allowing file paths instead of images to be fed to the dataset column, so the images are loaded on demand, but this is very limiting because it assumes the image source is a file system.

That being said, are there scenarios I've missed where Lazy<MLImage> would be a good alternative / solution to address those problems?

Yes, here are some examples:

And yes, it could be possible to prepare all the input images into a temporary directory, and then simply pass the paths to the dataset column, but it's a big waste of disk space and time. In particular, with procedurally generated images, it could be possible to generate millions of images.

luisquintanilla commented 1 year ago

@vpenades thanks for that information.

Would you be able to provide a sample code snippet of how you're currently solving this problem?

vpenades commented 1 year ago

Imagine I want to train an OCR recognizer with a large number of characters. I would need a large number of small character images, so:


// Pseudocode sketch: FontFamily, Color, Flags, Windows.SystemFonts, AllColors
// and DrawCharacterToMLImage are placeholders, not real APIs.
class FontSample
{
  public readonly char character;
  readonly FontFamily font;
  readonly Color foreground;
  readonly Color background;
  readonly Flags flags;

  public FontSample(char character, FontFamily font, Color background, Color foreground, Flags flags)
  {
    this.character = character;
    this.font = font;
    this.background = background;
    this.foreground = foreground;
    this.flags = flags;
  }

  // render this sample to a bitmap image
  public MLImage RenderImage(int width, int height)
  {
    var glyph = font.GetGlyph(character);
    return DrawCharacterToMLImage(width, height, glyph, foreground, background, flags);
  }

  // create millions of FontSamples
  public static IEnumerable<FontSample> GetAllSamples()
  {
    foreach (var fontFamily in Windows.SystemFonts)
    {
      foreach (var character in fontFamily)
      {
        foreach (var backColor in AllColors)
        {
          foreach (var foreColor in AllColors)
          {
            if (backColor == foreColor) continue;

            yield return new FontSample(character, fontFamily, backColor, foreColor, Flags.Normal);
            yield return new FontSample(character, fontFamily, backColor, foreColor, Flags.Italic);
            yield return new FontSample(character, fontFamily, backColor, foreColor, Flags.Bold);
            yield return new FontSample(character, fontFamily, backColor, foreColor, Flags.Italic | Flags.Bold);

            // I could add even more permutations here: different font sizes, slight rotations, some noise or dirt, etc.
          }
        }
      }
    }
  }
}

Given this snippet, the number of fonts included in Windows combined with the thousands of color combinations per character would probably lead to millions of images.

Right now, the way to handle such a scenario is:

// a "data set" pairing each character label with the path of its image on disk
// (a list rather than a dictionary, since many samples share the same character)
var DataSet = new List<(char Label, string ImagePath)>();

// render all the characters to disk (millions of them)
foreach (var sample in FontSample.GetAllSamples())
{
  using var image = sample.RenderImage(64, 64); // render the image beforehand

  var path = GetNewImagePath(); // save it to disk
  image.Save(path);

  DataSet.Add((sample.character, path));
}

// at this point I have a directory with millions of image files, from where images will be loaded on demand

TrainModel( DataSet );

Now, if Lazy<MLImage> or any variation like Func<int width, int height, MLImage> were supported, I would be able to do this:

// a "data set" where the label is the character and the column is a Lazy image that will be *CREATED* on demand
// (a list rather than a dictionary, since many samples share the same character)
var DataSet = new List<(char Label, Lazy<MLImage> Image)>();

// Prepare the DataSet
foreach (var sample in FontSample.GetAllSamples())
{
  DataSet.Add((sample.character, new Lazy<MLImage>( () => sample.RenderImage(64, 64) )));
}

TrainModel( DataSet );  // in here, instead of loading images from disk, they're rendered on demand as the Lazy<> column is evaluated.

So the difference is a huge saving in resources and a performance boost, because we avoid the hard drive bottleneck. There is also no image loading or saving anywhere: the images are always created, consumed and disposed in memory.
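One detail matters when choosing between the two shapes proposed in this thread: System.Lazy<T> caches its value after the first access, while a Func<T> runs again on every call. The sketch below uses an `int[]` as a stand-in for image pixels, and `LazyVersusFunc.CountRenders` is a hypothetical name; it only counts how often the render delegate runs when the same row is read once per epoch.

```csharp
using System;

public static class LazyVersusFunc
{
    // Returns how many times each render delegate ran after reading the row `epochs` times.
    public static (int LazyRenders, int FuncRenders) CountRenders(int epochs)
    {
        int lazyRenders = 0, funcRenders = 0;

        var lazyRow = new Lazy<int[]>(() => { lazyRenders++; return new int[64 * 64]; });
        Func<int[]> funcRow = () => { funcRenders++; return new int[64 * 64]; };

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            _ = lazyRow.Value; // rendered on the first read, then served from the cache
            _ = funcRow();     // rendered again on every read
        }
        return (lazyRenders, funcRenders);
    }
}
```

Calling `LazyVersusFunc.CountRenders(3)` returns `(1, 3)`. For a dataset with millions of rows, that means a Lazy<MLImage> column would keep every evaluated image alive until something disposes of it, whereas a Func-style column recreates the image on each read and lets the consumer dispose of it immediately.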

Now, I don't care what the mechanism for generating the images on demand is; it can be any of these:

luisquintanilla commented 1 year ago

@vpenades thanks for your suggestion. I've added this to our backlog.

We haven't seen many requests for this, so in the meantime we'll continue to look for similar requests and listen to feedback.

If this is a significant blocker for you, we'd be happy to take contributions and work with you to enable it in ML.NET.