accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1

Issue using visual bag of words with large images #769

Closed: JakeTrans closed this issue 6 years ago

JakeTrans commented 7 years ago

I have been using the Visual Bag of Words to identify different types of standard scanned documents (the goal being to sort the four different types, with the possibility of using this classification to look for specific data within these documents).

I have found an issue when doing the classification on large pictures (about 4032 x 3024): the numbers involved overflow in the GetRectangle function. I have looked at the source code and changed the integers to 64-bit integers, but this increased the memory usage by an extreme amount. Downscaling the image to smaller dimensions also fixes the problem.

I will continue to test this, but would there be a more efficient/accurate way of doing it? I'm conscious that I'm new to Machine Learning and may be using the wrong tool for the job. To replicate the issue, take the Visual Bag of Words example, expand one of the pictures to 4032 x 3024, and attempt to compute the bag of words; this will cause the error in question.

Thanks,

cesarsouza commented 7 years ago

Hi @JakeTrans,

Thanks a lot for opening this issue. I would say that changing the GetRectangle method to use Int64 instead of Int32 would actually be the correct route here. The problem we might have to address then is how to deal with the increased memory requirements of your large images.

When using Bag-of-(Visual)-Words, it is not actually necessary to use all feature points to learn the clustering algorithm, or even to create the final representation of the image. I would say that the code needs to be updated to consider only a subsample of the features instead of all of them, thus reducing the amount of data you would need to keep in memory.

As such, if you have already gone through the work of updating the code, you could try altering the line I am linking here, which I am also reproducing below:

     descriptors[i] = detector.ProcessImage(x[i]).Select(p => p.Descriptor).ToArray();

And update it to return not all descriptors but only a subset of them, say, randomly sampling at most 1000 descriptors per image (you can also make this quantity configurable by adding it as a property of the BaseBagOfVisualWords class). You can use the Sample method to help with the sampling.
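For illustration, here is a rough sketch of that change. The SampleDescriptors helper and the fixed cap of 1000 are hypothetical stand-ins for the Sample method mentioned above (assumes System and System.Linq):

     // Hypothetical helper: returns at most 'max' descriptors, chosen at
     // random without replacement using a partial Fisher-Yates shuffle.
     static T[] SampleDescriptors<T>(T[] source, int max, Random rng)
     {
         if (source.Length <= max)
             return source;

         T[] copy = (T[])source.Clone();
         for (int i = 0; i < max; i++)
         {
             // Swap a random remaining element into position i
             int j = i + rng.Next(copy.Length - i);
             T tmp = copy[i]; copy[i] = copy[j]; copy[j] = tmp;
         }
         return copy.Take(max).ToArray();
     }

     // The modified line inside the parallel loop, capping each image
     // at 1000 descriptors (a per-image seed keeps the loop thread-safe):
     descriptors[i] = SampleDescriptors(
         detector.ProcessImage(x[i]).Select(p => p.Descriptor).ToArray(),
         1000, new Random(i));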

This will cause the vocabularies to be created using only a subset of the descriptors and hopefully decrease the memory requirements of the problem you are trying to tackle.

Please let me know if it helps!

JakeTrans commented 7 years ago

Hi @cesarsouza

Thank you very much for your response; I will look into this and let you know the results.

JakeTrans commented 7 years ago

I have converted GetRectangle over to Int64, along with a number of related methods, and on a computer with more memory (the machine I had been working with only had 4 GB) I have been able to successfully train the Bag of Words without needing to sample the descriptors, and I have also made a small improvement to the memory handling. I will look into sampling the descriptors separately, as this could be useful in my project anyway.

The small improvement I made was to add the overload below to the Learn function in BaseBagOfVisualWords:

      public TModel Learn(string[] x, double[] weights = null)
      {
          // One descriptor array per input image
          var descriptors = new TFeature[x.Length][];

          // For all images: load each one from disk on demand,
          // extract its feature points, then release it
          For(0, x.Length, (i, detector) =>
          {
              using (var imageToLearn = (Bitmap)Image.FromFile(x[i]))
              {
                  // Compute the feature points
                  descriptors[i] = detector.ProcessImage(imageToLearn)
                      .Select(p => p.Descriptor).ToArray();
              }
          });

          return Learn(descriptors.Concatenate());
      }

As this overload accepts an array of file paths rather than the images themselves, and loads each image on demand, it cuts down the memory requirements: in my case the Bitmap list used around 5-6 GB of RAM at peak, while this version took around 4-5 GB at peak.
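For context, a minimal usage sketch of the overload (the directory path is hypothetical, and the overload is assumed to be reachable through BagOfVisualWords via BaseBagOfVisualWords; assumes System.IO):

     // Hypothetical usage: learn the vocabulary from file paths so only
     // the images currently being processed are held in memory.
     string[] files = Directory.GetFiles(@"C:\scans", "*.png");
     var bow = new BagOfVisualWords(10);  // 10 visual words
     bow.Learn(files);  // uses the string[] overload above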

I will look further into the sampling, but please let me know if this would be of any use to you.

Thank you for your help

JakeTrans

cesarsouza commented 7 years ago

Hi @JakeTrans,

Thanks a lot; it is also nice to know this strategy worked well for you. I guess that since this method is doing lots of IO, it would also be a good candidate to be written using async, as mentioned in #635. I might have to think a little about what would be the best interface/API to offer here, but I think this could indeed be a nice addition to the framework. Thanks a lot!
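For illustration, a rough sketch of what such an async wrapper might look like (LearnAsync is hypothetical and simply offloads the existing synchronous overload to the thread pool; truly asynchronous IO would stream the file bytes instead; assumes System.Threading.Tasks):

     // Hypothetical async wrapper around the string[] overload above,
     // letting callers await the IO-heavy training without blocking.
     public Task<TModel> LearnAsync(string[] x, double[] weights = null)
     {
         return Task.Run(() => Learn(x, weights));
     }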

(By the way - if anyone reading this would like to work on this issue and implement it while using async also please let me know and feel free to submit a PR!)

Regards, Cesar

JakeTrans commented 7 years ago

Hi @cesarsouza

Thank you for the response. Please forgive me, as this is the first project I've raised an issue with or modified, but I would like to be sure: would you like me to submit a pull request with my changes, at least as a stopgap until #635 has been worked on?

Thank you,

Jake

cesarsouza commented 7 years ago

Well, I was thinking about waiting a little bit until I could decide on the best way to expose this feature (there might be other places that could also benefit from having images represented by filenames rather than Bitmaps). However, if you are willing to submit a PR with your changes, please go ahead and I can figure it out later 😄

Thanks a lot!

Regards, Cesar

cesarsouza commented 6 years ago

Support for computing bag of visual words from file names, as well as options for sampling feature descriptors and images from the training set, has been added in 3.8.0.
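For readers landing on this issue, a minimal sketch of that 3.8.0 usage (the directory path is hypothetical, and the exact names of the sampling options are not shown here; see the 3.8.0 release notes for details; assumes System.Drawing and System.IO):

     // Learn a 10-word visual vocabulary directly from file names (3.8.0+),
     // so images are loaded on demand rather than all kept in memory.
     string[] files = Directory.GetFiles(@"C:\scans", "*.png");
     var bow = new BagOfVisualWords(10);
     bow.Learn(files);

     // Transform an image into its bag-of-words feature vector:
     using (var image = (Bitmap)Image.FromFile(files[0]))
     {
         double[] features = bow.Transform(image);
     }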