Prashant-Jonny / accord

Automatically exported from code.google.com/p/accord

IndexOutOfRangeException in NaiveBayes Compute function #16

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Issue linked from StackOverflow post:
http://stackoverflow.com/questions/13052167/using-accord-nets-codification-object-to-codify-second-data-set/

What is the expected output? What do you see instead?
Expected Result is "lorem"

Please provide any additional information below.

Attached is a source code file which reproduces the issue. 

It is entirely possible I am doing something wrong.

Original issue reported on code.google.com by pburr...@gmail.com on 25 Oct 2012 at 2:02

GoogleCodeExporter commented 8 years ago
Hmm, from the code you sent, I suspect what you are looking for is a 
bag-of-words (BoW) representation. Is this the case? The Codification filter is 
better suited to processing data tables, whereas you seem to want to perform 
text classification. 

The current version of the framework has no class for extracting a BoW. 
However, such a class was already in the works for the next release, so I 
happen to have a partial but working implementation here. I am attaching it, 
together with an example. Please see if it works for you.
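
To make the distinction concrete, here is a minimal sketch of the two kinds of encoding. It is only an illustration under stated assumptions: `EncodingSketch`, `EncodeColumn` and `CountWords` are hypothetical names, not part of Accord.NET, and the code uses nothing beyond the .NET base class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for illustration only; these are not Accord.NET classes.
public static class EncodingSketch
{
    // Codification-style encoding: every distinct value in one categorical
    // column is replaced by a single integer symbol (first value seen = 0).
    public static int[] EncodeColumn(string[] column)
    {
        var symbols = new Dictionary<string, int>();
        var codes = new int[column.Length];

        for (int i = 0; i < column.Length; i++)
        {
            int code;
            if (!symbols.TryGetValue(column[i], out code))
            {
                code = symbols.Count;
                symbols.Add(column[i], code);
            }
            codes[i] = code;
        }

        return codes;
    }

    // Bag-of-words encoding: one tokenized text becomes a vector with one
    // count per vocabulary word; words outside the vocabulary are ignored.
    public static int[] CountWords(string[] vocabulary, string[] words)
    {
        var counts = new int[vocabulary.Length];

        foreach (string word in words)
        {
            int index = Array.IndexOf(vocabulary, word);
            if (index >= 0)
                counts[index]++;
        }

        return counts;
    }
}
```

A table column yields one integer per row, while a text yields a whole vector of counts, which is the shape a text classifier usually expects as input.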

A final version will be available in the next version of the framework.

Original comment by cesarso...@gmail.com on 25 Oct 2012 at 4:39

GoogleCodeExporter commented 8 years ago
Awesome. Thanks. A Bag o' Words is exactly what I was trying to do. This worked 
great.

Original comment by pburr...@gmail.com on 26 Oct 2012 at 6:41

GoogleCodeExporter commented 8 years ago
In the BagOfWords class, I added some error handling in GetFeatureVector for 
words that are not found in the training data:

        /// <summary>
        ///   Gets the codeword representation of a given text.
        /// </summary>
        /// 
        /// <param name="text">The text to be processed.</param>
        /// 
        /// <returns>An integer vector with one element per word
        /// in the code book.</returns>
        /// 
        public int[] GetFeatureVector(params string[] text)
        {
            int[] features = new int[NumberOfWords];

            // Count each word that has an entry in the code book
            for (int i = 0; i < text.Length; i++)
            {
                try
                {
                    int j = stringToCode[text[i]];
                    features[j]++;

                    if (features[j] > MaximumOccurance)
                        features[j] = MaximumOccurance;
                }
                catch (KeyNotFoundException)
                {
                    // Ignore the exception: this word simply isn't in the training data.
                }
            }

            return features;
        }
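
An alternative that avoids raising and catching an exception for every unknown word is to look the word up with `Dictionary<TKey, TValue>.TryGetValue`. The following is a rough sketch only, not the framework's implementation: `BagOfWordsSketch` and its constructor are hypothetical stand-ins, and only the names `stringToCode`, `NumberOfWords`, `MaximumOccurance` and `GetFeatureVector` come from the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the class discussed in this thread.
public class BagOfWordsSketch
{
    private readonly Dictionary<string, int> stringToCode =
        new Dictionary<string, int>();

    public int MaximumOccurance { get; set; }

    public int NumberOfWords
    {
        get { return stringToCode.Count; }
    }

    // Assumed constructor: builds the code book from a list of known words.
    public BagOfWordsSketch(IEnumerable<string> vocabulary, int maximumOccurance)
    {
        MaximumOccurance = maximumOccurance;

        foreach (string word in vocabulary)
        {
            if (!stringToCode.ContainsKey(word))
                stringToCode.Add(word, stringToCode.Count);
        }
    }

    public int[] GetFeatureVector(params string[] text)
    {
        int[] features = new int[NumberOfWords];

        foreach (string word in text)
        {
            int j;

            // TryGetValue returns false for words missing from the code book,
            // so unknown words are skipped without throwing. The null check
            // is needed because a null key would throw ArgumentNullException.
            if (word != null && stringToCode.TryGetValue(word, out j))
                features[j] = Math.Min(features[j] + 1, MaximumOccurance);
        }

        return features;
    }
}
```

With a code book of "lorem" and "ipsum" and a cap of 2, calling `GetFeatureVector("lorem", "amet", "ipsum", "ipsum", "ipsum")` counts "lorem" once, caps "ipsum" at 2 and ignores "amet".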

Original comment by pburr...@gmail.com on 26 Oct 2012 at 7:59

GoogleCodeExporter commented 8 years ago
Bag-of-Words has been incorporated into Accord.NET 2.8.

Original comment by cesarso...@gmail.com on 6 Nov 2012 at 5:24