ashkonf / HybridNaiveBayes

A generalized implementation of the Naive Bayes classifier in Python.
MIT License
25 stars 4 forks

Issue with "collections" library #5

Open jayneeee opened 7 years ago

jayneeee commented 7 years ago

(screenshot of the error traceback)

ashkonf commented 7 years ago

I wasn't able to reproduce this error. The collections library is imported in the file so there shouldn't be an import error, and the line indicated in the traceback isn't calling collections. Let me know if you have more details about the nature of the error.

jayneeee commented 7 years ago

Hi, I just solved the issue by changing "import collections" to "from collections import Counter". However, I have another question regarding the input data format: does it have to be a matrix?
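For anyone hitting the same error, the change that resolved it is a one-line edit, sketched below against a hypothetical snippet rather than the repository's actual file:

```python
# Before (the form that triggered the error in this environment):
# import collections
# counts = collections.Counter(tokens)

# After: import Counter directly from the collections module
from collections import Counter

tokens = ["spam", "ham", "spam"]
counts = Counter(tokens)
```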

ashkonf commented 7 years ago

Data points are expected to be represented via sparse encodings. They are assumed to be dictionaries, where keys (any hashable data type) are feature names and the values are the feature values. The training data matrix is expected to be a list of such dictionaries.
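To make that concrete, a small training set in this format might look like the following sketch (the feature names and values are illustrative, not taken from the library):

```python
# Each data point is a dict: keys are feature names (any hashable type),
# values are feature values. Features absent from a dict are unobserved.
training_data = [
    {"word_count": 3, "contains_link": True, "color": "red"},
    {"word_count": 7, "contains_link": False, "color": "blue"},
]
labels = ["spam", "ham"]  # one label per data point
```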

jayneeee commented 7 years ago

Hi ashkonf,

Thanks for your prompt reply. One more question: for the multinomial-distribution NB, how does it differ from sklearn's multinomial NB? From what I understand, sklearn accepts any discrete number as input. Also, does your NB classifier include Laplace smoothing?

Thanks Jieyan


ashkonf commented 7 years ago

The primary differentiators between this implementation and the Scikit Learn implementation are the following:

  1. Support for both categorical and ordered features.
  2. Support for both discrete and continuous ordered features.
  3. Support for modeling ordered features using arbitrary probability distributions.

The Scikit Learn implementation of the Multinomial Naive Bayes model doesn't support continuous features, and models discrete features as unordered categorical variables rather than numerically ordered variables.

This implementation does support Laplace smoothing. See the smoothingFactor parameter of the Multinomial distribution class.
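For intuition, additive (Laplace) smoothing in a multinomial model works roughly like the standalone sketch below; the smoothing_factor parameter here mirrors the role the repository's smoothingFactor plays, but the function itself is a hypothetical illustration, not the library's code:

```python
from collections import Counter

def smoothed_probability(counts, feature, vocabulary_size, smoothing_factor=1.0):
    """Add smoothing_factor pseudo-counts to every feature so that
    features unseen in training still receive nonzero probability."""
    total = sum(counts.values())
    return (counts[feature] + smoothing_factor) / (total + smoothing_factor * vocabulary_size)

counts = Counter({"the": 3, "cat": 1})
# "dog" was never observed, yet its smoothed probability is (0 + 1) / (4 + 3):
p_unseen = smoothed_probability(counts, "dog", vocabulary_size=3)
```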


jayneeee commented 7 years ago

Hi,

Thanks again for your explanation. However, I'm having trouble converting my data into the required format. I'm new to this area; I've worked with dictionaries before, but how should I convert my CSV or DataFrame data into the format the code expects?

thanks.



ashkonf commented 7 years ago

If your data is in matrix format (which data in a CSV file would be), there is a fairly easy way to transform it into dictionary format. Simply assign each column of your matrix a name. Then create one dictionary per row of your matrix, assigning the row's values to keys corresponding to column names. The list of such dictionaries will be your sparse matrix representation.
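That row-to-dictionary transformation can be sketched with the standard library's csv module (the file path and helper name below are placeholders):

```python
import csv

def load_training_data(path):
    # csv.DictReader treats the first row as column names and yields
    # one {column_name: value} dict per data row, which matches the
    # expected input format. Note that all values are read as strings,
    # so numeric features may need an explicit int()/float() conversion.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

If the data already lives in a pandas DataFrame, `df.to_dict(orient="records")` produces the same list-of-dictionaries structure directly.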