josephsdavid commented 2 years ago

[x] Gaussian general docstrings
[x] Gaussian example
[x] Multinomial general docstrings
[x] Multinomial example (Simple bag of words???)

josephsdavid commented 2 years ago

> Cool, thanks for this! I've added a few comments, though a more serious review will likely come from the active maintainers.

Thank you for your reviews! 😄

codecov-commenter commented 2 years ago

Codecov Report

Merging #8 (3629791) into master (aa6031f) will increase coverage by 7.27%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master       #8      +/-   ##
==========================================
+ Coverage   74.54%   81.81%   +7.27%     
==========================================
  Files           1        1              
  Lines          55       55              
==========================================
+ Hits           41       45       +4     
+ Misses         14       10       -4

Impacted Files	Coverage Δ
src/MLJNaiveBayesInterface.jl	`81.81% <ø> (+7.27%)`	:arrow_up:

ablaom commented 2 years ago

@josephsdavid I've not yet reviewed, but here's a suggestion for the multinomial example (it needs #9):

using MLJ
import TextAnalysis

CountTransformer = @load CountTransformer pkg=MLJText
MultinomialNBClassifier = @load MultinomialNBClassifier pkg=NaiveBayes

tokenized_docs = TextAnalysis.tokenize.([
    "I am very mad. You never listen.",
    "You seem to be having trouble? Can I help you?",
    "Our boss is mad at me. I hope he dies.",
    "His boss wants to help me. She is nice.",
    "Thank you for your help. It is nice working with you.",
    "Never do that again! I am so mad. ",
])

sentiment = [
    "negative",
    "positive",
    "negative",
    "positive",
    "positive",
    "negative",
]

mach1 = machine(CountTransformer(), tokenized_docs) |> fit!

# matrix of counts:
X = transform(mach1, tokenized_docs)

# to ensure scitype(y) <: AbstractVector{<:OrderedFactor}:
y = coerce(sentiment, OrderedFactor)

classifier = MultinomialNBClassifier()
mach2 = machine(classifier, X, y)
fit!(mach2, rows=1:4)

# probabilistic predictions:
y_prob = predict(mach2, rows=5:6) # distributions
pdf.(y_prob, "positive") # probabilities for "positive"
log_loss(y_prob, y[5:6])

# point predictions:
yhat = mode.(y_prob) # or `predict_mode(mach2, rows=5:6)`

josephsdavid commented 2 years ago

> @josephsdavid I've not yet reviewed, but here's a suggestion for the multinomial example (it needs #9):

For now I just wrapped the data with MLJ.table, and it seems to work fine!

ablaom commented 2 years ago

Attention @ablaom

ablaom commented 2 years ago

Thanks @josephsdavid for your contribution. Great to have another out of the way.

JuliaAI / MLJNaiveBayesInterface.jl

Add Docstrings #8
