cadmiumcr / cadmium

Natural Language Processing (NLP) library for Crystal
https://cadmiumcr.com
MIT License
205 stars 15 forks source link

Fix negator token lookup on sentiment analysis #27

Open hugoabonizio opened 5 years ago

hugoabonizio commented 5 years ago

The line item = -item if NEGATORS.includes?(prev_token) was checking through a Hash, thus it wasn't matching the negator tokens, I changed it to a Set instead.

Is this the proper repo or should I send it to https://github.com/cadmiumcr/sentiment?

watzon commented 4 years ago

Could we get the spec passing on this? Looks like a method is returning a Int32 | Float64 when it should only be returning a Float64

hugoabonizio commented 4 years ago

It seems to be the same issue as #26, fortunately it was easy to solve! Now it's compiling, but the specs are still failing due to some wrong expecations (maybe related to the / -> // issue in newer Crystal?).

watzon commented 4 years ago

Hmm there's some weird things happening in the failing specs. Certain numbers are completely off now.

hugoabonizio commented 4 years ago

@watzon it's working again! :tada:

hugoabonizio commented 4 years ago

Added explicit return type to avoid the following warning:

In src/cadmium/summarizer/luhn_summarizer.cr:39:17

 39 | private def select_sentences(text, max_num_sentences, normalized_terms_ratio)
                  ^---------------
Warning: this method overrides Cadmium::Summarizer#select_sentences(text : String, max_num_sentences : Int32, normalized_terms_ratio : Hash(String, Float64)) which has an explicit return type of Array(String).

Please add an explicit return type (Array(String) or a subtype of it) to this method as well.

The above warning will become an error in a future Crystal version.

Related crystal-lang/crystal#8232

watzon commented 4 years ago

I just realized this is in the main repo. Sorry I didn't catch it earlier, but PRs should ideally be made to the repo specific to the module that you're modifying. This repo is going to end up just importing all the other repos. So would you be able to make PRs to the proper repos?

hugoabonizio commented 4 years ago

So the specific repo is cadmiumcr/sentiment? I'm having a bad time figuring out where things are located in this small packages setup :smile:, e.g., where are stop words located?

watzon commented 4 years ago

Yep, and stop_words are in i18n :) feel free to come to the gitter if you need help with anything