fabianvf / python-rake

MIT License

Badly need tests #18

Closed fabianvf closed 6 years ago

fabianvf commented 7 years ago

Not even fancy ones, just something that runs at least a happy path with a small amount of text to make sure everything is valid.

jkterry1 commented 7 years ago

> just something that runs at least a happy path with a small amount of text to make sure everything is valid.

Explain please? I have lots of samples, I'm just unclear on what you want.

fabianvf commented 7 years ago

Just something we can run during builds on Travis that runs the code and verifies the return value, so that if we have broken imports or something similar, we won't be able to push it to PyPI.

fabianvf commented 7 years ago

so like test.py

import RAKE
# Do RAKE stuff
# verify RAKE stuff worked

Then we'd just add python test.py to the travis build.
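A minimal sketch of what such a test.py could look like. It assumes the `RAKE.Rake(stop_words)` constructor accepts a plain list and that `run(text)` returns a list of `(phrase, score)` pairs, as the project README describes; neither is confirmed in this thread. The `smoke_test` name and its `rake_module` parameter are hypothetical, added only so the check can be exercised without the package installed.

```python
def smoke_test(rake_module=None):
    """Run one happy-path extraction and sanity-check the result."""
    if rake_module is None:
        # A broken install or a broken import surfaces here as ImportError.
        import RAKE as rake_module

    text = (
        "Machine learning is the subfield of computer science that gives "
        "computers the ability to learn without being explicitly programmed."
    )

    # Assumed API (not confirmed in this thread):
    #   Rake(stop_words) and run(text) -> [(phrase, score), ...]
    stop_words = ["is", "the", "of", "that", "to", "without", "being"]
    extractor = rake_module.Rake(stop_words)
    keywords = extractor.run(text)

    assert isinstance(keywords, list) and keywords, "expected a non-empty list"
    for phrase, score in keywords:
        assert isinstance(phrase, str) and phrase.strip(), "empty keyword phrase"
        assert score > 0, "non-positive score for %r" % (phrase,)
    return keywords
```

Calling `smoke_test()` at the bottom of test.py would make `python test.py` exit non-zero on an import error or a failed assertion, which should be enough to fail the Travis step.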

jkterry1 commented 7 years ago

It seems to me that all we need are the things required to reproduce the behavior described in the readme.md: a list of stop words, a space-separated .txt file of stop words*, and maybe a few trial texts. I'll be back in a minute with all of those attached.

*(By the way, we need to add .csv support. If you can figure out the most Pythonic way to do it (maybe a function flag with a default?), I'll add it myself after the dust has settled.)
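One possible shape for the "function flag with a default" idea, as a rough sketch only. The `load_stop_words` helper and its `delimiter` parameter are hypothetical and not part of python-rake. The default keeps the space-separated .txt behavior, and passing `delimiter=","` covers .csv.

```python
def load_stop_words(path, delimiter=None):
    """Read stop words from a text file (hypothetical helper).

    delimiter=None (the default) splits on any whitespace, which covers the
    space-separated .txt case; pass delimiter="," for a .csv file.
    """
    with open(path, encoding="utf-8") as handle:
        content = handle.read()

    tokens = []
    for line in content.splitlines():
        # str.split(None) splits on runs of whitespace;
        # str.split(",") splits on every comma.
        tokens.extend(line.split(delimiter))

    # Drop surrounding whitespace and the empty strings left behind by
    # blank lines or doubled delimiters.
    return [token.strip() for token in tokens if token.strip()]
```

With this shape, existing callers that pass only a path would see no change, and a .csv file would be read with `load_stop_words("stopwords.csv", delimiter=",")`.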

jkterry1 commented 7 years ago

Test String (I like this because it has funny edge cases): 'Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959, gives "computers the ability to learn without being explicitly programmed."[1] Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[2] machine learning explores the study and construction of algorithms that can learn from and make predictions on data[3] – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions,[4]:2 through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach,[5] optical character recognition (OCR),[6] learning to rank, and computer vision.

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining,[7] where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning.[4]:vii[8] Machine learning can also be unsupervised[9] and be used to learn and establish baseline behavioral profiles for various entities[10] and then used to find meaningful anomalies.

Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.[11]

As of 2016, machine learning is a buzzword, and according to the Gartner hype cycle of 2016, at its peak of inflated expectations.[12] Effective machine learning is difficult because finding patterns is hard and often not enough training data is available; as a result, machine-learning programs often fail to deliver.[13][14]'

Here's a meh stop words .txt that isn't one of the built-in lists: stopwords list.txt

Here's the same thing as a python list: https://pastebin.com/6j76y30j

fabianvf commented 7 years ago

This is great, I'll work on adding some tests this week.

jkterry1 commented 7 years ago

With the new regex list handling, you may want to try giving the .txt an empty line, and also saving its contents as a Python list in a .csv, possibly with missing commas or something, to make sure the regex doesn't malfunction.

Also did you get a chance to take a look at this?

jkterry1 commented 7 years ago

@fabianvf get a chance to add the tests?

jkterry1 commented 7 years ago

@fabianvf anything new and interesting?

fabianvf commented 7 years ago

@justinkterry not yet, as usual it's hard to prioritize tests over features :P. I'll try to find some time to do this, but work projects have been keeping me plenty busy the last few weeks.

jkterry1 commented 7 years ago

@fabianvf anything new?

jkterry1 commented 6 years ago

@fabianvf I see some tests are added now. Should this be closed...?

fabianvf commented 6 years ago

yeah, at least those tests should check that the project is importable/instantiatable