That's awesome, thanks! Will look deeper into it shortly
(In reply to Austin Richardson, Mon, Jun 9, 2014 at 11:39 AM.)
Closing b/c of #3
You've implemented your own k-mer counting.
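scikit-learn has a great k-mer counter built in, the hashing vectorizer. Set analyzer='char' and ngram_range=(k, k) and it counts every length-k substring (i.e. every k-mer) of each sequence, storing the counts in a fixed number of hashed columns: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.HashingVectorizer.html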
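A minimal sketch of what that looks like, assuming the sequences are plain uppercase DNA strings. The toy sequences, K = 4 and n_features = 2**16 are illustrative choices, not from the thread. The loop at the end checks the vectorizer against a hand-rolled Counter-based k-mer count.

```python
from collections import Counter

from sklearn.feature_extraction.text import HashingVectorizer

K = 4  # k-mer length (illustrative choice)

# Toy DNA sequences (made up for this example)
sequences = ["ACGTACGTGA", "TTGACGTACC"]

vectorizer = HashingVectorizer(
    analyzer="char",        # tokens are characters, not words
    ngram_range=(K, K),     # only n-grams of length exactly K, i.e. k-mers
    lowercase=False,        # keep bases exactly as written
    n_features=2**16,       # number of hash buckets (columns)
    alternate_sign=False,   # plain counts, no random sign flipping
    norm=None,              # raw counts instead of unit-length rows
)

# Stateless: no fit() needed. Returns a sparse matrix, one row per sequence.
X = vectorizer.transform(sequences)

# Sanity check against a hand-rolled counter: each row's total equals the
# number of k-mers in the sequence (collisions can merge columns, never lose counts).
for i, seq in enumerate(sequences):
    naive = Counter(seq[j : j + K] for j in range(len(seq) - K + 1))
    assert X[i].sum() == sum(naive.values())
```

Because columns are hash buckets, you can't map a column back to its k-mer, and unrelated k-mers occasionally share a bucket. In exchange, memory use is fixed by n_features and nothing has to be fitted.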
See also: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
EDIT: I implemented a naive Bayes classifier using scikit-learn for a presentation not long ago: https://github.com/audy/presentations/blob/master/02-26-2014-scikit_learn_for_biology/ipython-notebooks/16S%20rRNA%20Classifier%20(Text%20Based).ipynb (you might find it useful).
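For a rough idea of how k-mer features plug into a naive Bayes model, here is a minimal sketch. It is not the notebook's code: the sequences and taxon labels are made up. CountVectorizer is swapped in so the vocabulary stays inspectable. A HashingVectorizer with alternate_sign=False would also work, since MultinomialNB needs non-negative features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

K = 4  # k-mer length (illustrative choice)

# Hypothetical toy training data: sequences labelled with made-up taxa
train_seqs = [
    "ACGTACGTACGTACGT",
    "ACGTACGAACGTACGT",
    "TTGGCCAATTGGCCAA",
    "TTGGCCTATTGGCCAA",
]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]

# k-mer counts feed a multinomial naive Bayes model (Laplace smoothing, alpha=1)
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(K, K), lowercase=False),
    MultinomialNB(),
)
model.fit(train_seqs, train_labels)

# Each query shares its k-mers with only one of the two taxa
predictions = model.predict(["ACGTACGTACGAACGT", "TTGGCCAATTGGCCTA"])
```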