Open wasade opened 10 years ago
@wasade, looks great. All very minor comments, feel free to ignore any for now.
Only slightly kind of totally missing unit tests...
@audy, did you hear back from scikit-learn about the performance issues by chance?
Ready for re-review?
Yup, unless we should get specific tests in at this time... On Sep 18, 2014 5:10 PM, "Greg Caporaso" notifications@github.com wrote:
Ready for re-review?
— Reply to this email directly or view it on GitHub https://github.com/biocore/taxster/pull/5#issuecomment-56103737.
What do you think - should we get the code in and then start the tests, or hold off until the tests are in?
On Thu, Sep 18, 2014 at 2:12 PM, Daniel McDonald notifications@github.com wrote:
Yup, unless we should get specific tests in at this time... On Sep 18, 2014 5:10 PM, "Greg Caporaso" notifications@github.com wrote:
Ready for re-review?
Reply to this email directly or view it on GitHub https://github.com/biocore/taxster/pull/5#issuecomment-56103737.
Reply to this email directly or view it on GitHub https://github.com/biocore/taxster/pull/5#issuecomment-56104016.
Made one other suggestion @wasade.
Merged in @audy's pass, and put some structure in place for the code. The api/cli distinction feels nice here, though I think there is more room to decompose... was using this as an opportunity to play with some ideas prior to dropping click into qiime
Note, travis and coveragerc are included but no unit tests have been written and haven't hooked up services yet
@audy, using the
HashingVectorizer
is awesome. I'm not seeing the memory spike you mentioned, but instead seeing a runtime issue that is driven by the serialization (profiling show'd stupid amounts of time being spent inravel
in_mul_multivector
in scipy'scompressed.py
or something. The notebook was a few orders of magnitude faster by doing the kword transform external, but I agree with you that this is likely a bug.Last, two things, the resulting classifications aren't great yet but we're near in a place to really start poking and proding and this is currently dependent on biocore/scikit-bio#643