TeamHG-Memex / soft404

A classifier for detecting soft 404 pages

Remove functional transformers from the ppl to fix sklearn warnings #16

Closed: lucywang000 closed this 5 years ago.

lucywang000 commented 5 years ago

This should address #15. I have tested it with sklearn 0.19.1 and 0.20.0.

Related discussion: https://github.com/TeamHG-Memex/soft404/issues/15#issuecomment-424699517

It also fixes the tests, which html_text 0.4 had broken.
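As a hedged sketch of the general technique named in the PR title (not the code actually merged in #16), a stateless `FunctionTransformer` step can be replaced by a small named class. `FieldSelector` and the `"text"` field are hypothetical. One plausible motivation, assuming it matches the warnings discussed in #15: sklearn 0.20 deprecated the `validate=True` default of `FunctionTransformer` (scheduled to become `False` in 0.22), and a `FunctionTransformer` wrapping a lambda cannot be serialized with the standard `pickle` module.

```python
# Hypothetical sketch: a named class standing in for something like
# FunctionTransformer(lambda docs: [d["text"] for d in docs]).
# The class name and field name are illustrative, not soft404's real code.


class FieldSelector:
    """Select one key from each dict-like document.

    Follows scikit-learn's duck-typed transformer convention
    (fit returns self, transform maps X to features), so it can be
    used as a Pipeline step without FunctionTransformer.
    """

    def __init__(self, field):
        self.field = field

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the training data.
        return self

    def transform(self, X):
        return [doc[self.field] for doc in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    # get_params/set_params let sklearn utilities such as clone()
    # rebuild the step from its constructor arguments.
    def get_params(self, deep=True):
        return {"field": self.field}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


docs = [{"text": "Page not found", "title": "Oops"},
        {"text": "Welcome", "title": "Home"}]
print(FieldSelector("text").fit_transform(docs))  # ['Page not found', 'Welcome']
```

With scikit-learn installed, inheriting from `BaseEstimator` and `TransformerMixin` would supply `get_params`, `set_params` and `fit_transform` automatically; they are spelled out here only to keep the sketch dependency-free. Because the class is defined at module level, instances can be pickled, unlike a lambda.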

lucywang000 commented 5 years ago

ping @lopuhin @kmike

lucywang000 commented 5 years ago

Travis fails with strange errors in the py35-train and py36-train tox envs:

        if save:
            pipeline = Pipeline([('vec', vec)] + text_pipeline.steps)
>           Soft404Classifier.save_model(save, pipeline)
E           NameError: name 'Soft404Classifier' is not defined

Any ideas?

lopuhin commented 5 years ago

@lucywang000 thanks for the PR! Regarding the error, I think it happens because the import of Soft404Classifier was removed here https://github.com/TeamHG-Memex/soft404/pull/16/files#diff-a5c400537b9dda586f39e3b735026061L28 but the name is still used in the code.
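This explanation also fits why only the train envs failed: Python looks up a global name when the line that uses it executes, not when the enclosing function is defined, so a missing import stays hidden until that branch runs. A minimal reproduction of the pattern (the `train` function and the `"model.joblib"` path are hypothetical stand-ins, not soft404's `train.py`):

```python
# Hypothetical reduction of the failure: the module no longer imports
# Soft404Classifier, but one branch still refers to it.


def train(save=None):
    pipeline = ["vec", "clf"]  # stand-in for the real Pipeline object
    if save:
        # The global name is resolved only when this line runs, so the
        # missing import goes unnoticed unless save is truthy.
        Soft404Classifier.save_model(save, pipeline)  # noqa: F821
    return pipeline


print(train())  # ['vec', 'clf'] since the broken branch is skipped
try:
    train(save="model.joblib")
except NameError as exc:
    print(exc)  # name 'Soft404Classifier' is not defined
```

A static checker such as pyflakes reports this kind of undefined name without executing the code, which can catch it before CI does.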

lucywang000 commented 5 years ago

@lopuhin I missed that - thanks for catching!

I think it's fixed now.

codecov-io commented 5 years ago

Codecov Report

Merging #16 into master will increase coverage by 0.11%. The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #16      +/-   ##
==========================================
+ Coverage   92.65%   92.77%   +0.11%     
==========================================
  Files           5        5              
  Lines         245      249       +4     
==========================================
+ Hits          227      231       +4     
  Misses         18       18
Impacted Files         Coverage Δ
soft404/train.py       97.08% <100%> (ø) ⬆
soft404/utils.py       97.56% <100%> (ø) ⬆
soft404/predict.py     100% <100%> (ø) ⬆
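The percentages in the coverage diff follow from Hits divided by Lines. A quick check in plain Python (ordinary arithmetic, not Codecov's own code) reproduces 92.65% and 92.77%. The exact difference is about 0.118 points, so the reported +0.11% is consistent with truncating to two decimals rather than rounding; that is an inference from these numbers, not documented Codecov behavior.

```python
import math


def coverage_pct(hits, lines):
    """Line coverage as a percentage: hits / lines * 100."""
    return hits / lines * 100


before = coverage_pct(227, 245)  # master: 227 hits out of 245 lines
after = coverage_pct(231, 249)   # #16: 231 hits out of 249 lines
delta = after - before

print(f"{before:.2f}% -> {after:.2f}%")  # 92.65% -> 92.77%
print(f"delta: {delta:.4f} points")      # delta: 0.1180 points
print(math.floor(delta * 100) / 100)     # 0.11 (truncated to 2 decimals)
print(245 - 227, 249 - 231)              # 18 18 (misses unchanged)
```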


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 984535b...67626bd.

lopuhin commented 5 years ago

Sorry, I meant to do a regular merge and didn't notice it was set to squash.

lucywang000 commented 5 years ago

@lopuhin Thanks for the discussion & review!