chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io
Other
2.21k stars 249 forks source link

TopicModel broken with scikit-learn 0.21 #260

Closed jonringer closed 5 years ago

jonringer commented 5 years ago

steps to reproduce

  1. install scikit-learn 0.21
  2. run test suite

expected vs. actual behavior

lib/python3.7/site-packages/textacy/tm/topic_model.py:125: in __init__
    self.init_model(model, n_topics=n_topics, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'TopicModel' object has no attribute 'model'") raised in repr()] TopicModel object at 0x7fffdd7362d0>
model = 'lda', n_topics = 10, kwargs = {}

    def init_model(self, model, n_topics=10, **kwargs):
        if model == "nmf":
            self.model = NMF(
                n_components=n_topics,
                alpha=kwargs.get("alpha", 0.1),
                l1_ratio=kwargs.get("l1_ratio", 0.5),
                max_iter=kwargs.get("max_iter", 200),
                random_state=kwargs.get("random_state", 1),
                shuffle=kwargs.get("shuffle", False),
            )
        elif model == "lda":
            self.model = LatentDirichletAllocation(
                n_topics=n_topics,
                max_iter=kwargs.get("max_iter", 10),
                random_state=kwargs.get("random_state", 1),
                learning_method=kwargs.get("learning_method", "online"),
                learning_offset=kwargs.get("learning_offset", 10.0),
                batch_size=kwargs.get("batch_size", 128),
>               n_jobs=kwargs.get("n_jobs", 1),
E               TypeError: __init__() got an unexpected keyword argument 'n_topics'

possible solution?

just remove the optional argument and don't pass it through to the other constructor?

context

passes the argument n_topics to LDA, however, the argument was removed

https://github.com/scikit-learn/scikit-learn/blob/7b346e784b1391815bd17f64fd239d3fbd78dd5a/sklearn/decomposition/online_lda.py#L277

environment

bdewilde commented 5 years ago

Hi @jonringer , As you've noticed, textacy isn't compatible with this v0.21 of scikit-learn. APIs and dependencies change, and that means you can't always support the full range of versions. Currently, textacy supports v0.18 through v0.20 — see here — but I'll keep this issue in mind when the package is in a position to update its support.

jonringer commented 5 years ago

Sounds good, it just looks like you defaulted the value, and I'm not sure how import n_topics is to what problem it's trying to solve. But if the scikit-learn community removed it, then my assumption is the argument had limited impact.

But, it's on your radar, so deal with it when appropriate :). This seems to be the only blocker to adopting scikit-learn 0.21, according to the test suite.

bdewilde commented 5 years ago

The bigger issue with v0.21 is that it's PY3-only, and I'm still supporting PY2.7. Happily, that's going to change soon, so I may bump the scikit-learn version requirements when I transition to a PY3-only codebase. :+1:

bdewilde commented 5 years ago

heads-up: https://github.com/chartbeat-labs/textacy/commit/2c45e3d06ba8447225aca441c113f85675e9ac6a