JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

scattertext breaks with scikit-learn 1.2.0 #125

Closed: jlondonobo closed this issue 1 year ago

jlondonobo commented 1 year ago

Description

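A TypeError is raised when importing scattertext with scikit-learn 1.2.0 installed. This is due to the removal of the alpha argument from the NMF class in scikit-learn 1.2.0; the argument had been deprecated since 1.0 in favor of alpha_W and alpha_H (Introduced in this PR).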

Steps to Reproduce

  1. Create a virtual environment
  2. Install scattertext
  3. Make sure that scikit-learn's version is 1.2.0
  4. Import scattertext:
     import scattertext as st

An error will be raised with the following message:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
----> 10 import scattertext as st

File ~/opt/miniconda3/envs/human-tools/lib/python3.8/site-packages/scattertext/__init__.py:101
     99 from scattertext.termcompaction.CompactTerms import CompactTerms
    100 from scattertext.termcompaction.PhraseSelector import PhraseSelector
--> 101 from scattertext.topicmodel.SentencesForTopicModeling import SentencesForTopicModeling
    102 from scattertext.frequencyreaders.DefaultBackgroundFrequencies import DefaultBackgroundFrequencies, \
    103     BackgroundFrequenciesFromCorpus
    104 from scattertext.termcompaction.DomainCompactor import DomainCompactor

File ~/opt/miniconda3/envs/human-tools/lib/python3.8/site-packages/scattertext/topicmodel/__init__.py:1
----> 1 from .SentencesForTopicModeling import SentencesForTopicModeling

File ~/opt/miniconda3/envs/human-tools/lib/python3.8/site-packages/scattertext/topicmodel/SentencesForTopicModeling.py:11
      7 from scattertext.ParsedCorpus import ParsedCorpus
      8 from scattertext.termscoring.RankDifference import RankDifference
---> 11 class SentencesForTopicModeling(object):
     12     '''
     13     Creates a topic model from a set of key terms based on sentence level co-occurrence.
     14     '''
     16     def __init__(self, corpus):

File ~/opt/miniconda3/envs/human-tools/lib/python3.8/site-packages/scattertext/topicmodel/SentencesForTopicModeling.py:54, in SentencesForTopicModeling()
     47 def get_sentence_word_mat(self):
     48     return self.sentX.astype(np.double).tocoo()
     50 def get_topics_from_model(
     51         self,
     52         pipe=Pipeline([
     53             ('tfidf', TfidfTransformer(sublinear_tf=True)),
---> 54             ('nmf', (NMF(n_components=30, alpha=.1, l1_ratio=.5, random_state=0)))]),
     55         num_terms_per_topic=10):
     56     '''
...
     69     dict: {term: [term1, ...], ...}
     70     '''
     71     pipe.fit_transform(self.sentX)

TypeError: __init__() got an unexpected keyword argument 'alpha'

Expected behavior

The import should be completed without raising errors.

Environment

Additional context

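The problem is on line 54 of scattertext/topicmodel/SentencesForTopicModeling.py, where alpha=.1 is passed to NMF inside the default value of the pipe argument. Because default argument values are evaluated when the class body runs, the error surfaces at import time rather than when get_topics_from_model is called.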
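
The import-time failure follows from how Python handles default arguments: the default for pipe is built when the class body executes, so an invalid constructor keyword raises as soon as the module is imported. The following is a minimal sketch of that behavior and of one way to defer construction. StrictEstimator, Eager, and Lazy are hypothetical stand-ins used only for illustration; they are not scattertext or scikit-learn code.

```python
class StrictEstimator:
    """Hypothetical stand-in for an estimator that no longer accepts `alpha`."""

    def __init__(self, n_components=30, l1_ratio=0.0, random_state=None):
        self.n_components = n_components
        self.l1_ratio = l1_ratio
        self.random_state = random_state


def define_class_with_eager_default():
    # The default value below is constructed when the `def` statement runs,
    # i.e. while the class body executes (at import time for a real module).
    class Eager:
        def get_topics(self, model=StrictEstimator(n_components=30, alpha=0.1)):
            return model

    return Eager


def define_class_with_lazy_default():
    # Using None as a sentinel defers construction until the method is called.
    class Lazy:
        def get_topics(self, model=None):
            if model is None:
                model = StrictEstimator(n_components=30)
            return model

    return Lazy


# Merely defining the eager class raises; no method is ever called.
try:
    define_class_with_eager_default()
    eager_error = None
except TypeError as exc:
    eager_error = str(exc)

# The lazy variant defines cleanly and builds its default on demand.
Lazy = define_class_with_lazy_default()
lazy_model = Lazy().get_topics()

print("eager definition failed with:", eager_error)
print("lazy default n_components:", lazy_model.n_components)
```

With a None sentinel, a bad default would only fail when the method is actually used, so an unrelated import of the package would still succeed.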

I will submit a PR with a possible solution in a couple of minutes.

JasonKessler commented 1 year ago

Thanks so much for reporting this and the PR. In 0.1.11, I removed the offending parameter from the NMF constructor, and that should fix this issue without having to add an sklearn version requirement.
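
For code that needs to keep a regularization strength across scikit-learn versions, one option is to inspect the constructor signature and pass whichever keyword it accepts. The sketch below is an illustration under stated assumptions, not the fix shipped in 0.1.11 (which simply removed the argument). The names regularization_kwargs, OldStyleNMF, and NewStyleNMF are hypothetical, and the stand-in classes only mimic the relevant constructor keywords: alpha in older releases, alpha_W and alpha_H in newer ones.

```python
import inspect


def regularization_kwargs(estimator_cls, strength):
    """Return the regularization keyword(s) that estimator_cls accepts.

    Hypothetical helper: prefers the newer `alpha_W` keyword, falls back to
    the older `alpha`, and returns no keyword if neither is supported.
    """
    params = inspect.signature(estimator_cls.__init__).parameters
    if "alpha_W" in params:
        return {"alpha_W": strength}
    if "alpha" in params:
        return {"alpha": strength}
    return {}


class OldStyleNMF:
    """Stand-in mimicking a constructor that still accepts `alpha`."""

    def __init__(self, n_components=None, alpha=0.0, l1_ratio=0.0, random_state=None):
        self.alpha = alpha
        self.l1_ratio = l1_ratio


class NewStyleNMF:
    """Stand-in mimicking a constructor that accepts `alpha_W` / `alpha_H`."""

    def __init__(self, n_components=None, alpha_W=0.0, alpha_H="same",
                 l1_ratio=0.0, random_state=None):
        self.alpha_W = alpha_W
        self.alpha_H = alpha_H
        self.l1_ratio = l1_ratio


# The same call site works against either constructor signature.
old_model = OldStyleNMF(n_components=30, l1_ratio=0.5, random_state=0,
                        **regularization_kwargs(OldStyleNMF, 0.1))
new_model = NewStyleNMF(n_components=30, l1_ratio=0.5, random_state=0,
                        **regularization_kwargs(NewStyleNMF, 0.1))

print("old-style keyword:", regularization_kwargs(OldStyleNMF, 0.1))
print("new-style keyword:", regularization_kwargs(NewStyleNMF, 0.1))
```

With scikit-learn installed, the same helper could be called with sklearn.decomposition.NMF in place of the stand-ins. The older and newer parametrizations may not be numerically equivalent, so results could differ for the same strength value.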