CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Sample notebook reports errors #145

Closed: Raychanan closed this issue 2 years ago

Raychanan commented 2 years ago

Hi, many thanks for developing this great package!

I'm trying to run the sample notebook Predicting Conversations Gone Awry With Convokit on Google Colab.

I made no modifications except for the first cell, where I added:

!pip -q install convokit
!pip uninstall spacy -y
!pip install -U spacy==3.1.4
!python -m spacy download en_core_web_sm

However, an error occurred in the second cell from the bottom: TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given. Would it be possible for you to point out how to fix this? Many thanks!

Running prediction task for feature set politeness_strategies
Generating labels...
Computing paired features...
Using 38 features
Running leave-one-page-out prediction...
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-37-de914fca85cc>", line 11, in run_pred_single
    base_clf = Pipeline([("scaler", StandardScaler()), ("featselect", SelectPercentile(f_classif, 10)), ("logreg", LogisticRegression(solver='liblinear'))])
TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
[<ipython-input-38-9704095ec82e>](https://localhost:8080/#) in <module>()
      4 for combo in feature_combos:
      5     combo_names.append("+".join(combo).replace("_", " "))
----> 6     accuracy = run_pipeline(combo)
      7     accs.append(accuracy)
      8 results_df = pd.DataFrame({"Accuracy": accs}, index=combo_names)

6 frames
[<ipython-input-37-de914fca85cc>](https://localhost:8080/#) in run_pipeline(feature_set)
     97     y = labeled_pairs_df.first_convo_toxic.values
     98     print("Running leave-one-page-out prediction...")
---> 99     accuracy, coefs, scores, hyperparams, pvalue = run_pred(X, y, feature_names, labeled_pairs_df.page_id)
    100     print("Accuracy:", accuracy)
    101     print("p-value: %.4e" % pvalue)

[<ipython-input-37-de914fca85cc>](https://localhost:8080/#) in run_pred(X, y, fnames, groups)
     33 
     34     with Pool(os.cpu_count()) as p:
---> 35         prediction_results = p.map(partial(run_pred_single, X=X, y=y), splits)
     36 
     37     fselect_pvals_all = []

[/usr/lib/python3.7/multiprocessing/pool.py](https://localhost:8080/#) in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

[/usr/lib/python3.7/multiprocessing/pool.py](https://localhost:8080/#) in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

[/usr/lib/python3.7/multiprocessing/pool.py](https://localhost:8080/#) in worker()
    119         job, i, func, args, kwds = task
    120         try:
--> 121             result = (True, func(*args, **kwds))
    122         except Exception as e:
    123             if wrap_exception and func is not _helper_reraises_exception:

[/usr/lib/python3.7/multiprocessing/pool.py](https://localhost:8080/#) in mapstar()
     42 
     43 def mapstar(args):
---> 44     return list(map(*args))
     45 
     46 def starmapstar(args):

[<ipython-input-37-de914fca85cc>](https://localhost:8080/#) in run_pred_single()
      9     y_train, y_test = y[train_idx], y[test_idx]
     10 
---> 11     base_clf = Pipeline([("scaler", StandardScaler()), ("featselect", SelectPercentile(f_classif, 10)), ("logreg", LogisticRegression(solver='liblinear'))])
     12     clf = GridSearchCV(base_clf, {"logreg__C": [10**i for i in range(-4,4)], "featselect__percentile": list(range(10, 110, 10))}, cv=3)
     13 

TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given
jpwchang commented 2 years ago

Hi @Raychanan,

It appears that this is caused by a change to scikit-learn's SelectPercentile class in the scikit-learn 1.x release: its parameters can no longer be passed positionally, so the old SelectPercentile(f_classif, 10) call now raises a TypeError. I've committed an updated version of the notebook to deal with this change.

The change is small, so if you don't want to re-upload the notebook to colab from scratch, you can simply change one line in your existing colab notebook. Find the following line:

base_clf = Pipeline([("scaler", StandardScaler()), ("featselect", SelectPercentile(f_classif, 10)), ("logreg", LogisticRegression(solver='liblinear'))])

And change it to:

base_clf = Pipeline([("scaler", StandardScaler()), ("featselect", SelectPercentile(score_func=f_classif, percentile=10)), ("logreg", LogisticRegression(solver='liblinear'))])

That should resolve the error!
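For reference, here is a minimal sketch verifying that the corrected pipeline definition runs under scikit-learn 1.x. The toy random data below is just a stand-in for the notebook's real paired-feature matrix (the log reports 38 features), so the fitted model itself is meaningless; the point is only that the keyword-argument form of SelectPercentile constructs and fits without the TypeError:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-in data: 60 samples, 38 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 38))
y = rng.integers(0, 2, size=60)

# Keyword arguments instead of the old positional call
# SelectPercentile(f_classif, 10), which scikit-learn 1.x rejects.
base_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("featselect", SelectPercentile(score_func=f_classif, percentile=10)),
    ("logreg", LogisticRegression(solver="liblinear")),
])
clf = GridSearchCV(
    base_clf,
    {"logreg__C": [10**i for i in range(-4, 4)],
     "featselect__percentile": list(range(10, 110, 10))},
    cv=3,
)
clf.fit(X, y)
print(clf.best_params_)
```

If this fits without raising, the pipeline definition is compatible with your installed scikit-learn version.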

Raychanan commented 2 years ago

This helps a lot! Thanks so much!