Modify preprocessing module to account for Spanish

I'm getting the warnings below, followed by a sklearn error, when trying to run with the proposed updates to preprocessing. Please try debugging so that the system still runs end-to-end with all English models. I will work on getting it to work with Spanish, but let's ensure we don't lose the prior English functionality.

/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/src/features/extract_features.py:181: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(vectors)

The error: `TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/scripts/../src/main.py", line 257, in <module>
    main()
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/scripts/../src/main.py", line 252, in main
    run(args.mode, args.task, args.train_data, args.dev_data, args.test_data, args.result, args.predictions, args.model,
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/scripts/../src/main.py", line 209, in run
    clf.fit(train_vector.vector, data_train.label, tuning=(dev_vector.vector, data_dev.label),
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/src/modeling/classifier.py", line 54, in fit
    self.model = self._tune_sklearn(text=text, label=label, tuning=tuning)
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/src/modeling/classifier.py", line 99, in _tune_sklearn
    model = self._grid_search(params=param_dicts, text=text, label=label, tuning=tuning)
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/src/modeling/classifier.py", line 123, in _grid_search
    model = self._fit_sklearn(hyp_params=combination, text=text, label=label)
  File "/Users/Karl/Documents/_UW_Compling/LING573/PlaceboAffect/src/modeling/classifier.py", line 140, in _fit_sklearn
    model.fit(text, label)
  File "/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/sklearn/svm/_base.py", line 190, in fit
    X, y = self._validate_data(
  File "/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/sklearn/base.py", line 581, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/sklearn/utils/validation.py", line 964, in check_X_y
    X = check_array(
  File "/Users/Karl/opt/anaconda3/envs/PlaceboAffect/lib/python3.9/site-packages/sklearn/utils/validation.py", line 746, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
ValueError: setting an array element with a sequence.`
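A possible explanation, which I have not confirmed against `extract_features.py`: if the new preprocessing leaves some documents with no tokens, averaging their (empty) list of token vectors would trigger "Mean of empty slice" and return a scalar NaN instead of a fixed-length vector. Stacking that scalar with the regular vectors would give the ragged array that sklearn then rejects. Below is a minimal sketch of a guard for that case. `average_embeddings`, `build_feature_matrix`, and `EMBEDDING_DIM` are hypothetical names for illustration only, not the project's actual code.

```python
import numpy as np

# Hypothetical value; the real dimensionality depends on the embedding model in use.
EMBEDDING_DIM = 4


def average_embeddings(token_vectors, dim=EMBEDDING_DIM):
    """Average per-token vectors, falling back to zeros when no tokens remain."""
    if len(token_vectors) == 0:
        # np.mean over an empty list warns "Mean of empty slice" and returns
        # a scalar NaN, so return a fixed-length zero vector instead.
        return np.zeros(dim)
    return np.mean(np.asarray(token_vectors, dtype=float), axis=0)


def build_feature_matrix(docs, dim=EMBEDDING_DIM):
    """Stack one fixed-length vector per document into a 2-D array."""
    return np.vstack([average_embeddings(doc, dim) for doc in docs])


docs = [
    [np.ones(EMBEDDING_DIM), 3 * np.ones(EMBEDDING_DIM)],  # two token vectors
    [],  # assumed case: preprocessing removed every token
]
X = build_feature_matrix(docs)
print(X.shape)
```

Whether a zero vector is the right fallback is a design choice. Dropping empty documents (and their labels) before vectorizing would be another option.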


Modify preprocessing module to account for Spanish #54