byu-dml / metalearn

BYU's python library of useable tools for metalearning
MIT License

Errors and warnings on D3M datasets #18

Closed sethcoast closed 6 years ago

sethcoast commented 6 years ago

WARNINGS

ERRORS

bjschoenfeld commented 6 years ago

The unknown label type error (using LL0_40509_Australian) occurs when the last column is used as the target. Unfortunately, in this dataset the class column is the first column, labeled 'Y'. Using that column as the target removes the error. We must determine the target/class column dynamically for each dataset; we cannot assume it is the last column. For D3M datasets, the associated .json file should tell us this info.
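A minimal sketch of looking up the target column from a dataset's .json description instead of assuming its position. The key names used here (`dataResources`, `columns`, `colName`, `role`, `suggestedTarget`) are assumptions about the D3M dataset description layout and should be checked against the actual file; `find_target_columns` and the inline example document are hypothetical.

```python
import json


def find_target_columns(dataset_doc):
    """Return the names of all columns whose role marks them as a target.

    Assumes a D3M-style layout: dataResources -> columns -> role (a list of strings).
    """
    targets = []
    for resource in dataset_doc.get("dataResources", []):
        for column in resource.get("columns", []):
            if "suggestedTarget" in column.get("role", []):
                targets.append(column["colName"])
    return targets


# Hypothetical, heavily trimmed description in which the class column 'Y' comes first.
example_doc = json.loads("""
{
  "dataResources": [
    {
      "resID": "learningData",
      "columns": [
        {"colIndex": 0, "colName": "Y", "role": ["suggestedTarget"]},
        {"colIndex": 1, "colName": "A1", "role": ["attribute"]},
        {"colIndex": 2, "colName": "A2", "role": ["attribute"]}
      ]
    }
  ]
}
""")

target_names = find_target_columns(example_doc)
print(target_names)
```

The returned names can then be used to split the dataframe into features and target, regardless of where the class column sits.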

bjschoenfeld commented 6 years ago

The error

metalearn/env/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 97, in unique_labels raise ValueError("Unknown label type: %s" % repr(ys))

seems to be caused by not using the correct column as the target class.

bjschoenfeld commented 6 years ago

sklearn problems solved with warnings.filterwarnings:

metalearn/env/lib/python3.6/site-packages/sklearn/discriminant_analysis.py:442: UserWarning: The priors do not sum to 1.

metalearn/env/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler. warnings.warn(msg, DataConversionWarning)
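For illustration, a sketch of how these two warnings could be silenced with `warnings.filterwarnings`. The message patterns are taken from the warning text quoted above; the helper name `suppress_known_sklearn_warnings` is hypothetical, and this is not necessarily how the filters were set up in metalearn.

```python
import warnings


def suppress_known_sklearn_warnings():
    # `message` is a regular expression matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore", message=r"The priors do not sum to 1", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore", message=r"Data with input dtype \w+ was converted to float64"
    )


# Demonstration: only the unrelated warning is still recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    suppress_known_sklearn_warnings()
    warnings.warn("The priors do not sum to 1. Renormalizing", UserWarning)
    warnings.warn("some unrelated warning", UserWarning)

remaining = [str(w.message) for w in caught]
print(remaining)
```

Filtering on the message text rather than on a whole category keeps other, unexpected warnings visible.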

bjschoenfeld commented 6 years ago

ValueError: cannot reshape array of size 0 into shape (0,newaxis)

This was caused by a reshape on an empty dataframe in _get_canonical_correlations. The check for an empty dataframe now comes before the reshape.
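A small sketch of the ordering described above: check for empty input first, then reshape. `to_column` is a hypothetical helper written with NumPy only; it is not the actual code in `_get_canonical_correlations`.

```python
import numpy as np


def to_column(values):
    """Reshape 1-D data into a single column, or return None when there is no data."""
    arr = np.asarray(values)
    # Guard first: reshape(0, -1) on an empty array raises
    # "cannot reshape array of size 0 into shape (0,newaxis)".
    if arr.size == 0:
        return None
    return arr.reshape(arr.shape[0], -1)


print(to_column([]))
print(to_column([1, 2, 3]).shape)
```

The caller then has to decide what an empty result means for the metafeature, for example reporting it as missing.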

bjschoenfeld commented 6 years ago

Using the correct target fixed:

/metalearn/env/lib/python3.6/site-packages/sklearn/model_selection/_split.py:605: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=2. % (min_groups, self.n_splits)), Warning)

metalearn/env/lib/python3.6/site-packages/sklearn/covariance/shrunk_covariance_.py:193: UserWarning: Only one sample available. You may want to reshape your data array warnings.warn("Only one sample available. "

metalearn/env/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims)

bjschoenfeld commented 6 years ago

I found some divisions by zero that propagated NaNs. The true_divide warning below still appears when not suppressed, but that case is handled appropriately. We no longer get the percentile warning.

metalearn/env/lib/python3.6/site-packages/numpy/lib/function_base.py:4291: RuntimeWarning: Invalid value encountered in percentile interpolation=interpolation)

metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:329: RuntimeWarning: invalid value encountered in true_divide / np.dot(y_scores.T, y_scores))

Handling the effects of the above warnings also handles the effects of these warnings:

metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:77: UserWarning: Maximum number of iterations reached warnings.warn('Maximum number of iterations reached')

metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:313: UserWarning: X scores are null at iteration 0 warnings.warn('X scores are null at iteration %s' % k)
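As a rough sketch of the general pattern (not the actual metalearn fix, which is not shown in this thread): divide with the NumPy warnings suppressed locally, then drop non-finite values before computing a percentile. Both helper names are hypothetical.

```python
import numpy as np


def safe_divide(numerator, denominator):
    """Element-wise division that yields nan/inf silently instead of warning."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator


def finite_percentile(values, q):
    """Percentile over the finite entries only; nan if none remain."""
    values = np.asarray(values, dtype=float)
    finite = values[np.isfinite(values)]
    if finite.size == 0:
        return np.nan
    return np.percentile(finite, q)


ratios = safe_divide([1.0, 0.0, 4.0], [2.0, 0.0, 0.0])  # 0.5, nan (0/0), inf (4/0)
median_ratio = finite_percentile(ratios, 50)
print(median_ratio)
```

Scoping the suppression with `np.errstate` keeps it local to the division, so invalid values elsewhere still raise their warnings.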

bjschoenfeld commented 6 years ago

@poolguy I could not reproduce the MemoryError on that dataset. Perhaps it was handled by having the correct column as the target class. Please open a new issue if you can reproduce it again and include more detail.