Closed sethcoast closed 6 years ago
The unknown label type error (using LL0_40509_Australian) occurs when the last column is used as the target. In this dataset, however, the class column is the first column, labeled 'Y'. Using it as the target removes the error. For each dataset, we must determine dynamically which column is the target/class; we cannot assume it is the last column. For D3M datasets, the associated .json file should tell us this.
The error
metalearn/env/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 97, in unique_labels raise ValueError("Unknown label type: %s" % repr(ys))
seems to be caused by not using the correct column as the target class.
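A rough sketch of looking up the target column from a dataset description instead of assuming its position. The field names used here (`dataResources`, `columns`, `colIndex`, `colName`, `role`, `suggestedTarget`) are assumptions about the layout of the D3M .json file and should be checked against the real files. `find_target_columns` and `example_doc` are hypothetical names made up for illustration.

```python
import json


def find_target_columns(dataset_doc):
    """Return (colIndex, colName) pairs for columns flagged as targets.

    Assumes each data resource lists its columns with a "role" list;
    this layout is an assumption, not taken from the issue itself.
    """
    targets = []
    for resource in dataset_doc.get("dataResources", []):
        for column in resource.get("columns", []):
            roles = column.get("role", [])
            if "suggestedTarget" in roles or "target" in roles:
                targets.append((column["colIndex"], column["colName"]))
    return targets


# Hypothetical, heavily trimmed description where the class column comes first.
example_doc = json.loads("""
{
  "dataResources": [
    {
      "resID": "learningData",
      "columns": [
        {"colIndex": 0, "colName": "Y", "role": ["suggestedTarget"]},
        {"colIndex": 1, "colName": "A1", "role": ["attribute"]},
        {"colIndex": 2, "colName": "A2", "role": ["attribute"]}
      ]
    }
  ]
}
""")

print(find_target_columns(example_doc))
```

With a lookup like this, the target is chosen by its declared role, so it does not matter whether the class column is first, last, or somewhere in between.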
sklearn problems solved with warnings.filterwarnings:
metalearn/env/lib/python3.6/site-packages/sklearn/discriminant_analysis.py:442: UserWarning: The priors do not sum to 1.
metalearn/env/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler. warnings.warn(msg, DataConversionWarning)
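For reference, a minimal sketch of suppressing only these two warnings with `warnings.filterwarnings`, scoped so that other warnings still surface. The wrapper name `run_quietly` is hypothetical, and the message patterns are copied from the warning text above; the filters actually used in the project may differ.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func while ignoring two known, benign sklearn warnings.

    The filters only apply inside the `with` block; the previous
    warning configuration is restored afterwards.
    """
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message=r"The priors do not sum to 1",
            category=UserWarning,
        )
        warnings.filterwarnings(
            "ignore",
            message=r"Data with input dtype .* was converted to float64",
        )
        return func(*args, **kwargs)
```

Matching on the message text keeps the suppression narrow: an unrelated `UserWarning` raised by the same call is still reported.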
ValueError: cannot reshape array of size 0 into shape (0,newaxis)
This was caused by a reshape on an empty dataframe in _get_canonical_correlations. The check for an empty dataframe now comes before the reshape.
Using the correct target fixed:
/metalearn/env/lib/python3.6/site-packages/sklearn/model_selection/_split.py:605: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=2. % (min_groups, self.n_splits)), Warning)
metalearn/env/lib/python3.6/site-packages/sklearn/covariance/shrunk_covariance_.py:193: UserWarning: Only one sample available. You may want to reshape your data array warnings.warn("Only one sample available. "
metalearn/env/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims)
I found some divisions by 0 that propagated NaNs. The true_divide warning still appears when not suppressed, but that case is handled appropriately. We no longer get the percentile warning.
metalearn/env/lib/python3.6/site-packages/numpy/lib/function_base.py:4291: RuntimeWarning: Invalid value encountered in percentile interpolation=interpolation)
metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:329: RuntimeWarning: invalid value encountered in true_divide / np.dot(y_scores.T, y_scores))
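To illustrate the kind of handling meant here, a small sketch of guarding a division and dropping non-finite values before any percentile-style computation. This is not the project's actual fix; `safe_divide` and `finite_values` are hypothetical helpers, and the input pairs are made up.

```python
import math


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return math.nan
    return numerator / denominator


def finite_values(values):
    """Keep only values that are neither NaN nor infinite."""
    return [v for v in values if math.isfinite(v)]


# Made-up (numerator, denominator) pairs; the middle one divides by zero.
ratios = [safe_divide(n, d) for n, d in [(1.0, 2.0), (3.0, 0.0), (4.0, 8.0)]]
clean = finite_values(ratios)
```

Filtering before the downstream statistic means a single zero denominator no longer turns every later result into NaN.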
Handling the effects of the above warnings also handles the effects of these warnings:
metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:77: UserWarning: Maximum number of iterations reached warnings.warn('Maximum number of iterations reached')
metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:313: UserWarning: X scores are null at iteration 0 warnings.warn('X scores are null at iteration %s' % k)
@poolguy I could not reproduce the MemoryError on that dataset. Perhaps it was handled by having the correct column as the target class. Please open a new issue if you can reproduce it again and include more detail.
WARNINGS
[x] metalearn/env/lib/python3.6/site-packages/sklearn/discriminant_analysis.py:442: UserWarning: The priors do not sum to 1.
LL0_1549_autouniv_au6_750
LL0_1520_robot_failures_lp5
LL0_1493_one_hundred_plants_texture
LL0_1466_cardiotocography
LL0_1531_volcanoes_b1
LL0_337_spectf
LL0_238_drivface
LL0_1508_user_knowledge
LL0_1515_micro_mass
LL0_478_collins
LL0_1529_volcanoes_a3
LL0_186_braziltourism
LL0_32_pendigits
LL0_1_anneal
LL0_1481_kr_vs_k
LL0_1538_volcanoes_d1
[x] metalearn/env/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler. warnings.warn(msg, DataConversionWarning)
LL0_1008_analcatdata_reviewer
LL0_333_monks_problems_1
LL0_337_spectf
LL0_475_analcatdata_germangss
LL0_335_monks_problems_3
LL0_1040_sylva_prior
LL0_40693_xd6
LL0_1041_gina_prior2
LL0_1036_sylva_agnostic
LL0_953_splice
LL0_40706_parity5_plus_5
LL0_40649_GAMETES_Heterogeneity_20atts_1600_Het_0
LL0_747_servo
LL0_329_hayes_roth
[x] metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:329: RuntimeWarning: invalid value encountered in true_divide / np.dot(y_scores.T, y_scores))
LL0_337_spectf
LL0_1501_semeion
LL0_475_analcatdata_germangss
LL0_40509_Australian
LL0_238_drivface
LL0_1176_internet_advertisements
LL0_488_colleges_aaup
LL0_1100_popularkids
LL0_42_soybean
LL0_1459_artificial_characters
LL0_1217_click_prediction_small
[x] /metalearn/env/lib/python3.6/site-packages/sklearn/model_selection/_split.py:605: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=2. % (min_groups, self.n_splits)), Warning)
LL0_337_spectf
LL0_475_analcatdata_germangss
LL0_238_drivface
LL0_186_braziltourism
LL0_1217_click_prediction_small
[x] metalearn/env/lib/python3.6/site-packages/sklearn/covariance/shrunk_covariance_.py:193: UserWarning: Only one sample available. You may want to reshape your data array warnings.warn("Only one sample available. "
LL0_337_spectf
LL0_475_analcatdata_germangss
LL0_238_drivface
LL0_39_ecoli
LL0_186_braziltourism
[x] metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:77: UserWarning: Maximum number of iterations reached warnings.warn('Maximum number of iterations reached')
LL0_4153_Smartphone_Based_Recognition_of_Human_Activities
LL0_475_analcatdata_germangss
LL0_42_soybean
[x] metalearn/env/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims)
LL0_475_analcatdata_germangss
LL0_1100_popularkids
[x] metalearn/env/lib/python3.6/site-packages/numpy/lib/function_base.py:4291: RuntimeWarning: Invalid value encountered in percentile interpolation=interpolation)
LL0_475_analcatdata_germangss
LL0_1100_popularkids
[x] metalearn/env/lib/python3.6/site-packages/sklearn/cross_decomposition/pls_.py:313: UserWarning: X scores are null at iteration 0 warnings.warn('X scores are null at iteration %s' % k)
LL0_238_drivface
LL0_42_soybean
ERRORS
[x] metalearn/env/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 97, in unique_labels raise ValueError("Unknown label type: %s" % repr(ys))
LL0_40509_Australian
LL0_4153_Smartphone_Based_Recognition_of_Human_Activities
LL0_228_breast_cancer_wisconsin_diagnostic
LL0_679_rmftsa_sleepdata
[x] ValueError: cannot reshape array of size 0 into shape (0,newaxis)
LL0_488_colleges_aaup
LL0_42_soybean
[ ] MemoryError
LL0_1217_click_prediction_small