Currently, supplying observations from a numpy.ndarray that contains both strings and floats to a pymc.Distribution raises an error. It seems pymc internally relies on numpy.isnan to check for missing values, and numpy.isnan raises on a mixed-type array like the one in the example below. If the intention is only to check for nan, there is a slightly more general way of doing this:
import numpy as np
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
names = load_iris().target_names
y = y[:,None].astype(object)
for i in range(3):
    y[y == i] = names[i]
# Dataset of the form (X|y) with y replaced with strings
arr = np.concatenate([X, y], axis=1)
# np.isnan(arr) raises a TypeError because arr has dtype object
# A more general way:
# nan is the only value here that is not equal to itself (`np.nan != np.nan` is True)
isnan = np.vectorize(lambda elem: elem != elem)
isnan(arr)
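For reference, here is a minimal sketch of how such a check could be wrapped so that it keeps the fast path for numeric arrays and only falls back to the self-inequality test for other dtypes. The helper name `isnan_general` is hypothetical and not part of pymc or numpy, and the sketch assumes every element of the array is a scalar (float, string, None, ...), not a nested array.

```python
import numpy as np


def isnan_general(arr):
    """Elementwise NaN mask that also tolerates object arrays with mixed types.

    Hypothetical helper for illustration; not an existing pymc or numpy function.
    """
    arr = np.asarray(arr)
    # Float and complex arrays can use the native, vectorised check.
    if arr.dtype.kind in "fc":
        return np.isnan(arr)
    # Otherwise compare each element with itself: nan != nan is True,
    # while strings, None and ordinary numbers are equal to themselves.
    return np.frompyfunc(lambda x: x != x, 1, 1)(arr).astype(bool)


# Small mixed-type array standing in for the (X|y) dataset above.
mixed = np.array(
    [[5.1, "setosa"], [np.nan, "versicolor"], [6.3, None]],
    dtype=object,
)
mask = isnan_general(mixed)
print(mask)
```

The fallback loops in Python, so it is slower than `np.isnan`, but it does not raise on string entries.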