[INFO]: Propose changes to `pymc`

Currently, supplying observations to a pymc.Distribution as a numpy.ndarray that mixes strings and floats raises an error. It seems pymc internally relies on numpy.isnan to check for missing values. Such an array has object dtype, which numpy.isnan does not support, so the check raises, as in the example below. If the intention is only to check for nan, there is a slightly more general way of doing this:

import numpy as np
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
names = load_iris().target_names
# Replace the integer class labels in y with their string names
y = y[:, None].astype(object)
for i in range(3):
    y[y == i] = names[i]
# Dataset of the form (X|y) with y holding strings; arr has object dtype
arr = np.concatenate([X, y], axis=1)
# np.isnan(arr) will raise here: isnan does not support object arrays
# A more general way: nan is the only value not equal to itself
# (`np.nan != np.nan` is True)
isnan = np.vectorize(lambda elem: elem != elem)
isnan(arr)
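One way the proposed check could be packaged is a small helper that keeps the fast `np.isnan` path for numeric arrays and only falls back to the elementwise `elem != elem` test for object arrays. This is a sketch under assumptions: `nan_mask` is a hypothetical name, not an existing pymc function, and it does not show where inside pymc such a check would be called.

```python
import numpy as np


def nan_mask(values):
    """Hypothetical helper: boolean mask of nan entries for any ndarray.

    Float and complex arrays use the fast np.isnan ufunc; object arrays
    (e.g. mixed strings and floats) fall back to the self-inequality test,
    since nan is the only value for which x != x is True.
    """
    values = np.asarray(values)
    if values.dtype.kind in "fc":  # float or complex: np.isnan is supported
        return np.isnan(values)
    if values.dtype.kind == "O":
        # Python-level elementwise comparison; otypes=[bool] fixes the output
        # dtype instead of letting np.vectorize infer it from the first element
        return np.vectorize(lambda elem: elem != elem, otypes=[bool])(values)
    # Other dtypes (int, bool, str, ...) are treated as holding no nan here
    return np.zeros(values.shape, dtype=bool)


mixed = np.array([[5.1, "setosa"], [np.nan, "virginica"]], dtype=object)
print(nan_mask(mixed))
```

Dispatching on `dtype.kind` avoids the slow Python-level loop for ordinary float arrays, which are the common case. Note that the self-inequality test only detects nan: `None != None` is `False`, so `None` entries in an object array are not reported as missing by this sketch.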

AlexRodis / bayesian-models

[INFO]: Propose changes to `pymc` #66