AlexRodis / bayesian-models

A small library built on top of `pymc` that implements many common models
Apache License 2.0

[INFO]: Propose changes to `pymc` #66

Closed AlexRodis closed 1 year ago

AlexRodis commented 1 year ago

Currently, supplying observations from a numpy.ndarray that mixes strings and floats to a pymc.Distribution raises an error. It seems pymc internally relies on numpy.isnan to check for missing values, and numpy.isnan raises a TypeError on an object array such as the one in the example below. If the intention is only to check for NaN, there is a slightly more general way of doing this:

import numpy as np
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
names = load_iris().target_names
y = y[:,None].astype(object)
for i in range(3):
   y[y==i]=names[i]
# Dataset of the form (X|y) with y replaced with strings
arr = np.concatenate([X, y], axis=1)
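# np.isnan(arr) raises a TypeError because arr has dtype=object
# A more general way relies on the fact that
# `np.nan != np.nan`: NaN is the only value that isn't equal to itself
isnan = np.vectorize(lambda elem: elem != elem, otypes=[bool])
isnan(arr)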
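
For anyone who wants to try the idea without scikit-learn, here is a minimal self-contained sketch of the same self-inequality check on a small hand-made array. The helper name `isnan_general` and the sample data are made up for illustration; this is not how pymc does it internally, only one possible way the check could be written.

```python
import numpy as np


def isnan_general(arr):
    """Elementwise NaN mask that also works on object arrays (hypothetical helper)."""
    arr = np.asarray(arr, dtype=object)
    # NaN is the only value for which `elem != elem` is True,
    # so strings, ints and None all map to False.
    check = np.frompyfunc(lambda elem: elem != elem, 1, 1)
    # frompyfunc returns an object array; cast it to a proper boolean mask.
    return check(arr).astype(bool)


# Made-up mixed data: floats, strings, one NaN and one None
mixed = np.array(
    [[5.1, "setosa"], [np.nan, "versicolor"], [6.3, None]],
    dtype=object,
)

mask = isnan_general(mixed)
print(mask)
```

Note that `None` is not flagged, since `None != None` is `False`. If `None` should also count as missing, the check would need an extra condition.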