Open Juan0001 opened 6 years ago
Missing data is definitely a problem for the visualizers. In general, we expect something that looks like this:
# Note: scikit-learn 0.22 and later replace Imputer with sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer
from sklearn.pipeline import Pipeline
from yellowbrick.features import ParallelCoordinates

model = Pipeline([
    ('impute', Imputer()),
    ('viz', ParallelCoordinates()),
])
model.fit_transform(X, y)
In the near term, perhaps this will help? In the medium term, @ndanielsen is working on some missing data visualizers (#366), and RadViz has actually been updated to visualize anything that is not a NaN (#302), so we could do the same with parallel coordinates and prevent this problem.
@Juan0001 thanks for posting the issue!
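For reference, the "visualize anything that is not a NaN" idea can be sketched as a row filter applied before the data reaches a visualizer. This is only a minimal illustration using NumPy; the helper name `drop_nan_rows` is hypothetical and is not part of Yellowbrick, and it is not necessarily how #302 was implemented.

```python
import numpy as np

def drop_nan_rows(X, y=None):
    """Keep only the rows of X (and matching entries of y) that contain no NaN.

    Hypothetical helper for illustration; not a Yellowbrick function.
    """
    X = np.asarray(X, dtype=float)
    # True for every row in which no column is NaN
    mask = ~np.isnan(X).any(axis=1)
    if y is None:
        return X[mask]
    return X[mask], np.asarray(y)[mask]

# Small example: the middle row has a missing value and is dropped
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])
X_clean, y_clean = drop_nan_rows(X, y)
```

The filtered `X_clean` and `y_clean` could then be passed to a visualizer in place of the raw arrays, at the cost of discarding incomplete rows rather than imputing them.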
@bbengfort That's exactly what I did for my problem. If the missing values are imputed before the data is fed into ParallelCoordinates, there is no problem. But I think it would be good to have a choice of whether to impute the missing values before using ParallelCoordinates. I will take a look at the package to see if I can make any improvement on that.
Thank you very much!
While I was trying to use ParallelCoordinates with normalization on a dataset with missing values, I got the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I managed to get around it by normalizing my data (ignoring the missing values) before feeding it into the visualizer. I hope you can fix it within the visualizer.
Thank you.
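The workaround described above might look roughly like the following: a min-max scaling that skips NaN entries when computing each column's range. This is a sketch under assumptions, since the original post does not show its code or say which normalization was used; the function name `nan_minmax_scale` is hypothetical.

```python
import numpy as np

def nan_minmax_scale(X):
    """Scale each column to [0, 1], ignoring NaN when computing min and max.

    Hypothetical helper for illustration; NaN entries are left in place.
    """
    X = np.asarray(X, dtype=float)
    col_min = np.nanmin(X, axis=0)  # per-column minimum, skipping NaN
    col_max = np.nanmax(X, axis=0)  # per-column maximum, skipping NaN
    span = col_max - col_min
    span[span == 0] = 1.0           # avoid dividing by zero on constant columns
    return (X - col_min) / span

# Small example with one missing value in each column
X = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])
scaled = nan_minmax_scale(X)
```

Note that the missing values remain NaN after scaling, so they still have to be imputed or filtered if a later step rejects them.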