Mol1hua closed this issue 3 years ago
I got it to run for now: I noticed I had NaN values in the target variable! I still don't understand how that caused the error above, but I am happy it is running now. :-)
Hey @Mol1hua! Thank you so much for the report! Apologies for the delay in answering; it's been a weird month on my side.
I did take a look at the issue and made a couple of fixes. However, the more I dug in, the more I realized that having NaN values in the target variable could lead to confusion for the user.
For example, how should one interpret the target distribution if, say, 60% of the target data is missing? I fear this would lead to people glancing at a graph and generalizing about the target without realizing a ton of data is missing. I know this happens with "regular" features as well, but missing data is outlined much more clearly in those cases, and it's hard to do the same for target data in every graph.
So, I am leaning towards not allowing target analysis unless the target has no missing data (that is, 100% of the compared target is present), so that target interpretation is unambiguous. This will be in the next version.
Again, thank you for your detailed reports, and if you have any further comments on this, don't hesitate to let me know.
Hello,
I love trying out the sweetviz library! I am using a numeric target variable "dummy_overall_status" and have some categorical features in the data set, e.g. "test_device". Unfortunately, when I run
my_report = sv.analyze(df, pairwise_analysis="off", target_feat="dummy_overall_status")
I get the following error message for my categorical variable "test_device":
followed by a class distribution barchart.
This error does not occur when I run "analyze" without the target_feat parameter! It looks like the function wrongly assumes that test_device is a boolean series, but it contains only strings (no NaN either).
Is there a workaround? Thank you very much!
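Since the resolution reported in this thread was NaN values in the target variable, one possible workaround is to check the target for missing values and drop those rows before running the analysis. The following is only a minimal sketch under assumptions: `drop_missing_target` is a hypothetical helper (not part of sweetviz), the toy data is made up, and the sweetviz call is left as a comment so the snippet runs with pandas alone.

```python
import numpy as np
import pandas as pd


def drop_missing_target(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df without rows whose target is NaN (hypothetical helper)."""
    n_missing = int(df[target].isna().sum())
    if n_missing:
        # Report how much data is lost so the gap is not silently hidden.
        share = n_missing / len(df)
        print(f"{target}: dropping {n_missing} row(s) with missing target ({share:.0%})")
    return df.dropna(subset=[target]).reset_index(drop=True)


# Toy data standing in for the real data set (values are made up for illustration).
df = pd.DataFrame({
    "test_device": ["A", "B", "A", "C"],
    "dummy_overall_status": [1.0, np.nan, 0.0, 1.0],
})

clean_df = drop_missing_target(df, "dummy_overall_status")

# With sweetviz installed, the call from the report above would then become:
# import sweetviz as sv
# my_report = sv.analyze(clean_df, pairwise_analysis="off",
#                        target_feat="dummy_overall_status")
```

Printing the share of dropped rows is a deliberate choice here: it keeps the amount of missing target data visible, which is the concern raised in the maintainer's reply. Whether dropping rows is appropriate depends on why the target is missing in the first place.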