Open ysayeed opened 3 years ago
Yes, it looks like the facets overview code doesn't support the Categorical type. As a workaround, you can convert the column to a series of standard strings, and then the proto creation should work.
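A minimal sketch of that workaround, assuming the data lives in a pandas DataFrame. The helper name `categoricals_to_object` and the example frame are hypothetical and not part of facets:

```python
import pandas as pd


def categoricals_to_object(df):
    """Return a copy of df with every Categorical column cast to plain object dtype."""
    out = df.copy()
    for col in out.columns:
        # Only touch Categorical columns; everything else is left unchanged.
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            out[col] = out[col].astype(object)
    return out


# Hypothetical example data, for illustration only.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "n": [1, 2, 3],
})
converted = categoricals_to_object(df)
```

The converted frame would then be passed to `ProtoFromDataFrames` in place of the original. Casting to `object` rather than `str` is a deliberate choice here, since it leaves missing values as missing instead of turning them into the literal string "nan".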
In order for this code to work on Categorical series out of the box, https://github.com/PAIR-code/facets/blob/master/facets_overview/python/base_generic_feature_statistics_generator.py#L69 would need to be updated to check for the Categorical dtype and return self.fs_proto.STRING in that case, before the current checks that use dtype.char (since the Categorical type doesn't have the char member variable).
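The check described above could look roughly like the following. This is a standalone sketch, not the actual facets source: the function name `dtype_to_type_name` is hypothetical, the returned strings stand in for the `self.fs_proto` enum values, and the numeric branches are a simplified approximation of the existing `dtype.char` checks.

```python
import numpy as np
import pandas as pd


def dtype_to_type_name(dtype):
    """Map a column dtype to 'FLOAT', 'INT', or 'STRING' (stand-ins for the proto enum)."""
    # Categorical dtypes have no `.char`, so handle them before any dtype.char check.
    if isinstance(dtype, pd.CategoricalDtype):
        return "STRING"
    if dtype.char in np.typecodes["AllFloat"]:
        return "FLOAT"
    if dtype.char in np.typecodes["AllInteger"] or dtype == np.bool_:
        return "INT"
    # Anything else (for example, object columns holding strings) is treated as STRING.
    return "STRING"
```

With this ordering, `dtype_to_type_name(pd.Series(["a", "b"], dtype="category").dtype)` takes the first branch and never reaches `dtype.char`.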
Thanks, that workaround solves things for me.
I am running into a similar error, but in my case it is failing on string data:
File "/ashley/.cache/pypoetry/virtualenvs/test-BIYvDDBt-py3.8/lib/python3.8/site-packages/facets_overview/base_generic_feature_statistics_generator.py", line 54, in ProtoFromDataFrames
table_entries[col] = self.NdarrayToEntry(table[col])
File "/ashley/.cache/pypoetry/virtualenvs/test-BIYvDDBt-py3.8/lib/python3.8/site-packages/facets_overview/base_generic_feature_statistics_generator.py", line 119, in NdarrayToEntry
data_type = self.DtypeToType(x.dtype)
File "/ashley/.cache/pypoetry/virtualenvs/test-BIYvDDBt-py3.8/lib/python3.8/site-packages/facets_overview/base_generic_feature_statistics_generator.py", line 66, in DtypeToType
if dtype.char in np.typecodes['AllFloat']:
AttributeError: 'StringDtype' object has no attribute 'char'
This is using Python 3.8, pandas 1.4, and facets-overview 1.0.0.
Would appreciate some help!
The facets code is quite old and doesn't contain support for the newer StringDtype for string values. If you instead use the standard "object" type for the strings, the code should work.
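One way to do that conversion is sketched below, assuming a pandas DataFrame as input. It also covers Int64Dtype, which the next reply reports needed the same treatment. The helper name `to_numpy_backed_dtypes` and the example frame are hypothetical.

```python
import pandas as pd


def to_numpy_backed_dtypes(df):
    """Return a copy of df with StringDtype and Int64Dtype columns cast to NumPy dtypes."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.StringDtype):
            # Standard "object" dtype for strings; missing values are not
            # treated specially in this sketch.
            out[col] = out[col].astype(object)
        elif isinstance(dtype, pd.Int64Dtype):
            # NumPy int64 cannot hold missing values, so fall back to
            # float64 (NaN) when any are present.
            target = "float64" if out[col].isna().any() else "int64"
            out[col] = out[col].astype(target)
    return out


# Hypothetical example data, for illustration only.
df = pd.DataFrame({
    "s": pd.array(["a", "b"], dtype="string"),
    "i": pd.array([1, None], dtype="Int64"),
    "j": pd.array([1, 2], dtype="Int64"),
})
out = to_numpy_backed_dtypes(df)
```

After the conversion every column has a plain NumPy dtype, so `dtype.char` is available when facets inspects it.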
@jameswex Thank you! It turned out I had to convert the Int64Dtype columns as well. Possibly this belongs in another thread, but I am seeing a new error after doing the type conversion:
proto_str = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(dfs).SerializeToString()
File "/ashley/.cache/pypoetry/virtualenvs/scorecard-BIYvDDBt-py3.8/lib/python3.8/site-packages/facets_overview/base_generic_feature_statistics_generator.py", line 60, in ProtoFromDataFrames
return self.GetDatasetsProto(
File "/ashley/.cache/pypoetry/virtualenvs/scorecard-BIYvDDBt-py3.8/lib/python3.8/site-packages/facets_overview/base_generic_feature_statistics_generator.py", line 284, in GetDatasetsProto
sample_count=np.asscalar(val[0]),
File "/ashley/.cache/pypoetry/virtualenvs/scorecard-BIYvDDBt-py3.8/lib64/python3.8/site-packages/numpy/__init__.py", line 311, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'asscalar'
Any insight?
I believe it has to do with your numpy version. See https://numpy.org/doc/1.21/reference/generated/numpy.asscalar.html
You can downgrade numpy or update the facets code to use the replacement that the numpy docs recommend, which is calling .item() on the value.
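For reference, a small sketch of the replacement: NumPy's deprecation note for `asscalar` points to `.item()`, so `np.asscalar(val[0])` corresponds to `val[0].item()`. The helper name `to_python_scalar` is hypothetical, and the module-level shim is only a stopgap assumption for cases where editing the installed facets code is not practical.

```python
import numpy as np


def to_python_scalar(value):
    """Convert a NumPy scalar or size-1 array to a plain Python scalar."""
    # .item() is the documented replacement for the removed np.asscalar.
    return np.asarray(value).item()


# Optional stopgap: restore the old name when the installed NumPy lacks it,
# so unmodified code that calls np.asscalar keeps working.
if not hasattr(np, "asscalar"):
    np.asscalar = to_python_scalar
```

The shim has to run before the facets call that triggers the error. Patching a third-party module this way is fragile, so changing the call site to `.item()` is the cleaner fix where possible.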
When attempting to create the proto for facets-overview, if any of the columns are categorical, the operation fails with an AttributeError. I would expect it to parse the dataframe properly, treating the category dtype as a string and displaying it in the "Categorical Features" section in the same way.
Below is example code to produce this error and the traceback:
This is using facets-overview 1.0.0 and pandas 1.1.4.