Open sbadithe opened 1 year ago
Just to add a little more, I think part of the inconsistent/confusing behavior is that if you take a series that has numeric values, but not a category dtype, and initialize it with the PostalCode logical type, the numeric values get converted to strings:
>>> ser = pd.Series([12345, 67890])
>>> ser = ww.init_series(ser, logical_type='PostalCode')
>>> type(ser[0])
<class 'str'>
But if you start with the same values and set the dtype to category before WW init, you end up with numeric values instead of strings:
>>> ser = pd.Series([12345, 67890]).astype("category")
>>> ser = ww.init_series(ser, logical_type='PostalCode')
>>> type(ser[0])
<class 'numpy.int64'>
I believe WW should provide a consistent output in this case, so that no matter the input dtype, the same element type is used in the output after WW initialization.
A Series with the PostalCode logical type can have float or str elements. For example,
In the above code block, the elements of the series are floats, but in the following, they are strings:
Both are valid initializations. We should decide whether we want to support both data types for the PostalCode logical type.
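One possible direction, sketched here only to make the discussion concrete, would be to normalize every element to a string before it is stored. The helper name `normalize_postal_code` is hypothetical and is not part of Woodwork or Featuretools; it uses only the standard library and assumes that whole-number floats such as `12345.0` should map to `"12345"` and that missing values map to `None`.

```python
import math
from typing import Optional, Union


def normalize_postal_code(value: Union[int, float, str, None]) -> Optional[str]:
    """Hypothetical helper (not a Woodwork API): return a postal code as a str."""
    if value is None:
        return None
    if isinstance(value, float):
        # Treat NaN as a missing value rather than the string "nan".
        if math.isnan(value):
            return None
        # Whole-number floats lose the trailing ".0"; anything else is rejected.
        if value.is_integer():
            return str(int(value))
        raise ValueError(f"not a whole-number postal code: {value!r}")
    if isinstance(value, int):
        return str(value)
    # Assumed policy: strings are kept as-is apart from surrounding whitespace.
    return str(value).strip()


# Integer, float, and string inputs all end up as the same string.
codes = [normalize_postal_code(v) for v in (12345, 12345.0, " 12345 ")]
print(codes)
```

Note that numeric input cannot preserve leading zeros (an integer `2134` carries no record of having been `02134`), which may be an argument for treating strings as the canonical element type.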
This issue was discussed here: https://github.com/alteryx/featuretools/pull/2365