Closed alessiamarcolini closed 4 years ago
Compare `all` vs `np.all`.
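A minimal sketch of where the two differ, assuming the comparison is about NumPy arrays (the arrays `flat` and `grid` are made up for illustration). On a 1-D array both give the same answer. On a 2-D array the built-in `all` fails, because it asks each row for a single truth value.

```python
import numpy as np

flat = np.array([True, True, False])
grid = np.array([[1, 0], [1, 1]])  # illustrative data, not from the issue

# On a 1-D array the built-in and the NumPy reduction agree.
builtin_flat = all(flat)
numpy_flat = bool(np.all(flat))

# np.all reduces over every element, or along a chosen axis.
numpy_grid = bool(np.all(grid))
numpy_grid_rows = np.all(grid, axis=1).tolist()

# The built-in iterates over rows and calls bool() on each row,
# which NumPy refuses for arrays with more than one element.
try:
    all(grid)
    builtin_grid_error = None
except ValueError as exc:
    builtin_grid_error = type(exc).__name__
```

For array inputs, `np.all` is probably the safer choice, since it behaves the same regardless of the number of dimensions.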
Check the pandas dtype first and analyze only columns with `object` dtype.
Use `DataFrame.select_dtypes` to select columns with a certain dtype recognized by pandas (it accepts generic dtypes like `number`).
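As a rough illustration of selecting columns by dtype, here is a small sketch. The frame `df` and its column names are invented for the example, and the text column is given `object` dtype explicitly so the result does not depend on how a particular pandas version stores strings.

```python
import pandas as pd

# Hypothetical example frame; the text column is forced to object dtype.
df = pd.DataFrame({
    "name": pd.Series(["ann", "bob"], dtype=object),
    "age": [31, 45],
    "height": [1.62, 1.80],
})

# "number" is a generic selector covering integer and float columns.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Only the object columns, i.e. the ones that need further analysis.
object_cols = df.select_dtypes(include="object").columns.tolist()
```

Here `numeric_cols` holds `age` and `height`, while `object_cols` holds only `name`.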
Use `pd.to_numeric`, `pd.to_timedelta`, or `pd.to_datetime` when fixing errors (before using some manually selected dictionaries).
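One possible way to use these converters for error fixing is sketched below. It assumes that unparseable values should become missing values, which is what `errors="coerce"` does. That choice is an assumption and is not stated in the issue, and the sample series are invented.

```python
import pandas as pd

# Hypothetical raw columns, each with one value that cannot be parsed.
raw_numbers = pd.Series(["1", "2.5", "oops"], dtype=object)
raw_dates = pd.Series(["2020-01-31", "not a date"], dtype=object)
raw_durations = pd.Series(["1 days", "oops"], dtype=object)

# errors="coerce" turns unparseable entries into NaN / NaT instead of raising.
numbers = pd.to_numeric(raw_numbers, errors="coerce")
dates = pd.to_datetime(raw_dates, errors="coerce")
durations = pd.to_timedelta(raw_durations, errors="coerce")

# Positions that failed to convert and may need a manual mapping afterwards.
bad_numbers = numbers.isna().tolist()
```

The entries flagged in `bad_numbers` are the ones that would still need a hand-written replacement dictionary.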
`pd.api.types.infer_dtype()`
We may use this pandas function to find the column types. It uses multiple layers for recognition: if the column's `np.dtype` is `object`, further checks are performed by looking at the type of each sample. Possible types are: DateTime, TimeDelta, Integer,...

import pandas as pd
# ordered categorical dtype; values outside the categories (here 'c') become NaN
cat_type = pd.CategoricalDtype(categories=['a','b'], ordered=True)
s = pd.Series(list('babc')).astype(cat_type)  # `pd.CategoricalDtype(list('abcd'))` can also replace cat_type directly
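The layered check described above could be sketched as follows. The helper `infer_object_columns` and the example frame are hypothetical and not part of pandas or of this project. The helper only looks at columns whose dtype is `object` and lets `pd.api.types.infer_dtype` inspect the values.

```python
import pandas as pd

def infer_object_columns(df):
    """Return {column name: inferred type} for object-dtype columns only.

    Hypothetical helper written for illustration.
    """
    object_cols = df.columns[df.dtypes == object]
    return {col: pd.api.types.infer_dtype(df[col]) for col in object_cols}

# Invented example: three object columns hiding different value types,
# plus one float column that the helper skips.
df = pd.DataFrame({
    "when": pd.Series(
        [pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-02")], dtype=object
    ),
    "count": pd.Series([1, 2], dtype=object),
    "label": pd.Series(["a", "b"], dtype=object),
    "size": [1.5, 2.5],
})

inferred = infer_object_columns(df)
```

`infer_dtype` returns a lowercase string label such as `"datetime"`, `"integer"`, or `"string"`, so the result can be used directly to decide which converter to try on each column.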