Added support for Pandas types: CategoricalDtype, Categorical and Series(dtype='category')
Added support for categorical columns in read_csv(). Eliminating astype() for categorical columns is deferred to future PRs.
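For reference, here is a minimal sketch of the Pandas constructs this PR targets. It uses plain Pandas only, so it runs without SDC; the assumption is that the same calls are what SDC compiles inside a jitted function. The CSV content and column names are made up for illustration.

```python
import io

import pandas as pd

# The three categorical entry points named above.
dtype = pd.CategoricalDtype(categories=["a", "b", "c"], ordered=False)
values = pd.Categorical(["a", "b", "a"], dtype=dtype)
series = pd.Series(["a", "b", "a"], dtype="category")

# read_csv() with a categorical column (illustrative in-memory CSV).
csv_text = "id,grade\n1,a\n2,b\n3,a\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"grade": dtype})
```

Passing an explicit CategoricalDtype to read_csv() fixes the categories up front, which is the case where no later astype() call is needed.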
Also introduced:
The TuplifyArgs rewrite (sdc/datatypes/common/rewriteutils.py), which replaces arguments provided as lists with the same data represented as tuples. This makes the types of the arguments available at compile time.
This rewrite is reusable. The categorical types use it widely to infer categories at compile time.
Improved RewriteReadCsv using the same approach as TuplifyArgs. TuplifyArgs could be extended with map support and reused in RewriteReadCsv, but this is deferred for the future.
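The idea behind the list-to-tuple rewrite can be sketched as follows. This is only an illustration on Python source using the standard ast module (Python 3.9+ for ast.unparse); the actual TuplifyArgs works on Numba IR and is not reproduced here. The names TuplifyListArgs, tuplify_source and make_dtype are hypothetical.

```python
import ast


class TuplifyListArgs(ast.NodeTransformer):
    """Replace list literals passed directly as call arguments with tuples."""

    def visit_Call(self, node):
        # Visit nested calls first, then rewrite this call's own arguments.
        self.generic_visit(node)
        node.args = [self._tuplify(arg) for arg in node.args]
        for keyword in node.keywords:
            keyword.value = self._tuplify(keyword.value)
        return node

    @staticmethod
    def _tuplify(arg):
        # Only a top-level list literal is converted; lists nested inside
        # it are left unchanged in this sketch.
        if isinstance(arg, ast.List):
            new_node = ast.Tuple(elts=arg.elts, ctx=ast.Load())
            return ast.copy_location(new_node, arg)
        return arg


def tuplify_source(source):
    tree = TuplifyListArgs().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = tuplify_source("make_dtype(categories=[1, 2, 3], ordered=True)")
```

A tuple of literals carries its length and element types in its type, whereas a list only carries one element type, which is why the tuple form is more useful for compile-time inference.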
Added sdc.types (sdc/sdc/types.py), which is analogous to numba.types. It is a collection of SDC types.
SDC types such as CategoricalDtype and Categorical implement repr() so that it returns a string which can be passed to eval() to recreate the object. This approach simplifies objmode usage: objmode requires the user to provide a string for eval() that creates a Numba type. Because objmode calls eval() with numba.types available, numba.types must also be extended with the SDC types for this approach to work.
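The repr()/eval() round trip can be sketched as below. The class CategoricalDtypeType is a hypothetical stand-in written for this example, not the SDC implementation, and the namespace dictionary plays the role that numba.types plays for objmode.

```python
class CategoricalDtypeType:
    """Hypothetical stand-in for an SDC type; not the real implementation."""

    def __init__(self, categories=None, ordered=None):
        # Store categories as a tuple so the type is hashable and its
        # repr() is a valid, self-contained Python expression.
        self.categories = None if categories is None else tuple(categories)
        self.ordered = ordered

    def __repr__(self):
        return (f"{type(self).__name__}"
                f"(categories={self.categories!r}, ordered={self.ordered!r})")

    def __eq__(self, other):
        return (isinstance(other, CategoricalDtypeType)
                and (self.categories, self.ordered)
                == (other.categories, other.ordered))

    def __hash__(self):
        return hash((self.categories, self.ordered))


original = CategoricalDtypeType(categories=["a", "b"], ordered=True)
type_string = repr(original)

# eval() can only resolve the class name if it is present in the namespace,
# which is why the SDC types have to be made visible where eval() runs.
namespace = {"CategoricalDtypeType": CategoricalDtypeType}
recreated = eval(type_string, namespace)
```

With this property, the string handed to objmode can be produced by calling repr() on the type instead of being written by hand.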
I have rearranged commits to make it easy to review.