Prevents Dask from passing `pd.NA` to the filters during type inference in the scoring and filtering functions. Also fixes issues in task decontamination caused by the pandas 2.0 string dtype and by exploding columns.
During task decontamination, we convert the document text column (dtype `string`) into a column of lists of split documents (dtype `object`). After calling `explode` on this column, the result keeps the `object` dtype even though it now contains only strings. We therefore recast the column, because newer versions of pandas/Dask treat `string` and `object` as distinct dtypes.
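A minimal pandas sketch of the split, explode, and recast sequence described above, assuming the split step produces Python lists (here via `str.split` on spaces, a simplification of the real decontamination split):

```python
import pandas as pd

docs = pd.Series(["alpha beta", "gamma"], dtype="string")

# Splitting yields a list per row, so the column becomes object dtype.
split_docs = docs.str.split(" ")

# explode() gives one row per piece; in pandas 2.x the result stays object
# dtype even though every value is now a string.
exploded = split_docs.explode()

# Recast so downstream code sees the same "string" dtype as the input column.
recast = exploded.astype("string")
```

Without the final `astype`, code that checks for or relies on the `string` dtype would treat the exploded column differently from the original text column.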