Describe the issue:
When using an Arrow dictionary type of `string: int32`, setting the dtype directly on the Dask dataframe coerces the values to `pa.large_string`; however, if the dtype is set from a pandas dataframe that has already been cast, the Dask dataframe correctly keeps `string: int32`.
Anything else we need to know?:
I would expect the behavior to be consistent either way (string or large string). My recommendation would be to keep it as `pa.string`, since that matches the pandas behavior, but maybe there is a reason this has changed?
Minimal Complete Verifiable Example:
Environment: