Is your feature request related to a problem? Please describe.
When reading jsonl files with Dask, the dataframe datatypes are inferred unless explicitly specified.
Inferring the datatypes can lead to several issues, including incorrect types, degraded performance, and increased memory usage.
I think we could mitigate those issues by adding a --meta parameter that accepts a dictionary of datatypes.
That parameter would be optional and would behave like the meta parameter of dask.dataframe.read_json, documented here: https://docs.dask.org/en/latest/generated/dask.dataframe.read_json.html.