Open simone-mangiante opened 1 year ago
Hi, @simone-mangiante, I could not replicate this error with either DataSynthesizer 0.1.10 or 0.1.11. I tested your script with the telco-customer-churn dataset from Kaggle, and Python 3.7, on Pop!_OS 22.04.
Please check out DataSynthesizer 0.1.12 as well.
Description
`DataDescriber` does not handle `bool` dtypes in the source dataset well. When the CSV file has columns with only `TRUE` and `FALSE` as values, `pandas` reads such columns as `bool` dtype (not `object`) and, when inferring types, the code ends up checking them as dates and fails.
What I Did
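The pandas side of this can be checked in isolation, without DataSynthesizer. The following is a minimal sketch, not the original script: the three-column CSV and its values are made up for illustration, and it only shows how `pandas.read_csv` infers the dtype, not what `DataDescriber` does afterwards.

```python
import io

import pandas as pd

# Hypothetical miniature of a BigQuery-style export: the Partner and Churn
# columns contain only TRUE/FALSE (illustrative data, not the real dataset).
csv_text = "customerID,Partner,Churn\nA1,TRUE,FALSE\nB2,FALSE,TRUE\n"

# With default settings, pandas parses all-TRUE/FALSE columns as bool.
df = pd.read_csv(io.StringIO(csv_text))
bool_columns = list(df.select_dtypes(include="bool").columns)
print(df.dtypes)
print("bool columns:", bool_columns)

# Forcing every column to be read as text keeps the literal strings instead.
df_text = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df_text["Partner"].tolist())
```

Reading with `dtype=str` is shown only to contrast the two inference results; whether a text column avoids the failure inside `DataDescriber` is an assumption that would need to be confirmed against the library version in use.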
The source dataset is the telco-customer-churn dataset from Kaggle, after being imported into Google BigQuery and exported back to CSV, which generated `TRUE` and `FALSE` values instead of `Yes` and `No`.
Below is my code:
Here is the output: