Closed: drakar closed this issue 7 years ago
Hi drakar,
When missing values are spaces, pandas regards the whole column as strings, which might be the reason for this data type error.
The DataDescriber has just been updated to skip initial spaces when reading values from the cells of the input CSV file. Please try it and let me know if it works.
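To illustrate the behaviour described above, here is a small sketch with made-up data (the column names and values are assumptions, not the reporter's file). A cell holding only a space stops pandas from parsing the column as numbers. Passing `skipinitialspace=True` to `pandas.read_csv` is one way to have such a cell treated as empty; this may not be exactly what DataDescriber does internally.

```python
import io

import pandas as pd

# Hypothetical CSV: the latitude in the second data row is a single space.
csv_text = "id,latitude\n1,40.7\n2, \n3,41.2\n"

# Default parsing keeps the lone space as a string, so the column is not numeric.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["latitude"].dtype)

# Skipping spaces after the delimiter leaves an empty cell, which pandas reads as NaN.
df_skip = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df_skip["latitude"].dtype)
```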
It is still not working. Here is what I do:
>>> import pandas as pd
>>> df = pd.read_csv("file.csv")
>>> type(df['latitude'][1])
<class 'str'>
>>>
In the CSV file the fields are literally '<null>', which is a string. Can it be treated as a null value or coerced into a float?
Haoyue,
Please look into this.
Julia.
On 8/30/17 1:33 PM, Aaron Drake wrote:
It still is not working. Here is what I do:
>>> import pandas as pd
>>> df = pd.read_csv("file.csv")
>>> type(df['latitude'][1])
<class 'str'>
In the .CSV file the fields are literally '<null>' which is a string, but can it be treated as a null value or coerced into a float?
pandas lets you define custom NULL values through the na_values parameter when reading a CSV file. See pandas.read_csv.
DataSynthesizer now supports this as well: DataDescriber.read_dataset_from_csv takes a null_values parameter and passes it to pandas.read_csv.
In your case, you can try df = pd.read_csv("file.csv", na_values="'<null>'").
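As a rough sketch of the two options raised in this thread (treating the marker as NULL at read time, or coerce the column to float afterwards), using made-up data rather than the reporter's file. It assumes the cells contain `<null>` without surrounding quote characters. If the quotes are really in the file, the marker passed to `na_values` needs them too, as in the suggestion above.

```python
import io

import pandas as pd

# Hypothetical CSV in which missing latitudes are written as <null>.
csv_text = "id,latitude\n1,40.7\n2,<null>\n3,41.2\n"

# Option 1: declare the marker as a NULL value when reading the file.
df = pd.read_csv(io.StringIO(csv_text), na_values=["<null>"])
print(df["latitude"].dtype)

# Option 2: read the column as text, then coerce anything non-numeric to NaN.
raw = pd.read_csv(io.StringIO(csv_text))
raw["latitude"] = pd.to_numeric(raw["latitude"], errors="coerce")
print(raw["latitude"].dtype)
```

Option 1 only affects the markers you list. Option 2 turns every unparseable value into NaN, which can hide genuine data-entry errors.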
@haoyueping Thank you very much!
@drakar No problem. Thanks for your feedback!
Hi,
In a float or int field, it appears that the pandas lib treats the values as strings rather than as floats with null values.
Is there any way to force float, either in the UI or in the pandas read method?