Corneliusbusch / DS4D-project


Clean data for a commonly used data set #8

Closed: Corneliusbusch closed this issue 5 years ago

Corneliusbusch commented 5 years ago

Clean the data so that we all start from the same clean data set.

MojcaFranic commented 5 years ago

As we discussed, I am going to remove the columns we do not need from both datasets. I will not merge them; each of us can use what we need for our own analysis as we go, since we are looking at the data from different perspectives. I will convert dates to datetime and numeric entries to integers, and I will add the users' ages to the users dataset. I am still waiting for the data host's reply about the locations of inactive users, and I will update the data as soon as I hear from them.
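A minimal pandas sketch of the cleaning steps described above, assuming a users table with hypothetical columns (user_id, birth_date, signup_date, review_count, profile_url). The column names and the helper clean_users are illustrative assumptions, not taken from the project repository.

```python
import pandas as pd

# Hypothetical raw users data; the real project columns may differ.
users = pd.DataFrame({
    "user_id": ["1", "2", "3"],
    "birth_date": ["1990-05-17", "1985-11-02", "2000-01-30"],
    "signup_date": ["2019-03-01", "2018-07-15", "2019-10-20"],
    "review_count": ["12", "3", "40"],
    "profile_url": ["a", "b", "c"],  # assumed to be a column the analysis does not need
})


def clean_users(df: pd.DataFrame, unused_columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: drop unneeded columns and fix the dtypes."""
    cleaned = df.drop(columns=unused_columns)
    # Parse date strings into tz-naive datetimes (no timezone attached).
    for col in ("birth_date", "signup_date"):
        cleaned[col] = pd.to_datetime(cleaned[col])
    # Convert numeric strings to integers.
    for col in ("user_id", "review_count"):
        cleaned[col] = cleaned[col].astype(int)
    return cleaned


clean = clean_users(users, ["profile_url"])
print(clean.dtypes)
```

Keeping the two datasets separate, as proposed, just means calling a helper like this once per dataset with its own list of unused columns.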

MojcaFranic commented 5 years ago

I have cleaned both datasets to a point I believe we can all work from. I am having trouble calculating the age: it will not compute because the timezone is unknown. Any ideas on how to solve this?

Corneliusbusch commented 5 years ago

I would just set utc=False for both datetime columns, and then it should work. Didn't the approach from my assignment work?

Corneliusbusch commented 5 years ago

I have fixed the issue. The problem was the parameter utc=True used when the date strings were converted to a DatetimeIndex. I have changed that, created a pull request (PR), and approved it.
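A minimal sketch of why utc=True broke the age calculation, assuming the ages are computed by subtracting birth dates from a tz-naive reference date. The sample dates, the fixed reference date, and the mixed_failed flag are hypothetical; the age here is approximate (days divided by 365.25), not necessarily the formula used in the project.

```python
import pandas as pd

# Hypothetical birth dates as stored in the raw data (strings, no timezone).
raw_birth_dates = pd.Series(["1990-05-17", "1985-11-02", "2000-01-30"])

# Hypothetical fixed, tz-naive reference date so the results are reproducible.
reference = pd.Timestamp("2019-12-01")

# utc=True produces tz-aware values (UTC); subtracting them from the
# tz-naive reference raises a TypeError in pandas.
aware = pd.to_datetime(raw_birth_dates, utc=True)
try:
    reference - aware
    mixed_failed = False
except TypeError:
    mixed_failed = True

# utc=False (the default) keeps the values tz-naive, so the subtraction works.
naive = pd.to_datetime(raw_birth_dates, utc=False)

# Approximate age in whole years: elapsed days over the mean year length,
# truncated to an integer.
ages = ((reference - naive).dt.days / 365.25).astype(int)
print(ages.tolist())  # [29, 34, 19]
```

The key point is that both operands of the subtraction must agree on timezone awareness; keeping everything tz-naive, as in the fix above, is the simplest way to satisfy that when the source data carries no timezone.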