Colleen-ODonnell / KenyaPovertyTargetingModel


Decide on a way to denote missing values #9

Closed Colleen-ODonnell closed 4 years ago

steveofconnell commented 4 years ago

What are the options in Python, and how do they get treated when included in some kind of basic function? For example, in Stata, missings are shown as . and are treated like infinity (which is very dangerous). In R, they can be all forms of NA, NA_real_, NaN, etc., and usually the function will return NA or break if you include missing values in a mean or a sum without specifying na.omit or na.rm. ^The above question is mainly for my own knowledge, since I know less Python than you all at this point.

Colleen-ODonnell commented 4 years ago

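In Python (pandas), missing values are marked as NaN, None, or NA. By default, sum() skips missing values, so they effectively count as 0 in a total, and count() counts only the non-missing values. mean() also skips them rather than treating them as 0. Groupby will automatically exclude rows whose grouping key is missing.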
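A minimal sketch of the default behavior described above, assuming pandas 1.1 or later. The column names (`county`, `income`) and the values are made up for illustration and are not from the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing income and one missing grouping key.
df = pd.DataFrame({
    "county": ["Nairobi", "Nairobi", "Kisumu", None],
    "income": [100.0, np.nan, 50.0, 25.0],
})

# sum() skips NaN by default, so the missing income adds nothing to the total.
total = df["income"].sum()

# count() returns the number of non-missing values only.
n_obs = df["income"].count()

# mean() divides by the non-missing count, not by the number of rows.
mean_income = df["income"].mean()

# groupby drops rows whose key is missing unless dropna=False is passed.
by_county = df.groupby("county")["income"].sum()
by_county_all = df.groupby("county", dropna=False)["income"].sum()

print(total, n_obs, mean_income)
print(by_county)
print(by_county_all)
```

Note that the Nairobi total only reflects the one non-missing income, which is the main thing to watch for when reading grouped sums.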

You need to drop missing values (dropna) before running a regression if you don't want NaN, None, or NA values in the estimation sample. Many pandas methods also have a "skipna" argument that controls whether missing values are excluded.
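As a rough illustration of dropna and skipna, here is a small sketch. The column names (`consumption`, `hh_size`) are hypothetical stand-ins for whatever variables end up in the regression.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column.
df = pd.DataFrame({
    "consumption": [10.0, np.nan, 30.0, 40.0],
    "hh_size": [4.0, 5.0, np.nan, 2.0],
})

# Keep only rows that are complete on the variables going into the model.
complete = df.dropna(subset=["consumption", "hh_size"])

# skipna=True is the default: missing values are ignored.
mean_skip = df["consumption"].mean()

# skipna=False propagates the missing value, so the result is NaN.
mean_strict = df["consumption"].mean(skipna=False)

print(len(complete), mean_skip, mean_strict)
```

Passing skipna=False is a cheap way to get the R-like behavior where a single missing value makes the result missing.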

Colleen-ODonnell commented 4 years ago

We plan to change categorical variables to dummy variables and calculate descriptive statistics. After we finish descriptive statistics, we will "dummy out" the missing values by replacing them with 0's and creating an additional column that tags the missing observations with 1.
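One possible way to do the "dummy out" step in pandas, as a sketch only. The column names (`hh_size`, `roof`) and the `_missing` suffix are assumptions for illustration, not the project's actual variables or naming.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numeric and one categorical variable.
df = pd.DataFrame({
    "hh_size": [4.0, np.nan, 6.0],
    "roof": ["iron", None, "thatch"],
})

# Numeric variable: create the flag first, then fill the missing value with 0.
df["hh_size_missing"] = df["hh_size"].isna().astype(int)
df["hh_size"] = df["hh_size"].fillna(0)

# Categorical variable: dummy_na=True adds an extra indicator column for
# missing values, so every row has exactly one dummy set to 1.
roof_dummies = pd.get_dummies(df["roof"], prefix="roof", dummy_na=True, dtype=int)
df = pd.concat([df.drop(columns="roof"), roof_dummies], axis=1)

print(df)
```

The flag has to be created before fillna, otherwise the information about which rows were missing is already gone.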