DS4PS / cpp-528-spr-2021

https://ds4ps.org/cpp-528-spr-2021/

Lab-04-Logging Variables #30

Open · gzbib opened this issue 3 years ago

gzbib commented 3 years ago

Hello @cenuno

I understand that we should avoid logging variables, especially if we cannot interpret their coefficients.

However, in the lab tutorial, I noticed that we logged percent variables to reduce their skew. What about this case:
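For illustration, here is a minimal Python sketch of why logging can help with a right-skewed, strictly positive variable such as a percent. The values in `pct` are made up, and `skewness()` is a hypothetical helper that computes the population (Fisher-Pearson) skewness m3 / m2^1.5; neither comes from the lab.

```python
import math

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed percent variable; every value is > 0,
# so the log is defined for all records and nothing is dropped.
pct = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
log_pct = [math.log(v) for v in pct]

print(round(skewness(pct), 2))      # large positive skew (long right tail)
print(round(skewness(log_pct), 2))  # still positive, but much smaller
```

The log compresses the long right tail (40 and 15 move much closer to the rest), which is why it is a common fix for right-skewed positive data.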

I am using the poverty.change variable, which is poverty12 - poverty00. When I plotted its histogram, it was bell-shaped, but after I logged the variable, it became skewed. So, can we say that logging a variable is not always a solution? And can we log a change variable in general?
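A minimal Python sketch of what the question describes, assuming made-up poverty rates (the lists `poverty00` and `poverty12` below are hypothetical, not lab data). The change variable is perfectly symmetric, but only its positive values survive the log, and the logged subset is skewed.

```python
import math

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical poverty rates (percent) for 11 tracts in 2000 and 2012.
poverty00 = [20, 18, 15, 14, 12, 10, 9, 8, 7, 6, 4]
poverty12 = [14, 14, 12, 12, 11, 10, 10, 10, 10, 10, 10]

# poverty.change = poverty12 - poverty00, symmetric around 0 here.
change = [p12 - p00 for p00, p12 in zip(poverty00, poverty12)]

# The log is only defined for change > 0, so the zero and the
# negative changes have to be dropped before logging.
positive = [c for c in change if c > 0]
log_change = [math.log(c) for c in positive]

print(len(change), len(log_change))  # records before vs. after logging
print(round(skewness(change), 2), round(skewness(log_change), 2))
```

Logging here both discards more than half of the records and turns a symmetric distribution into a left-skewed one, so it does not normalize this variable.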


Thank you in advance.

cenuno commented 3 years ago

You are correct: logging is not always the solution. You can log a change variable only if you are okay with dropping the records where the change is zero or negative.

This is a direct result of how the log function works (https://www.desmos.com/calculator/xtorekpda3): it is only defined for inputs greater than zero, while a change can be either positive or negative.
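A minimal Python sketch of the domain restriction, for illustration only. Python's `math.log` raises `ValueError` for zero and negative inputs (R's `log()` instead returns `-Inf` for 0 and `NaN` with a warning for negatives). `safe_log()` is a hypothetical helper that mimics filtering out non-positive records before logging and reports how many were lost.

```python
import math

def safe_log(values):
    """Hypothetical helper: log only values > 0.

    Returns (logged values, number of records dropped because the
    log is undefined for them).
    """
    kept = [math.log(v) for v in values if v > 0]
    return kept, len(values) - len(kept)

# math.log rejects inputs outside its domain (x <= 0).
for x in (-2.0, 0.0):
    try:
        math.log(x)
    except ValueError:
        print(f"math.log({x}) is undefined")

# A change variable with mixed signs loses every non-positive record.
kept, dropped = safe_log([-3, -1, 0, 2, 5])
print(len(kept), dropped)
```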

This means logging is not the best tool for normalizing a change variable.