freeCodeCamp / 2016-new-coder-survey

Update cleaning of dataset #72

Closed erictleung closed 7 years ago

erictleung commented 7 years ago

cc/ @evaristoc @SamAI-Software @QuincyLarson

Close #33

QuincyLarson commented 7 years ago

@erictleung I am unfamiliar with R so I don't feel qualified to QA this, but all of these changes you described sound sane :)

SamAI-Software commented 7 years ago

The issue is that some people have already done their analyses, so changing variable names will break their code, mine included. I understand why Eric renamed the variables and fixed the typos, but it might be too late for that.

erictleung commented 7 years ago

Right, I understand that I've changed those variable names and that it will break some people's code. @evaristoc and I discussed the reason for renaming the variables with Other in them. And I agree, it might be too late at this point.

I guess it is not urgent that those variable names be changed. I can revert them and just make a note of it in the README file.

The most important part of the change is the normalization part to address issue #33.

SamAI-Software commented 7 years ago

> The most important part of the change is the normalization part to address issue #33.

This part seems to be fine.

If you revert to the old variable names and add a note to the README, then we should be good to go.

erictleung commented 7 years ago

@SamAI-Software awesome, I'll try to get to it later tonight.

erictleung commented 7 years ago

@SamAI-Software updated my PR!

I reverted the major change of adding Other to the variable names. I did, however, keep the rename to IsReceiveDisabilitiesBenefits, as the original IsReceiveDiabilitiesBenefits has a typo.
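
For anyone whose existing analysis still reads an older copy of the data, here is a minimal sketch of how the one remaining rename could be handled on the reader's side. It is written in Python rather than the R used for the cleaning, and the `RENAMED_COLUMNS` mapping and `normalize_header` helper are hypothetical names for illustration only, not part of this repository.

```python
# Hypothetical compatibility shim: maps the old, misspelled column name
# to the corrected one kept in this PR. Extend the mapping if needed.
RENAMED_COLUMNS = {
    "IsReceiveDiabilitiesBenefits": "IsReceiveDisabilitiesBenefits",
}


def normalize_header(header):
    """Return a copy of the header with known old names replaced."""
    return [RENAMED_COLUMNS.get(name, name) for name in header]


# Example with a made-up two-column header.
old_header = ["Age", "IsReceiveDiabilitiesBenefits"]
print(normalize_header(old_header))
```

Applying the helper to a header that already uses the corrected name leaves it unchanged, so the same code should work against both the old and the updated dataset.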

Feel free to pull down my PR and QA check the dataset. Let me know if there's anything else of concern 😃

SamAI-Software commented 7 years ago

LGTM