InseadDataAnalytics / INSEADAnalytics

LIMIT_BAL keeps getting read as a Factor Variable automatically. How to fix? #130

Open allyper opened 6 years ago

allyper commented 6 years ago

I have tried the standard tricks for converting factor variables to numeric, but the conversion then introduces NAs by coercion.

Antoine-Engerand commented 6 years ago

Have you tried as.numeric(as.character(LIMIT_BAL))?

allyper commented 6 years ago

Yep, didn't seem to work (generated the forced NAs as mentioned above)

VarunKShetty commented 6 years ago

@allyper: Can you show me the output of this command?

levels(LIMIT_BAL)

allyper commented 6 years ago

levels(CCdata$LIMIT_BAL)
[1] "1,000,000" "10,000" "100,000" "110,000" "120,000" "130,000" "140,000" "150,000" "16,000" "160,000"
[11] "170,000" "180,000" "190,000" "20,000" "200,000" "210,000" "220,000" "230,000" "240,000" "250,000"
[21] "260,000" "270,000" "280,000" "290,000" "30,000" "300,000" "310,000" "320,000" "327,680" "330,000"
[31] "340,000" "350,000" "360,000" "370,000" "380,000" "390,000" "40,000" "400,000" "410,000" "420,000"
[41] "430,000" "440,000" "450,000" "460,000" "470,000" "480,000" "490,000" "50,000" "500,000" "510,000"
[51] "520,000" "530,000" "540,000" "550,000" "560,000" "570,000" "580,000" "590,000" "60,000" "600,000"
[61] "610,000" "620,000" "630,000" "640,000" "650,000" "660,000" "670,000" "680,000" "690,000" "70,000"
[71] "700,000" "710,000" "720,000" "730,000" "740,000" "750,000" "760,000" "780,000" "80,000" "800,000"
[81] "90,000"

VarunKShetty commented 6 years ago

It's the commas that are causing the problem. Try the top answer here: https://stackoverflow.com/a/28129746

Use gsub.
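
For illustration, here is a minimal sketch of why the direct conversion yields NAs and how gsub avoids it. The vector `x` is a made-up sample (an assumption, not the actual CCdata column), and `bad` and `good` are hypothetical names:

```r
# Hypothetical sample mimicking how LIMIT_BAL was read in: a factor whose labels contain commas
x <- factor(c("10,000", "327,680", "1,000,000"))

# Direct conversion: "10,000" is not a parseable number, so every element becomes NA
# (R would normally warn "NAs introduced by coercion"; suppressed here to keep the output quiet)
bad <- suppressWarnings(as.numeric(as.character(x)))

# Strip the thousands separators first, then convert
good <- as.numeric(gsub(",", "", as.character(x), fixed = TRUE))

print(bad)
print(good)
```

Here `fixed = TRUE` just tells gsub to treat the comma as a literal string rather than a regular expression; it is optional for this pattern.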


allyper commented 6 years ago

Thanks! Will try that

allyper commented 6 years ago

Here is what finally worked, in case anyone else runs into this issue:

# Fix LIMIT_BAL comma issue
CCdata$LIMIT_BAL <- gsub(",", "", CCdata$LIMIT_BAL)  # removes the commas; the result is a character vector
CCdata$LIMIT_BAL <- as.numeric(CCdata$LIMIT_BAL)  # converts the character values to numeric
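
A possible variation, sketched under assumptions: if the data comes from a CSV file, reading it with `stringsAsFactors = FALSE` keeps the column as character from the start, and the same gsub step then applies. The inline `csv_text` and the `demo` data frame below are hypothetical stand-ins for the real file and CCdata:

```r
# Hypothetical two-row CSV standing in for the real data file
csv_text <- 'ID,LIMIT_BAL\n1,"20,000"\n2,"120,000"'

# stringsAsFactors = FALSE keeps LIMIT_BAL as character instead of factor
demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Remove the commas and convert to numeric in one step
demo$LIMIT_BAL <- as.numeric(gsub(",", "", demo$LIMIT_BAL, fixed = TRUE))

# Quick check that no value failed to convert
print(sum(is.na(demo$LIMIT_BAL)))
str(demo)
```

Counting the NAs after the conversion is a cheap way to confirm that no other stray characters remain in the column.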