InseadDataAnalytics / INSEADAnalytics

Issues using replace() #124

Open jerepow opened 6 years ago

jerepow commented 6 years ago

Trying to replace the integer codes 20, 30, etc. with factors that have "SC" at the beginning.

  1. I'm guessing there's a way to do this in one line
  2. I've tried several different replace methods but it keeps returning the same data points that go in

Fixing incorrectly classified data types and renaming data points from integer codes to make more sense:

train.house$MSSubclass <- as.factor(train.house$MSSubClass)
replace(train.house$MSSubClass, c(20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190), c("SC20", "SC30", "SC40", "SC45", "SC50", "SC60", "SC70", "SC75", "SC80", "SC85", "SC90", "SC120", "SC150", "SC160", "SC180", "SC190"))

VarunKShetty commented 6 years ago

@jerepow Adding SC at the beginning should be done when your column is a "character" and not when it is a "factor". Once you have declared a column as a factor, R goes through a process of learning the list of unique categories in that column (called "levels") and then becomes stubborn when you try to change what it learnt the levels to be. So declare it as a character first, do the replacement and then declare as a factor.
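To make the point about levels concrete, here is a minimal sketch on toy data (the vector `codes` is a made-up stand-in for `train.house$MSSubClass`, not the real column). It shows what happens when a value outside the existing levels is assigned to a factor, and two ways around it: going through character first, or renaming the levels directly.

```r
# Toy data standing in for train.house$MSSubClass (illustrative values only).
codes <- c(20, 60, 20, 120)

f <- as.factor(codes)
levels(f)                      # "20" "60" "120"

# Assigning a value that is not an existing level gives NA plus a warning.
g <- f
g[1] <- "SC20"
is.na(g[1])                    # TRUE

# Route 1: convert to character, add the prefix, then make it a factor.
sc <- factor(paste0("SC", as.character(codes)))

# Route 2: rename the levels of the existing factor in one line.
renamed <- f
levels(renamed) <- paste0("SC", levels(renamed))
as.character(renamed)          # "SC20" "SC60" "SC20" "SC120"
```

Either route should end up with the same labels; the second one is probably the closest thing to the one-liner asked about.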

jerepow commented 6 years ago

Thanks Varun. Still having issues unfortunately.

  1. When I reclass as a character, it adds a new column rather than reclassifies. I've got around this by just creating a new variable and nulling out the existing.
  2. It still isn't changing the string:

train.house$MSSubclass1 <- as.character(train.house$MSSubClass)
train.house$MSSubClass = NULL
replace(train.house$MSSubClass1, c("20", "30", "40", "45", "50", "60", "70", "75", "80", "85", "90", "120", "150", "160", "180", "190"), c("1New", "1Old", "1Attic", "1.5Unfin", "1.5Fin", "2New", "2Old", "2.5", "Split", "Split Foyer", "Duplex", "1UnitNew", "1.5UnitNew", "2UnitNew", "UnitMulti", "2FamConv"))

Can you see any issues I'm missing?

jerepow commented 6 years ago

Interestingly, it seems to be matching them up, but maybe there's a quirk in R that the output is going somewhere else?

image

VarunKShetty commented 6 years ago

Let's try the following:

  1. First reclass it as a character using train.house = train.house %>% mutate(MSSubClass = as.character(MSSubClass))
  2. Then let's add the prefix string using train.house = train.house %>% mutate(MSSubClass = paste0("SC", MSSubClass))
  3. Then let's call it a factor again, if necessary: train.house = train.house %>% mutate(MSSubClass = as.factor(MSSubClass))

Typing from my phone, so commands might need a little fidgeting.
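Put together, the three steps might look like the sketch below. It assumes dplyr is installed, and the tiny `train.house` data frame is a made-up stand-in so the snippet runs on its own; with the real data only the pipeline part would be needed.

```r
library(dplyr)

# Hypothetical stand-in for the real train.house data frame.
train.house <- data.frame(MSSubClass = c(20, 60, 20, 120))

train.house <- train.house %>%
  mutate(MSSubClass = as.character(MSSubClass)) %>%   # step 1: to character
  mutate(MSSubClass = paste0("SC", MSSubClass)) %>%   # step 2: add the prefix
  mutate(MSSubClass = as.factor(MSSubClass))          # step 3: back to factor

str(train.house$MSSubClass)
```

Every opening bracket needs its matching close here; R will otherwise keep waiting for more input or report an unexpected symbol on the next line.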

VarunKShetty commented 6 years ago

Just eyeballing the commands you used already earlier, it looks like you are not actually assigning the replacement to any variable. So it just ends up printing the replacement. Try this:

train.house$MSSubClass1 = replace(train.house$MSSubClass1, c("20", "30", "40", "45", "50" ,"60", "70", "75", "80", "85", "90", "120", "150", "160", "180", "190") ,c("1New", "1Old", "1Attic", "1.5Unfin", "1.5Fin", "2New", "2Old", "2.5", "Split", "Split Foyer", "Duplex", "1UnitNew", "1.5UnitNew", "2UnitNew", "UnitMulti", "2FamConv") )

I basically just added an assignment at the beginning telling R where the result of the replacement should be stored. Does that make sense?
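One more thing that may be worth checking, sketched here on a toy vector `x` (an assumption, not the real column): base R's replace(x, list, values) treats its second argument as an index into x, not as a set of values to look for. With character indices on an unnamed vector, the new values get appended under those names, so the result is longer than the input. A named lookup vector is one possible way to map codes to labels by value instead.

```r
x <- c("20", "60", "20", "120")

# replace(x, list, values) does x[list] <- values, so `list` is an index.
# Character indices on an unnamed vector append new named elements
# rather than changing the elements whose value is "20" or "60".
r <- replace(x, c("20", "60"), c("1New", "2New"))
length(r)                      # 6, not 4

# A named lookup vector maps each code to its label by value.
lookup <- c("20" = "1New", "60" = "2New", "120" = "1UnitNew")
labels <- unname(lookup[x])
labels                         # "1New" "2New" "1New" "1UnitNew"
```

The lookup here only covers three of the sixteen codes for brevity; any code missing from it would come back as NA.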

jerepow commented 6 years ago

Well done doing that on your phone but still having issues:

image

and trying the mutate method is returning symbol error issues:

image

VarunKShetty commented 6 years ago

For the second image, you are missing a closing bracket on the last two commands: the one classifying as character, and also the paste0 one.


On May 21, 2018 17:39, Jeremy Pownall notifications@github.com wrote:

Well done doing that on your phone but still having issues:

[image]https://user-images.githubusercontent.com/29014259/40316133-a32c6c5a-5d1d-11e8-9804-a9b9b947ddfa.png

and trying the mutate method is returning symbol error issues:

[image]https://user-images.githubusercontent.com/29014259/40316191-cf858fca-5d1d-11e8-9026-a6051983e011.png
