Open mattnuttall00 opened 4 years ago
Hi Matt,
I think this does want you want it to do?
df1 <- data.frame(year = rep(c(2007,2008,2009,2010,2011), each=2),
comm_name = c("a","b",NA,"d","e",NA,"g","h",NA,"j"),
code_com = c(1:10))
df2 <- data.frame(comCode = c(1:10), comm_name = letters[1:10])
df1$comm_name = df2$comm_name[match(df1$code_com, df2$comCode)]
df1
Basically, match()
returns indexes where it found a match between vector A and B, so you need to use the output of that function as an "indexer" for where you want to place the values.
It's a bit odd to get your head around, but have a look at what the output of match(df1$code_com, df2$comCode)
is.
Hope that helps...
J
Thanks @jejoenje. Am I correct in thinking though that
df1$comm_name = df2$comm_name[match(df1$code_com, df2$comCode)]
replaces all of the df1$comm_name
values with the values from df2$comm_name
? How would I then specify that I only wanted it do do that when df1$comm_name
values are NA
?
Ah, yes, it does indeed replace all of them. In the example case, though, that doesn't matter, as your "index" (i.e. df2
) has all the necessary values. I assume that for your "real" case, that isn't the case (perhaps only the missing cases)?
The only way I could think of doing this in that case, would be to effectively subset df1
into two, one where df1$comm_Name == NA
, and one where df1$comm_Name != NA
.
Then do the above match using the former subset only, and re-combine the two subsets. Watch your factor levels (if applicable) in that case, though.
The result, though, for the given example would be identical... it would be far easier to just ensure that your "index" (df2
) just has all the cases you're trying to match?
... or is my cold-addled brain missing something?
All else fails: for
loop and some ifelse statements. But that would, of course, be cheating. :)
I think using a combination of left_join() and coalesce() would do what you want - you get two versions of the col 'comm_name' with the join, then coalesce (merge them together) to get a new col with NA's replaced, and then delete the extra cols you don't need. It didn't work with factors though. Does that help?
df1$comm_name <- as.character(df1$comm_name) df2$comm_name <- as.character(df2$comm_name)
df3 <- left_join(df1, df2, by = c('code_com' = 'comCode')) %>% mutate(comm_name = coalesce(comm_name.x, comm_name.y)) %>% select(-comm_name.x, -comm_name.y)
Hey @jejoenje and @anna-deasey - both of those methods work. Thanks very much!
Hi all,
What I am trying to do is quite simple, and I thought I knew how to do it, but it's not working...!
I have two dataframes. Below are reproducible dataframes which demonstrate the example:
df1 <- data.frame(year = rep(c(2007,2008,2009,2010,2011), each=2), comm_name = c("a","b",NA,"d","e",NA,"g","h",NA,"j"), code_com = c(1:10))
df2 <- data.frame(comCode = c(1:10), comm_name = letters[1:10])
All I am trying to do is to replace the NAs in df1$comm_name with the value from df2$comm_name, based on the com_code. i.e.
code_com
indf1
andcomCode
indf2
are the same variable, so I want to use those values to match the the correct comm_name fromdf2
todf1
.I have tried:
df1$comm_name[is.na(df1$comm_name)] <df2$comm_name[match(df1$code_com,df2$comCode)]
and
df1$comm_name <- ifelse(is.na(df1$comm_name), df2$comm_name[match(df2$comCode,df1$code_com)], df1$comm_name)
and
df1$comm_name <- replace(df1$comm_name, is.na(df1$comm_name), df2$comm_name[match(df1$code_com,df2$comCode)])
and
df1 <- df1 %>% mutate(comm_name = ifelse(is.na(df1$comm_name), df2$comm_name[match(df1$code_com,df2$comCode)], comm_name))
and
df1 <- df1 %>% mutate(comm_name = ifelse(is.na(df1$comm_name), replace(comm_name, is.na(comm_name), df2$comm_name[match(df1$code_com,df2$comCode)]), comm_name))
And none of them are doing what I want. Not sure if I am over-complicating it, or if I am just making some silly mistake in my code....but all suggestions welcome!
Thanks