Open John-Paterson opened 5 years ago
Hey John, I'm certainly no expert, but to tackle the first desired merge have you tried a simple merge to create a new df like this:
new_df <- merge(percentilesdf, climate.data1,by="year")
resulting in a merged dataframe which you can then drop any columns you don't need. That should pull in all the columns from climate.data1 (including cum.degrees
) into the new df.
For the second bit, I'm not sure what you mean by wanting to match '5' and 'JDay'. With the above merge, both of those columns should be in the new df anyway (I think!?)
Have you looked into left_join() from dplyer? There's a bunch of handy _join() functions in dplyer, and the help page is actually quite helpful!
Matt
Thanks Matt,
My issue is primarily that I'm not looking to just join the dataframes together (they're different lengths anyway), I'm wanting to pull a column value from the second one based on whether values from two other columns match those in the original dataframe. Basically analogous to a Vlookup in Excel, if that means anything to anyone!
When I said I wanted to match '5' and 'JDay' I meant that I only wanted to merge the 'cum.degrees' column from the second dataframe if both the 'Years' columns matched across the dataframes AND the value in the '5' and 'JDay' columns match. It would look something like this:
>DF1
Species Year 5 50 95 ModelYear
1 Baccha elongata. 1981 117 191 305 1
2 Baccha elongata. 1995 101 151 173 1
3 Cheilosia pagana. 1995 117 199 235 1
4 Cheilosia pagana. 1995 101 199 235 1
>DF2
date Day Month Year JDay cum.degrees
1 18/04/1981 18 4 1981 95 59.48
2 12/05/1981 12 5 1981 117 104.71
3 12/05/1995 12 5 1995 117 86.1
4 18/04/1995 18 4 1995 95 43.23
5 19/04/1995 19 4 1995 96 46.00
6 26/04/1995 26 4 1995 101 54.44
>MergedDFs
Species Year 5 50 95 ModelYear cum.degrees
1 Baccha elongata. 1981 117 191 305 1 104.71
2 Baccha elongata. 1995 101 151 173 1 54.44
3 Cheilosia pagana. 1995 117 199 235 1 86.1
4 Cheilosia pagana. 1995 101 199 235 1 54.44
Don't know if that makes it clearer? I have had a look through the dplyr page but will check it out again!
Cheers,
John
So I've managed to circumvent this problem by creating a column in each dataframe containing a unique reference combining both Year and '5' (in the first dataframe) and 'JDay' (in the second dataframe). I've then merged them based on that.
Still don't really know why the original code didn't work though so if anyone's got any answers it'd be good to know for future reference.
I think left_join would do it - not sure if this code would exactly work - but something along these lines probably would. I think in recent updates dplyr have added even more join options - could be a one stop shop for all your joining needs :)
left_join(df1, df2, by = Year, c("JDay" = "5"))
Thanks Anna, just had to change the code a wee bit to:
left_join(df1, df2, by = c("Year", "JDay" = "5"))
And it worked a treat. With 'Year' outside the brackets it didn't look for both columns to match, it just matched them based on Year. I should have listened to Matt - Tidyverse is always easier!
Thanks all.
@jejoenje showed me another quite snazzy method for doing a similar thing to what you wanted. I know you've solved your issue but this may be of interest/use to people in the future. The code below won't be bang on as I'm not sure how you'd do it when you need to match more than one column, but you'll get the idea
percentilesdf$cum.degrees <- climate.data1$cum.degrees[match(percentilesdf$year, climate.data1$year)]
Hi - I'm trying to pull data from one dataframe and append it to another based on matching values in multiple other columns. Currently I'm trying to achieve this through the merge function. My first dataframe looks like this:
And my second looks like this:
I'm trying to pull the cum.degrees value from the second (climate.data1) df and and join it with the first (percentilesdf) df based on matching values in the Year column (which is shared between both dfs) and matching the '5' column in percentilesdf to the 'JDay' column in climate.data1.
Currently I am trying to do this using a merge function that specifies which columns I want to match from each df using by.x and by.y and which specifies that I want the "cum.degrees" column merged when defining y. Code is as follows:
merge(x = percentilesdf, y = climate.data1[ , "cum.degrees"], by.x = c('Year', '5'), by.y=c('Year', 'JDay'), all.x=TRUE)
However I keep getting error messages saying
Error in fix.by(by.x, x) : 'by' must match numbers of columns
or that 'by' must include unique columns. I have looked extensively on the internet for a solution but can't find anything that works.
Am I along the right lines with this code or is there a different way to achieve this?
Any help appreciated as I've spent more time than I care to admit trying to sort this out!