yaaminiv closed this issue 6 years ago
For merge to function properly, don't you have to specify the by portion of the function like so:
by.x="column_name", by.y="column_name"
Not sure if you need both a by.x and a by.y, so you could maybe just use one or the other?
I do specify the by portion! It just got cut off in Laura's screenshots.
Here's my code (found in this R script):
masterSRMDataBiologicalReplicates <- merge(x = masterSRMData, y = biologicalReplicates, by = "Sample.Number")
Where "Sample.Number" refers to the column name.
My understanding is that your usage is incorrect. It should be:
masterSRMDataBiologicalReplicates <- merge(x = masterSRMData, y = biologicalReplicates, by.y = "Sample.Number")
Here's my understanding:
From R Documentation:
By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
So by would be the general case, and would still work (that's how I was taught in R class!)
Either way, I don't think this argument is what's causing Laura's issues?
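For anyone following along, here is a minimal sketch of the by versus by.x/by.y point, using two small made-up data frames (x, y, and y2 are hypothetical stand-ins, not the real masterSRMData or biologicalReplicates). It suggests that by alone is enough when the key column has the same name in both data frames, and that by.x/by.y are only needed when the names differ:

```r
# Hypothetical toy data; the column values are invented for illustration.
x <- data.frame(Sample.Number = c(1, 2, 3), Area = c(10.5, 20.1, 30.7))
y <- data.frame(Sample.Number = c(2, 3, 4), Site = c("A", "B", "C"))

# Same key name in both data frames: `by` on its own is sufficient.
mergedBy <- merge(x = x, y = y, by = "Sample.Number")

# Spelling out by.x and by.y with the same name gives the same result.
mergedByXY <- merge(x = x, y = y,
                    by.x = "Sample.Number", by.y = "Sample.Number")
sameResult <- identical(mergedBy, mergedByXY)

# Different key names: this is the case by.x/by.y exist for.
y2 <- data.frame(SampleID = c(2, 3, 4), Site = c("A", "B", "C"))
mergedDifferentNames <- merge(x = x, y = y2,
                              by.x = "Sample.Number", by.y = "SampleID")

print(sameResult)
print(mergedDifferentNames)
```

In the last merge the key column in the result takes its name from x, so downstream code can keep referring to Sample.Number.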
OK, there's something weird going on here. When I follow @yaaminiv's link to the script she provided above, it's not the same script that @laurahspencer is using.
For example, see Line 25 of @yaaminiv's script (quick screen cap below) and compare that to line 25 in @laurahspencer's screenshots above - not the same!
Am I missing something or is there confusion on which script (or version of the script) is being evaluated here?
I played around with the script (on my local computer) when working through her protocol, to view data frames and fix an error I got so I could move forward with the workflow.
But my changes were basically View(), head(), etc.
I ran through @yaaminiv's script to just past where @laurahspencer got her error, and I get no errors.
RStudio Version 1.0.143.
I used the same RStudio version, with R version 3.4.0 (2017-04-21) -- "You Stupid Darkness". Is there a potential version issue between our computers and Woodpecker (what Laura's using to reproduce my analyses)?
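If it helps to compare machines, a small sketch like the one below could be run on each computer and the output pasted into the thread. It only uses base R; the envSummary name is just something made up for this example, and it does not capture the RStudio version:

```r
# Collect a few environment details that may differ between machines.
envSummary <- c(
  rVersion   = R.version.string,                      # e.g. the R release in use
  osName     = Sys.info()[["sysname"]],               # operating system name
  osType     = .Platform$OS.type,                     # "windows" or "unix"
  utf8Locale = as.character(l10n_info()[["UTF-8"]])   # is the session locale UTF-8?
)

print(envSummary)
```

The locale entry is included because text encoding handling is one of the things that tends to differ between Windows and other platforms.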
It must be a Windows thing. Same R version (3.4.0, 2017-04-21) & RStudio version:
I just tried re-downloading all materials & rerunning code on the Windows computer, same error:
I'll also try running my code on the Windows machine and see if I encounter errors.
Adding support to @laurahspencer's Windows experience.
I also get this error when running on Windows 7 (R v3.4.2; RStudio 1.1.383).
I thought I had this figured out, but no dice. However, here's some insight into what's causing the issue. @laurahspencer actually alluded to this in her screen caps, but I'm not sure if she was highlighting the actual problem or just the column name. Anyway, the cause of the issue is a weird character set inserted in the "Sample.Name" column in the 2017-09-06-Biological-Replicate-Information.csv file:
Additionally, when I try to preview that file with the head command in Git Bash, it only displays the very last line of the file:
When I view this in the text editor that I use (Notepad++), with "view all characters", I don't see any weird characters or anything, but I did notice that the last line of the file does NOT have a carriage return after it:
I think a "valid" text file has to end with a newline (which might not be the same as a carriage return?), so maybe this is the issue? Will investigate a bit more.
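A rough way to check this kind of thing from R, without relying on an editor, is to look at the raw bytes. The sketch below uses a hypothetical helper, inspectTextFile (not part of the original script), and runs it on a temporary demo file whose contents are invented; it would need to be pointed at the real CSV to say anything about that file:

```r
# Hypothetical helper: report byte-level quirks of a text file.
inspectTextFile <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  n <- length(bytes)
  list(
    # A UTF-8 byte order mark is the three bytes EF BB BF at the start.
    hasBOM     = n >= 3 && identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF))),
    # Does the file end with a line feed (newline)?
    endsWithLF = n >= 1 && bytes[n] == as.raw(0x0A),
    # Counts of carriage returns (0D) and line feeds (0A).
    crCount    = sum(bytes == as.raw(0x0D)),
    lfCount    = sum(bytes == as.raw(0x0A))
  )
}

# Demo file with invented contents: a BOM, CRLF line endings,
# and no line ending after the last row.
demoPath <- tempfile(fileext = ".csv")
writeBin(
  c(as.raw(c(0xEF, 0xBB, 0xBF)),
    charToRaw("Sample.Number,Site\r\n1,A\r\n2,B")),
  demoPath
)

result <- inspectTextFile(demoPath)
str(result)
```

Comparing crCount and lfCount gives a hint about the line-ending convention: equal counts suggest CRLF, only line feeds suggest Unix-style endings, and only carriage returns suggest old Mac-style endings.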
OK, here's the immediate fix to this specific issue. Line 22 should be:
biologicalReplicates <- read.csv("2017-09-06-Biological-Replicate-Information.csv", na.strings = "N/A", fileEncoding="UTF-8-BOM")
Specifying the file encoding as UTF-8-BOM is needed for this particular file.
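To illustrate what the fileEncoding argument is doing, here is a small sketch on a throwaway file (the file contents and the demoPath, withBOMHandling, and naive names are made up for this example). It writes a CSV that starts with a UTF-8 byte order mark and reads it back both ways:

```r
# Write a tiny CSV that begins with a UTF-8 byte order mark (EF BB BF).
demoPath <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Sample.Number,Site\n1,A\n2,N/A\n")), demoPath)

# Reading with fileEncoding = "UTF-8-BOM" strips the mark,
# so the first column name comes through clean.
withBOMHandling <- read.csv(demoPath, na.strings = "N/A",
                            fileEncoding = "UTF-8-BOM")
print(names(withBOMHandling))

# Reading without it: the first column name may or may not be mangled,
# depending on the platform and locale, so no particular output is claimed here.
naive <- read.csv(demoPath, na.strings = "N/A")
print(names(naive))
```

If the first name printed by the second read differs from "Sample.Number" on a given machine, a merge on by = "Sample.Number" would fail there, which would be consistent with what was reported on Windows.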
However, there is a bigger issue here: how did this file get this way? The answer most likely lies in the CSV being generated by Excel for Mac. I think you have to make sure that the Format selected when saving as CSV is "Windows Comma Separated (.csv)". This should ensure cross-platform functionality.
@yaaminiv please test this when you have the chance and report back.
I'll make the edit to the code ASAP. I'm not working on a Windows machine right now, but I think @laurahspencer is and could see if this fixes our issues?
And good to know about the Windows Comma Separated .csv! I'll make adjustments.
To clarify, I know the code change fixes the issue - I tested it on Windows.
We'll need you to test out the change in file saving procedure and see if the "old" code works with the file when saved using the "Windows Comma Separated (.csv)" option.
Oh, gotcha! I was able to run through my entire script with no errors.
From #19
@laurahspencer: Error merging biological data to abundance data; possibly b/c the column name in biol. data set isn't exactly "Sample.Number"
After changing that column's name to "Sample.Number" I executed the merge, but dataframe is empty; I think it's b/c your X db doesn't include sample numbers w/o replicates:
Tried on my computer, worked just fine. Could be an issue with different computers? @sr320 and @grace-ac will test.
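For reference, both hypotheses in the quoted report (a key column that isn't named exactly "Sample.Number", and an empty result because no sample numbers match) can be checked before merging. This is only a sketch on invented data; x and y are hypothetical stand-ins for the abundance and biological data frames:

```r
# Hypothetical toy data with no sample numbers in common.
x <- data.frame(Sample.Number = c(1, 2), Area = c(10.5, 20.1))
y <- data.frame(Sample.Number = c(3, 4), Site = c("A", "B"))

# 1. Is the key column present, under exactly this name, in both data frames?
keyPresent <- c(
  x = "Sample.Number" %in% names(x),
  y = "Sample.Number" %in% names(y)
)
print(keyPresent)

# 2. Which key values do the two data frames share?
sharedKeys <- intersect(x$Sample.Number, y$Sample.Number)
print(sharedKeys)

# With no shared keys, merge() runs without error but returns zero rows.
merged <- merge(x, y, by = "Sample.Number")
print(nrow(merged))
```

Printing names(y) directly is also a quick way to spot a mangled first column name.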