Closed laurencehendry closed 8 years ago
Hi
Is there any way you could link to some of the underlying data you are trying to merge?
One quick thing: `rbind` and `cbind` are functions in base R, so you don't need to install anything. `bind_rows` is a function in dplyr, so just install that package.
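A quick illustration of the difference, using made-up toy data frames (the names `a`, `b`, and their columns are just for demonstration): `bind_rows` fills in columns that `rbind` would reject.

```r
library(dplyr)

# Two toy data frames with partially overlapping columns (made-up data)
a <- data.frame(bank = c('A', 'B'), assets = c(1, 2))
b <- data.frame(bank = 'C', assets = 3, country = 'DE')

# Base R: rbind() requires identical column names, so this would error:
# rbind(a, b)

# dplyr: bind_rows() matches columns by name and fills the rest with NA
combined <- bind_rows(a, b)
# combined$country is NA for the rows that came from 'a'
```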
Hey
Thanks (also for replying on a Sunday!) The underlying data can be found here: https://drive.google.com/folderview?id=0B9VhVEv3LS_gemVMQXFQX21vNTg&usp=sharing
I've updated my script and also included a friend's code that I was using as an example.
Could you possibly upload these in one zipped folder?
Try this code for binding all of your data together. (I'm not sure what the variable names are, so you should put those in after running this.)
```r
library(rio)
library(dplyr)
library(DataCombine)

# Set directory to where the files are stored
setwd('~/Desktop/European_Firms/')

# Find a list of all files in the directory
all_files <- list.files()

# Create a NULL object to combine the individual files into
combined <- NULL

for (i in all_files) {
    # Load and combine only if the file is a CSV
    if (tools::file_ext(i) == 'csv') {
        # Read the file
        temp <- import(i)

        # Create a file ID from the file name and move it to the front
        file_id <- gsub(pattern = '\\.csv$', replacement = '', i)
        message(file_id)
        temp$file_id <- file_id
        temp <- MoveFront(temp, 'file_id')

        # Bind to the main data frame
        combined <- bind_rows(combined, temp)
    } else {
        message('-- Not CSV --')
    }
}
```
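Here is a self-contained sketch of the same loop logic on two temporary CSV files (made-up data; it uses base `read.csv` and `rbind` instead of rio/DataCombine so it runs without extra packages), which you can use to convince yourself that the `file_id` column ends up as expected:

```r
# Write two tiny demo CSVs into a scratch directory
dir <- file.path(tempdir(), 'csv_demo')
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(dir, 'bank_a.csv'), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dir, 'bank_b.csv'), row.names = FALSE)

# Same pattern as the loop above: read each CSV, tag it, stack it
combined <- NULL
for (i in list.files(dir, pattern = '\\.csv$', full.names = TRUE)) {
    temp <- read.csv(i)
    temp$file_id <- sub('\\.csv$', '', basename(i))
    combined <- rbind(combined, temp)
}

# Sanity check: count how many rows came from each source file
table(combined$file_id)
```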
Can you give an example of what you want to do for the third point?
Many, many thanks!
Re the third point: included in the .zip file is an Excel sheet ("Unique Identifiers") that contains variables on bank location and trading status. With VLOOKUP in Excel (unavailable to us, since the dataset will have 1m+ values) I would have run a function to look up the bank name among the Unique Identifiers and then return the country name and bank status in an adjacent column.
We also thought the function would make adding future variables simpler and more efficient (by adding, e.g., bank characteristics to the Unique Identifiers, then running the VLOOKUP function again on our master dataset).
Why not just merge this information into the main data frame based on the `file_id`?
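A sketch of that merge (the column names `country` and `status`, and the toy data, are assumptions; your "Unique Identifiers" sheet will have its own):

```r
# Hypothetical main data frame, as produced by the binding loop
combined <- data.frame(
    file_id = c('bank_a', 'bank_a', 'bank_b'),
    value   = c(10, 20, 30)
)

# Hypothetical lookup table, e.g. imported from the Unique Identifiers sheet
identifiers <- data.frame(
    file_id = c('bank_a', 'bank_b'),
    country = c('DE', 'FR'),
    status  = c('listed', 'delisted')
)

# merge() is the R analogue of VLOOKUP: each row of 'combined' gets the
# country and status matching its file_id; all.x = TRUE keeps rows with
# no match (they get NA) rather than dropping them
combined <- merge(combined, identifiers, by = 'file_id', all.x = TRUE)
```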
Gah, still thinking like an Excel user. I will use the merge function. Thanks again!
I had to do a very similar thing for my Master's thesis and solved it the following way. It works as long as you need all the CSVs in a folder.

```r
temp <- list.files(pattern = "\\.csv$")
reviews <- do.call("rbind", lapply(temp, function(x) read.csv(x, stringsAsFactors = FALSE)))
```
Dear classmates,
My research partner Lukas and I have been trying to manage a large number of datasets needed for quantitative analysis for our thesis. We have begun experimenting with the Microsoft Excel Macros tool, since we did not have much success with R:
https://github.com/laurencehendry/SRISK_Thesis/blob/master/Gather
Challenges: For 413 datasets (all bank data; all with identical variables/columns; all in .CSV format) we are trying to:
Issues:
Many, many thanks in advance for any advice or consideration, Laurence