Closed: chapmanjacobd closed this 5 years ago
Hi, you're not doing anything wrong. I just haven't had the time to add the functionality that lets you create one data frame from a folder. That's the next thing on the plan, but it's not currently supported.
Hi, instead of assigning to the global environment, why don't you assign to a list and apply rbindlist on it?
fread_folder <-
function (directory = NULL, extension = "CSV", sep = "auto",
nrows = -1L, header = "auto", na.strings = "NA", stringsAsFactors = FALSE,
verbose = getOption("datatable.verbose"), skip = 0L, drop = NULL,
colClasses = NULL, integer64 = getOption("datatable.integer64"),
dec = if (sep != ".") "." else ",", check.names = FALSE,
encoding = "unknown", quote = "\"", strip.white = TRUE,
fill = FALSE, blank.lines.skip = FALSE, key = NULL, Names = NULL,
prefix = NULL, showProgress = interactive(), data.table = TRUE)
{
# requireNamespace() checks for the package without listing every installed package
if (!requireNamespace("data.table", quietly = TRUE)) {
stop("data.table needed for this function to work. Please install it.",
call. = FALSE)
}
if (is.null(directory)) {
# Identify.OS() and choose_dir() are easycsv helpers
os = Identify.OS()
if (tolower(os) == "windows") {
directory <- utils::choose.dir()
} else if (tolower(os) %in% c("linux", "macosx")) {
# previously nested inside the Windows branch, so it could never run
directory <- choose_dir()
} else {
stop("Please supply a valid local directory")
}
}
directory = paste(gsub(pattern = "\\", "/", directory, fixed = TRUE))
endings = list()
if (tolower(extension) == "txt") {
endings[1] = "*\\.txt$"
}
if (tolower(extension) == "csv") {
endings[1] = "*\\.csv$"
}
if (tolower(extension) == "both") {
endings[1] = "*\\.txt$"
endings[2] = "*\\.csv$"
}
if (!(tolower(extension) %in% c("txt", "csv", "both"))) {
stop("Please supply a valid value for 'extension',\n\n allowed values are: 'TXT','CSV','BOTH'.")
}
tempfiles = list()
temppath = list()
tempdf_list = list()
num = 1
for (i in endings) {
temppath = paste(directory, list.files(path = directory,
pattern = i), sep = "/")
tempfiles = list.files(path = directory, pattern = i)
num = num + 1
if (length(temppath) < 1 | length(tempfiles) < 1) {
num = num + 1
} else {
temppath = unlist(temppath)
tempfiles = unlist(tempfiles)
count = 0
for (tbl in temppath) {
count = count + 1
DTname1 = paste0(gsub(directory, "", tbl))
DTname2 = paste0(gsub("/", "", DTname1))
if (!is.null(Names)) {
if ((length(Names) != length(temppath)) ||
!is.character(Names)) {
stop("Names must be a character vector of same length as the files to be read.")
} else {
DTname3 = Names[count]
}
} else {
DTname3 = paste0(gsub(i, "", DTname2))
}
if (!is.null(prefix) && is.character(prefix)) {
DTname4 = paste(prefix, DTname3, sep = "")
} else {
DTname4 = DTname3
}
DTable <- data.table::fread(input = tbl, sep = sep,
nrows = nrows, header = header, na.strings = na.strings,
stringsAsFactors = stringsAsFactors, verbose = verbose,
skip = skip, drop = drop, colClasses = colClasses,
dec = dec, # pass the caller's value instead of recomputing the default
check.names = check.names, encoding = encoding,
quote = quote, strip.white = strip.white,
fill = fill, blank.lines.skip = blank.lines.skip,
key = key, showProgress = showProgress, data.table = data.table)
# assign_to_global <- function(pos = 1) {
# assign(x = DTname4, value = DTable, envir = as.environment(pos))
# }
# assign_to_global()
tempdf_list <- append(tempdf_list, list(DTable))
rm(DTable)
}
}
}
tempdf = data.table::rbindlist(tempdf_list)
if(!data.table) {
tempdf = as.data.frame(tempdf)
}
return(tempdf)
}
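The core of the change above (collect each table in a list, then call rbindlist once at the end) can be sketched in a few lines. This is only a minimal sketch, not easycsv's code: `read_folder_combined` is a hypothetical name, and it assumes that stacking files whose columns differ should fill the gaps with NA and that a `source` column recording the file name is wanted.

```r
library(data.table)

# Hypothetical helper (not part of easycsv): read every CSV in a folder
# into a list, then stack the list into a single data.table.
read_folder_combined <- function(directory, pattern = "\\.csv$") {
  paths <- list.files(path = directory, pattern = pattern,
                      full.names = TRUE, ignore.case = TRUE)
  if (length(paths) == 0L) {
    stop("No matching files found in ", directory, call. = FALSE)
  }
  tables <- lapply(paths, data.table::fread)
  # Name each element after its file so idcol can record where rows came from.
  names(tables) <- tools::file_path_sans_ext(basename(paths))
  # use.names/fill tolerate files whose columns differ in order or presence.
  data.table::rbindlist(tables, use.names = TRUE, fill = TRUE, idcol = "source")
}

# Usage (the path is the one from the question in this thread):
# combined <- read_folder_combined("~/dataprojects/ghsl")
```

Growing a list and binding once avoids both the global-environment side effects and the cost of calling rbind inside the loop.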
@alexfun looks good. Do you want to create a pull request?
@bogind I would be more than happy to submit the function above; however, I am not sure whether you had something in mind with the code that assigns a variable name based on the file name in the folder. If you would like, I can add a new parameter combine taking one of the following values: c("data.frame", "global", "list"), so that "global" preserves existing behaviour, "list" returns a named list of the csvs, using the currently used naming convention, and "data.frame" returns one data frame via rbindlist.
@alexfun The combine parameter seems logical. I think the regular behavior should be using "global" as the value.
OK, I will write the code with "global" as the default behaviour and submit it to you for review.
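The combine argument discussed above could be dispatched roughly as follows. This is a sketch under assumptions, not the code that was submitted: `deliver_tables` is a hypothetical name, `tables` stands in for the named list of tables that fread_folder builds, and the `envir` argument is added here only so the "global" branch can be tried without touching the real global environment.

```r
library(data.table)

# Hypothetical sketch of the proposed `combine` argument.
# `tables` is assumed to be a named list of data.tables / data frames.
deliver_tables <- function(tables,
                           combine = c("global", "list", "data.frame"),
                           envir = globalenv()) {
  # match.arg() picks the first choice ("global") when nothing is supplied
  # and rejects values outside the allowed set.
  combine <- match.arg(combine)
  switch(combine,
    global = {
      # Existing behaviour: one object per file in the target environment.
      for (nm in names(tables)) assign(nm, tables[[nm]], envir = envir)
      invisible(names(tables))
    },
    list = tables,
    data.frame = data.table::rbindlist(tables, use.names = TRUE, fill = TRUE)
  )
}
```

Listing "global" first in the default vector keeps the existing behaviour as the default, as agreed in the thread.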
After running fread_folder I'm left with a few hundred dataframes in my environment but there is no merged dataframe generated. I'm not sure if it is just the csv files I'm using. Maybe I'm just an edge case. It's the first time I've used easycsv.
I'm not sure what I'm doing wrong. There's no 'error' message
fread_folder(directory = "~/dataprojects/ghsl", extension = "CSV", check.names = TRUE, verbose = TRUE)
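One possible workaround for the situation described above, where the tables already sit in the environment as separate objects, is to gather them and bind them afterwards. This is a sketch under assumptions: `stack_env_frames` is a hypothetical helper, and it assumes every data frame in the chosen environment came from the folder and should be stacked.

```r
library(data.table)

# Hypothetical helper: collect every data frame found in an environment
# and stack them into one data.table.
stack_env_frames <- function(envir = globalenv()) {
  objs <- mget(ls(envir = envir), envir = envir)
  # Keep only data frames (data.tables count as data frames too).
  frames <- Filter(is.data.frame, objs)
  # idcol records which object each row came from.
  data.table::rbindlist(frames, use.names = TRUE, fill = TRUE, idcol = "source")
}

# Usage after fread_folder() has populated the global environment:
# merged <- stack_env_frames()
```

If the environment holds unrelated data frames as well, they would be swept up too, so filtering the object names first (for example with the pattern argument of ls) may be needed.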