Hi, I have never used bcp for reading, but I admit I should add it to the TODO list. Maybe I will add a new function, dbBulkRead (any help is welcome). I will add it once I finish integrating the mono version (in the coming days).
@agstudy Thanks for your reply. Below is my implementation of the BCP read, just a raw version, but I hope it is helpful.

By the way, I'm also interested in your Mono plan. It will enable people to connect to MSSQL from a Linux machine, right? But how about the speed? Will it be as fast as on Windows? I have to confess I really know nothing about Mono. The reason I ask is that RStudio's Shiny Server can only run on a Linux machine, and going through JDBC is kind of slow. So, if your method is fast, it will be great for all of us using MSSQL.

I think the dplyr package is cool, especially the part that translates R syntax to SQL; it achieves a kind of seamless analysis workflow in R, and I'm really interested in it. However, dplyr doesn't cover MSSQLSERVER, because Hadley only wants to work on the open-source SQL databases. So maybe we could implement the generic functions for sqlserver, so that people can use the dplyr package directly for analysis on MSSQLSERVER?
# Query MSSQL via the bcp command-line utility plus data.table::fread.
# f_msg(), f_conn_info(), activate_conn() and str_right() are helper
# functions from my own working environment; RODBC and data.table are
# required packages.
f_bcp_query <- function(conn, sql, if_str = FALSE, if_keep_tmp = FALSE) {
  if (!if_str) {
    conn <- as.character(substitute(conn))
  }
  # prepare a temporary file and folder
  tempdir <- tempdir()
  tempfile <- tempfile(tmpdir = tempdir, fileext = ".csv")
  if (if_keep_tmp) f_msg("Will store the tmp file in ", tempfile)
  # download the data (and, separately, the header) via bcp
  f_bcp <- function(sql, csv_name = tempfile, if_echo = FALSE,
                    hdr = TRUE, if_debug = FALSE) {
    stopifnot(str_right(csv_name, 4) == ".csv")
    tmp <- f_conn_info(conn)
    user <- tmp$UID
    pwd <- tmp$PWD
    db <- tmp$SERVER
    if (hdr) {
      # bcp exports no column names, so fetch an empty result set via
      # RODBC and store the header in a separate "_hdr.csv" file
      conn <- activate_conn(conn, if_str = TRUE)
      tmp <- RODBC::sqlQuery(
        conn,
        paste0("select top 0 * from (", sql, ") a")
      )
      tmp2 <- gsub(".csv", "_hdr.csv", csv_name, fixed = TRUE)
      write.csv(tmp, tmp2, row.names = FALSE)
      RODBC::odbcClose(conn)
    }
    str_cmd <-
      'bcp "%s" queryout "%s" -U "%s" -P "%s"
      -S "%s" -c -t "|" -b 10000 -a 10000'
    str_cmd <- sprintf(str_cmd, sql, csv_name, user, pwd, db)
    str_cmd <- gsub("\n", " ", str_cmd, fixed = TRUE)
    # f_msg(str_cmd)
    tmp <- shell(str_cmd, intern = !if_echo)
    if (if_debug) str_cmd else tmp
  }
  # read the csv back with fread, then attach the saved header
  f_fread <- function(csv_name = tempfile) {
    csv_hdr <- gsub(".csv", "_hdr.csv", csv_name, fixed = TRUE)
    if (!if_keep_tmp) {
      on.exit({
        unlink(csv_name)
        unlink(csv_hdr)
      })
    }
    tmp <- data.table::fread(
      csv_name, sep = "|", header = FALSE, integer64 = "double"
    )
    tmp_hdr <- read.csv(csv_hdr)
    data.table::setnames(tmp, colnames(tmp), colnames(tmp_hdr))
    tmp
  }
  f_bcp(sql)
  f_fread()
}
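For example, a call in my environment would look like this (my_conn and the table name are hypothetical; f_conn_info() must be able to resolve the connection's UID, PWD and SERVER):

# hypothetical usage: my_conn is a connection name known to my helpers
dt <- f_bcp_query(my_conn, "select * from dbo.big_table")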
Many thanks!
1- Thank you for your suggestion.
2- Of course the mono integration will keep the same speed (at least it will be faster than other providers) and you will be able to access SQL Server from a Linux machine.
3- dplyr is a higher-level interface built on DBI to get rid of SQL-specific queries. rsqlserver is fully DBI compliant, so I believe you can already create this interface. Personally I don't use dplyr; I have no problem using the DBI interface directly (maybe my SQL background). See the DBI sketch after this list.
4- Thanks for your f_bcp_query sample. Can you please create a pull request with a test case for it? I will integrate it as soon as possible.
Thanks.
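A minimal sketch of the direct DBI usage point 3 refers to; the "SqlServer" driver string and the connection url are assumptions and may need adjusting for your setup:

# rough sketch, assuming rsqlserver accepts dbConnect("SqlServer", url=...)
library(DBI)
library(rsqlserver)
url <- "Server=localhost;Database=TEST;Trusted_Connection=True;"  # assumed
conn <- dbConnect("SqlServer", url = url)
head(dbGetQuery(conn, "select * from dbo.some_table"))  # hypothetical table
dbDisconnect(conn)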
dplyr backend. And if it matures and you're interested, I hope I can create a PR then.
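For reference, a hypothetical stub of such a backend, assuming the pre-dbplyr generics src_sql() and tbl_sql() from the dplyr 0.4 era and the same assumed dbConnect() call as above; a real backend would also need the SQL translation pieces:

# hypothetical dplyr backend stub, not working code from this thread
src_sqlserver <- function(url, ...) {
  con <- DBI::dbConnect("SqlServer", url = url)  # assumed constructor
  dplyr::src_sql("sqlserver", con)               # wrap con as a dplyr src
}
tbl.src_sqlserver <- function(src, from, ...) {
  dplyr::tbl_sql("sqlserver", src = src, from = from, ...)
}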
Hi, I just found your package and personally I think it's very useful and interesting.

After reading your introduction, I realized that you've used the .Net framework to read and write, and provide a function that uses BCP to write tables. So, I'm just wondering why not use BCP for both reading and writing, and remove the sqlClient part.

I have to admit that I'm not very familiar with the sqlServer stuff in detail; however, I've built two small functions in my working environment that use BCP to both read and write. For a query, it will first use BCP to write a csv file in my temp folder, then it will use data.table::fread to read the csv file as fast as possible. In usual circumstances it seems to work OK and very fast, except that if there's too much text the csv file might fail to read (like when the data contains the separator).

So, I just want to figure out the reason why not use BCP for everything. Thanks.
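To make the separator failure concrete, a tiny illustration with made-up rows: bcp -c -t "|" writes unquoted fields, so a "|" inside a value shifts every later column.

# made-up bcp output rows: the second value of "bad" contains the separator
good <- "1|plain text|10.5"
bad  <- "2|text with | inside|99.9"
strsplit(good, "|", fixed = TRUE)[[1]]  # 3 fields, as intended
strsplit(bad,  "|", fixed = TRUE)[[1]]  # 4 fields: columns are shifted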