Closed. bcjaeger closed this issue 1 year ago.
Hi @bcjaeger,
I have two questions:
1) How do you run the following command locally on Windows? That is, how would you copy that file on your Windows system using gsutil?
gsutil cp gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv .
2) Have you tried the following command? If so, does it work?
tmp <- dl_read_gcp(path = "gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv",
tmpdir = "D:/hap-p-sed-modeling/data/sensitive")
Thanks!
How do you run the gsutil cp gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv . command locally on Windows? i.e., how would you cp that file on your windows system using gsutil?
If not using R, I would use the command prompt:
If using R, just calling system() with the corresponding terminal code works fine.
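A minimal sketch of the R route described above, assuming gsutil is installed and on the PATH. The helper name build_gsutil_cp is hypothetical and not part of MotrpacBicQC; it only assembles the command string. The actual system() call is left commented out so the snippet can be run without GCP access.

```r
# Hypothetical helper (not part of MotrpacBicQC): assemble a `gsutil cp` command.
build_gsutil_cp <- function(src, dest, gsutil_path = "gsutil") {
  # shQuote() protects paths that contain spaces. type = "cmd" produces
  # double quotes, which the Windows command prompt expects.
  quote_type <- if (.Platform$OS.type == "windows") "cmd" else "sh"
  paste(gsutil_path, "cp",
        shQuote(src, type = quote_type),
        shQuote(dest, type = quote_type))
}

cmd <- build_gsutil_cp(
  "gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv",
  "."
)
print(cmd)

# To actually run the copy (requires gsutil and bucket access):
# status <- system(cmd)  # an exit status of 0 means success
```

Quoting both paths is a design choice for safety on Windows, where directories such as the one in the upload example later in this thread contain spaces.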
Have you tried the following command? if so, does it work?
Thanks for checking. The same two results occur with this code. I.e., it runs fine when I do a reprex()
library(MotrpacBicQC)
tmp <- dl_read_gcp(path = "gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv",
tmpdir = "D:/hap-p-sed-modeling/data/sensitive")
#> Warning in system(sprintf("mkdir -p %s", tmpdir)): 'mkdir' not found
Created on 2023-05-08 with reprex v2.0.2
The warning about mkdir is just a Windows thing, I think, and doesn't cause any problems since the directory I'm specifying already exists. But here is the funny part: it doesn't work if I run it in my local R session.
Sorry this issue is so strange. I am afraid it is more of a Windows issue than it is a MotrpacBicQC issue.
It is pretty straightforward to quickly upload the data using the command line (DOS window) if you have gsutil installed. The command for uploading a whole folder recursively looks something like this: gsutil -m cp -r "Y:\DataAnalysis_Reports\EX00979 - PASS 1B\PASS1B-06*" "gs://motrpac-portal-transfer-michigan/PASS1B-06"
To download you just switch the source and destination
Thank you! I am able to download the files I need, which is great. Even though I can get those files, it may be helpful to update dl_read_gcp so that it works consistently on Windows and continues to work fine on other systems too. (I am trying to research why system has unexpected errors on Windows.)
Thanks @bcjaeger
Question 1) was about how you do it outside of R. Does this command work on your side?
gsutil cp gs://motrpac-data-hub/human-precovid/results/transcriptomics/qa-qc/motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv D:/hap-p-sed-modeling/data/sensitive
Please confirm (make sure you don't already have the motrpac_human-precovid_transcript-rna-seq_qa-qc-metrics.csv file in that directory).
However, after seeing this warning that you provided in your response:
Warning in system(sprintf("mkdir -p %s", tmpdir)): 'mkdir' not found
I am afraid that could be the issue. For gsutil, we have no option other than calling system in R. However, we should use R's dir.create() function to create the directory, so that the OS won't matter.
So please confirm the answer to the first question.
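As a rough illustration of the dir.create() idea, here is a sketch that creates the download directory without shelling out to mkdir. The helper name ensure_dir is hypothetical; this is not necessarily how the package ended up implementing it.

```r
# Hypothetical helper: create the download directory from within R,
# so the result does not depend on an OS-level `mkdir` being available.
ensure_dir <- function(tmpdir) {
  if (!dir.exists(tmpdir)) {
    # recursive = TRUE mirrors `mkdir -p`; showWarnings = FALSE keeps it quiet
    dir.create(tmpdir, recursive = TRUE, showWarnings = FALSE)
  }
  # Report whether the directory exists after the attempt
  dir.exists(tmpdir)
}

# Demonstrated in a temporary location rather than a real data directory
target <- file.path(tempdir(), "gcp-demo", "nested")
created <- ensure_dir(target)
print(created)
```

Returning the result of dir.exists() lets a caller stop early with a clear message if the directory could not be created, instead of failing later inside the gsutil call.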
Thanks - confirmed.
I am afraid that could be the issue. For gsutil we don't have any other option but calling system in R. However, we should use R's dir.create() command to create a directory (and the OS won't matter)
mkdir could be an issue if the user was hoping to create the directory where they wanted to download data, and using dir.create() seems like a great idea.
I have a fix that should only change how this function works on Windows. It sets the ignore.stdout and ignore.stderr arguments to FALSE if the operating system is Windows. I think the issue I'm having with dl_read_gcp() is hard to explain; I would be happy to share my screen over Zoom to clarify if that's helpful.
dl_read_gcp <-
  function (path, sep = "\t", header = TRUE, tmpdir = "/tmp",
            gsutil_path = "gsutil", check_first = TRUE, ...){
    # additions from bcjaeger:
    sys_name <- Sys.info()['sysname']
    ignore_stdout <- ignore_stderr <- sys_name != "Windows"
    system(sprintf("mkdir -p %s", tmpdir))
    new_path <- sprintf("%s/%s", tmpdir, basename(path))
    if (check_first) {
      if (!file.exists(new_path)) {
        cmd <- sprintf("%s cp %s %s", gsutil_path,
                       path, tmpdir)
        system(cmd,
               ignore.stdout = ignore_stdout,
               ignore.stderr = ignore_stderr)
      }
      else {
        message(paste("The file", new_path, "already exists"))
      }
    }
    else {
      message(paste("Downloading file from GCP: ", basename(path)))
      cmd <- sprintf("%s cp %s %s", gsutil_path, path,
                     tmpdir)
      system(cmd,
             ignore.stdout = ignore_stdout,
             ignore.stderr = ignore_stderr)
    }
    if (file.exists(new_path)) {
      dt <- data.table::fread(new_path, sep = sep, header = header,
                              ...)
      return(dt)
    }
    warning(sprintf("gsutil file %s does not exist.\n",
                    path))
    return()
  }
Thanks, I'll get back to you with my proposed fix
If you'd like, I can propose a fix with a PR? It could include my Windows code above and the use of dir.create() as you noted above. I don't want to take up too much of your time with my weird OS problems =]
Fixed with #202 and #203 🎉
Hello,
Thank you for making this R package; it is very helpful.
I have access to the motrpac data hub and have also installed gsutil. However, I run into an issue when I try to use dl_read_gcp() locally on a Windows OS. Strangely, it runs fine in reprex() (I found this out when I tried to make a reprex for this issue), but when I run it in a local R session in RStudio I get something like this:
@sawyerWeld, @joerigdon, @cgsimmons822, @SHChen0, and @fchsu6 have helped me troubleshoot this. It's possible that there is something happening with system that could be fixed by using system2, but we haven't figured out what yet.
We did find that the dl_read_gcp() function runs locally for me when I set either ignore.stdout OR ignore.stderr to FALSE in the system call here:
But I have no idea why that fixes the issue.
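For anyone exploring the system2 idea mentioned above, here is a minimal sketch of how system2() can capture a command's output and exit status rather than discarding them. The wrapper name run_captured is hypothetical, and the thread did not establish whether switching to system2 resolves the Windows behaviour; the example uses Rscript as a stand-in command because gsutil may not be installed.

```r
# Hypothetical wrapper: run a command with system2() and keep its output.
run_captured <- function(command, args = character()) {
  # stdout = TRUE and stderr = TRUE return the output as a character
  # vector instead of sending it to the console or discarding it.
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = TRUE)
  )
  # A "status" attribute is only attached when the exit status is non-zero
  status <- attr(out, "status")
  list(output = as.character(out),
       status = if (is.null(status)) 0L else status)
}

# Stand-in command: the Rscript binary that ships with the running R
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_captured(rscript, c("-e", shQuote("cat(42)")))
print(res)
```

With gsutil, the same wrapper could be called as run_captured("gsutil", c("cp", src, dest)), where src and dest are placeholders for the bucket path and the local directory. Capturing stderr this way may make it easier to see what gsutil reports when the copy fails.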