Yes, you're correct. Right now `wf_request_batch()` submits `workers` requests at a time and downloads each one when it finishes. How would you see a staging-only mode working?
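For context, a minimal sketch of the current batched usage (the `request_list` here is hypothetical; `workers` and `path` are the arguments described in the package documentation):

```r
library(ecmwfr)

# `request_list` is a hypothetical list of CDS/MARS request definitions
wf_request_batch(
  request_list,
  workers = 2,        # submit at most 2 requests at a time
  path = "data/era5"  # each file is downloaded here once it is ready
)
```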
My ideal code would look something like:

```r
requests <- wf_request_batch(request_list, transfer = FALSE)
# some time later...
wf_transfer(requests)
```
instead of

```r
for (i_req in seq_along(request_list)) {
  requests[[i_req]] <- wf_request(request_list[[i_req]], transfer = FALSE)
}
# some time later...
for (i_req in seq_along(requests)) {
  wf_transfer(requests[[i_req]])
}
```
But maybe this is quite a specific need that nobody else shares...
So you're submitting all the requests at once and then downloading them when they are done? The added value of `wf_request_batch()` is the built-in queue, which ensures that you only send a maximum number of requests at a time so they are not queued on the server end. For your use case I'd use `req <- lapply(request_list, wf_request, transfer = FALSE)` and then `lapply(req, wf_transfer)`.
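Spelled out, that suggested pattern might look like the sketch below (assuming, per the suggestion above, that `wf_transfer()` accepts the staged request objects returned by `wf_request(transfer = FALSE)`; `request_list` is hypothetical):

```r
library(ecmwfr)

# Stage every request without downloading anything yet
staged <- lapply(request_list, wf_request, transfer = FALSE)

# ...some time later, once the server has processed the requests...

# Download each staged request
results <- lapply(staged, wf_transfer)
```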
@khufkens what do you think?
Correct, assuming that you don't exceed the maximum number of allowed parallel requests.
The recent work of @eliocamp explicitly addresses the latter, monitoring the queue to download and submit new requests as slots free up. So as long as you colour within the lines, the proposed fix (above) should work.
Ok, thanks for your answers. `lapply()` is indeed a much more concise version of my suggestion.
In my case I could have up to 100+ requests to make (of very small files). This is only done once in the overall process of my code. So, my thinking would be to make all the requests at once, wait a couple of hours, and then download them all.
I've been using `wf_request_batch()` for cases with a few requests (<30), but I thought it would be nice to have the R console free to do other things while waiting in the cases with more requests. What do you think?
Just submit it as a job! Either in a separate terminal (if you are not using an IDE) or via the job interface in RStudio. I mostly let jobs like this run in the background in RStudio, or, when using an HPC, they run as a proper job in the HPC queue.
But yes, best to download everything in one pass if you don't need dynamic access.
For reference:
Ok, yes, sounds like a good plan. I'm not super familiar with jobs. But for my case, would you use the `job_name` argument, as in `wf_request(., transfer = TRUE, job_name = "test")`, or write a script with `wf_request_batch()` and start it with `rstudioapi::jobRunScript()`?
That's effectively the same thing.
I often call things from within RStudio itself as I often lump in some post/pre-processing.
Bear in mind that `lapply(list, wf_request, job_name = "test")` won't really work, as it will create 100+ jobs with the same name. I think a better alternative for your use case might be to write a small script with `lapply()` and `wf_request()` and then run the script as a job, or even run it in a different R session on the console.
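A sketch of that suggestion (the script name and `request_list` are hypothetical; `rstudioapi::jobRunScript()` is the function mentioned above):

```r
## download_requests.R -- hypothetical standalone script
library(ecmwfr)

# Build or load `request_list` here, then stage and download everything
staged <- lapply(request_list, wf_request, transfer = FALSE)
lapply(staged, wf_transfer)
```

and then, from the interactive session:

```r
# Run the script as an RStudio background job, leaving the console free
rstudioapi::jobRunScript("download_requests.R")
```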
Sounds good. Maybe I'll keep `wf_request_batch()` in my function (the standard case should be 10-30 requests), and then call this function as a job with https://github.com/lindeloev/job/ when there are more requests. Thanks for your help!
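A minimal sketch of that plan, assuming the `job` package from the linked repository (the threshold and `request_list` are hypothetical):

```r
library(ecmwfr)
library(job)

if (length(request_list) > 30) {
  # Larger batches: run in a background session so the console stays free
  job::job({
    wf_request_batch(request_list, workers = 2)
  })
} else {
  # Standard case (10-30 requests): run in the current session
  wf_request_batch(request_list, workers = 2)
}
```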
Ok, I'll close this now.
btw @Rafnuss, nice work on the pressure-based geolocation.
Is it possible to have a batch request without transfer?
The documentation for `wf_request()` and `wf_request_batch()` reads:
But as I understand it, `wf_request_batch()` doesn't have an option to just stage the request. Is this correct? Or am I missing something?