Closed renkun-ken closed 6 years ago
Hi @renkun-ken, thanks for reporting that!
If I understand correctly, you have a web interface to RStudio Server and the actual R session is running remotely on the server.
So what exactly happens when you refresh the web page: does the server start a completely new R session and kill the previous one?
The only way that the number of cores would be set to 1 would be if fst can't detect OpenMP when re-entering. That would be strange but can be tested with:
fst:::hasopenmp() # TRUE if OpenMP detected
#> [1] TRUE
Would you be so kind as to test that?
The other reason for the number of threads to be set to 1 is when fst thinks it's in a forked session. The logic used there is comparable to that used in the data.table package. Would it be possible to test if data.table has the same problem using:
data.table::getDTthreads()
#> [1] 8
Thanks!
I ran some tests with both fst::threads_fst() and data.table::getDTthreads(), and it seems that the R session re-entered through RStudio Server may be a forked one. Here's my test code:
while (TRUE) {
  cat("[", format(Sys.time()), "] fst::threads_fst() = ", fst::threads_fst(),
      ", data.table::getDTthreads() = ", data.table::getDTthreads(), "\n", sep = "")
  Sys.sleep(1)
}
At 21:58:00 I closed the web page. A while later I re-entered the session and saw this log:
[2017-12-08 21:58:00] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:01] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:02] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:03] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:04] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:05] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:06] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:07] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:08] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:09] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:10] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:11] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:12] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:13] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:14] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:15] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:16] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:17] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:18] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:19] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:20] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:21] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:22] fst::threads_fst() = 40, data.table::getDTthreads() = 40
[2017-12-08 21:58:23] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:24] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:25] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:26] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:27] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:28] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:29] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:30] fst::threads_fst() = 1, data.table::getDTthreads() = 1
[2017-12-08 21:58:31] fst::threads_fst() = 1, data.table::getDTthreads() = 1
It's quite clear that the R session was not suspended, but at the moment I re-entered the session at 21:58:23 I may have entered a forked session, so the thread count dropped to 1.
I'm not sure why it behaves this way. Maybe it's not an issue with fst and data.table, but this behavior surely makes it less predictable to use RStudio Server with fork-detecting packages. I'll consider raising issues on both data.table and RStudio.
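The one-line-per-second loop above can be condensed into a variant that logs only when the reported counts change, which makes the moment of the drop easier to spot. This is a minimal sketch and not code from the thread: watch_threads and its arguments are hypothetical names, and the getter is passed in so the function itself depends only on base R.

```r
# Hypothetical helper (not part of fst or data.table):
# poll a getter and log only when the reported thread counts change.
# `get_counts` is any zero-argument function returning a named vector.
watch_threads <- function(get_counts, polls = 60, interval = 1) {
  changes <- list()
  last <- NULL
  for (i in seq_len(polls)) {
    current <- get_counts()
    if (!identical(current, last)) {
      # Print a timestamped line such as "fst = 40, data.table = 40"
      cat("[", format(Sys.time()), "] ",
          paste(names(current), current, sep = " = ", collapse = ", "),
          "\n", sep = "")
      changes[[length(changes) + 1L]] <- current
      last <- current
    }
    Sys.sleep(interval)
  }
  invisible(changes)  # every distinct state seen, in order
}

# Intended use (assumes fst and data.table are installed):
# watch_threads(function() c(fst = fst::threads_fst(),
#                            data.table = data.table::getDTthreads()))
```

Run in the session before closing the page, this would print one line at start and another only if either count changes on re-entry.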
Hi @renkun-ken, that's a smart way of testing that, nice work!
In data.table's code, there is an explanation of why OpenMP code should not switch back to multi-threaded mode after parallel's fork has completed (doing so causes problems with the Intel compiler), so it is left to the user to switch to more threads again. I followed that advice for fst, so we can't really determine from your experiment whether the fork was very brief (perhaps only to facilitate re-entering) or persists after re-entering.
I could add some code to check that, or make it the user's choice to switch back to multi-threaded mode after the fork has ended, say:
fst::threads_fst(8, reset_after_fork = TRUE)
#> [1] 8
That would be an option at the user's own risk, however :-)
Thanks for pointing to data.table's code and for the clarification. I'd prefer not to make it more complex. For now, I'll call threads_fst() before calling fst functions whenever I want multi-threading.
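The workaround of calling threads_fst() before each fst call could be wrapped as below. This is only a sketch under stated assumptions: write_fst_threads and its set_threads and writer arguments are hypothetical names, not part of fst. The defaults point at fst::threads_fst and fst::write_fst, and because R evaluates default arguments lazily, fst is only needed when those defaults are actually used.

```r
# Hypothetical wrapper (not part of fst): re-assert the thread count
# immediately before writing, so an earlier fork cannot leave the call
# running single-threaded without the user noticing.
write_fst_threads <- function(x, path, threads,
                              set_threads = fst::threads_fst,
                              writer = fst::write_fst, ...) {
  set_threads(threads)   # set the desired number of threads
  writer(x, path, ...)   # then perform the write with remaining arguments
}

# Intended use (assumes fst is installed):
# write_fst_threads(dt, "myfile.fst", threads = 8)
```

The same pattern would apply to read_fst. The trade-off is one extra call per read or write in exchange for not depending on whatever state a previous fork left behind.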
After some intensive use, I'd prefer adding a threads= argument to both read_fst and write_fst, because it's too easy for the thread count to fall back to 1 when using RStudio Server or calling mclapply. @MarcusKlik what do you think?
Hi @renkun-ken, thanks, yes, that would be better than setting it with fst::threads_fst every time before you call fst::write_fst. Especially because fst also switches back to single-threaded mode after some other code or package produces a fork (the user might not even notice, as with the RStudio Server setup).
Judging from the data.table issues, we have to switch back to prevent problems in some cases. Perhaps a dual option would be most useful, so when the user does:
# set number of threads to 10
fst::write_fst(dt, "myfile.fst", threads = 10)
that number of threads is used regardless of any other setting. And with:
fst::write_fst(dt, "myfile.fst")
the default thread behavior is used. That default can be set with:
fst::threads_fst(8, single_threaded_on_fork = TRUE, reset_after_fork = FALSE)
That specifies the threading during and after a fork. Would that be a good option?
thanks
@MarcusKlik, it is definitely a good option. Thanks!
Hi @renkun-ken, with the latest dev version, the default behavior of fst after a fork can now be set with the parameter reset_after_fork in threads_fst(). When reset_after_fork = TRUE, the number of threads will be restored to the number of active threads before the fork.
On the data.table repository, some problems have been reported with the Intel compiler when threads are restored after a fork. For those cases, reset_after_fork = FALSE can be used or the fst_restore_after_fork option can be set to FALSE.
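Both ways of opting out of the restore could look roughly like the sketch below. It assumes an fst version that has the reset_after_fork parameter (checked at run time) and, since the thread does not say when fst reads the option, takes the conservative route of setting it before fst is loaded.

```r
# Assumption: set the option before fst is loaded; the thread only states
# that the fst_restore_after_fork option "can be set to FALSE".
options(fst_restore_after_fork = FALSE)

# Only touch fst if it is installed and exposes reset_after_fork.
if (requireNamespace("fst", quietly = TRUE) &&
    "reset_after_fork" %in% names(formals(fst::threads_fst))) {
  n <- fst::threads_fst()                        # current thread count
  fst::threads_fst(n, reset_after_fork = FALSE)  # keep the count, disable restore
}
```

With the restore disabled, the thread count would again have to be raised by hand after a fork, as described earlier in the thread.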
I'm very interested to see if this solves your issues with RStudio Server as well!
Thanks
Hi @renkun-ken, I believe we can close this issue: the default behavior of fst is now to restore the number of threads to the original setting after a fork has ended.
Please let me know if re-entering an RStudio session still disables multi-threading and I'll re-open.
Thanks for testing and submitting the issue to RStudio!
I'm using the latest development version of fst and I find it quite mysterious that after re-entering my RStudio session, the number of threads indicated by threads_fst() is changed to 1 from 40.
Steps to reproduce:
fst::threads_fst()
which, on my server, returns 40
fst::threads_fst()
again and the number of threads becomes 1
My session info:
I'm using RStudio Server 1.1.383.