Status: Open. gernophil opened this issue 6 months ago.
Does this also happen when using vanilla R from a console?
Without a reproducible example or more information about your system (e.g. RAM, CPUs, OpenMP), there is nothing for us to do.
A reproducible example is not really possible here, since it happens completely at random. Sometimes the whole script runs flawlessly, sometimes it aborts.
I have:
MacBook Pro 14" (2021)
macOS 14.3.1 (Sonoma)
M1 Pro with 16 GB RAM
I use GFortran and OpenMP from here: https://mac.r-project.org/
And I use Xcode 15.2 (it might have been 15.1 when compiling data.table).
I'm actually not sure which compiler is used for data.table.
Please try running the script with R in a terminal; it's important to know whether this is specific to RStudio. R in a terminal will also give a better crash diagnosis.
data.table explicitly loaded
What does this mean, as opposed to "data.table loaded"?
That means I have library(data.table) in the script and data.table is not loaded by loading any other package (if that even happens for data.table at all).
And it means data.table is the only package that's loaded via library() in that script.
OK, I executed a script 10 times using Rscript in the terminal and did not get an error. With RStudio it happened on the 5th execution, so it might be RStudio-related. However, I don't get this error in RStudio if I don't use data.table, so there seems to be some kind of incompatibility.
I'll test Rscript a few more times, since 10 runs is not that many.
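Repeating the Rscript runs by hand gets tedious, so a small loop can do it and record how each run ended. This is a minimal sketch assuming bash; `repeat_run` is a hypothetical helper, not part of R or data.table. Exit statuses above 128 conventionally mean the process was killed by a signal (status minus 128), so a crashed run should show up as a failure instead of going unnoticed.

```shell
#!/usr/bin/env bash
# repeat_run N command [args...]
# Runs the command N times, prints the exit status of every failing run,
# and finishes with a one-line summary "<failed>/<N> runs failed".
repeat_run() {
  local n=$1
  shift
  local failed=0
  local i status
  for ((i = 1; i <= n; i++)); do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      failed=$((failed + 1))
      # A status above 128 usually means the process was killed by
      # signal (status - 128), e.g. 139 = SIGSEGV, 134 = SIGABRT.
      echo "run $i: exit status $status"
    fi
  done
  echo "$failed/$n runs failed"
  # Succeed only if every run succeeded.
  [ "$failed" -eq 0 ]
}
```

For example, `repeat_run 20 Rscript my_script.R`, where `my_script.R` is a placeholder for the actual script, prints one line per failed run plus a summary.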
I executed a script 10 times using RScript in the terminal and did not get an error.
If it really is exclusive to RStudio you'll want to raise an issue on their tracker as well. Of course it's still possible there's a data.table bug that's only exposed by something RStudio does. You might have a look at open bugs mentioning data.table in case they are related: https://github.com/rstudio/rstudio/issues?q=is%3Aissue+%22data.table%22+is%3Aopen
Again without an MRE there's not that much we can do. Some further questions: Does the script consistently fail when running the same command? What's that command doing? How many threads is data.table using? Are you doing any plotting to the RStudio device?
That means I have library(data.table) in the script and data.table is not loaded by loading any other package (if that even happens for data.table at all).
As an aside, there's some vocabulary that may help here. "load" means loadNamespace(); the verb (I use and I think is generally accepted) for library() is "attach". It may be common for data.table to get loaded by other packages (e.g. whenever a package does import(data.table) or importFrom(data.table, ...)), but we discourage other packages attaching data.table (which almost always means data.table is in that package's Depends).
If it really is exclusive to RStudio you'll want to raise an issue on their tracker as well. Of course it's still possible there's a data.table bug that's only exposed by something RStudio does. You might have a look at open bugs mentioning data.table in case they are related: https://github.com/rstudio/rstudio/issues?q=is%3Aissue+%22data.table%22+is%3Aopen
I'll also check there. However, I have never had this error when data.table was not attached (I'd never heard that wording, so thanks for the new knowledge). And I use R heavily for work every day, so if any of my other packages caused it, it would almost certainly have happened by now.
Again without an MRE there's not that much we can do. Some further questions: Does the script consistently fail when running the same command? What's that command doing? How many threads is data.table using? Are you doing any plotting to the RStudio device?
I know an MRE would be very helpful, but as I already mentioned in the starting post, it seems totally random. It seems to happen often when using fread(), saveRDS() of a dt object, or variable assignment of a dt object using the <- operator (I don't use = outside of functions, so I can't say anything about it). I am trying to create an MRE, but due to the random nature of this error, this is really hard. It happened on the first execution of this minimal script (taken from here):
library(data.table)
mydata <- fread("https://github.com/arunsrinivasan/satrdays-workshop/raw/master/flights_2014.csv")
But after that I could execute it 15 times without the error.
That means I have library(data.table) in the script and data.table is not loaded by loading any other package (if that even happens for data.table at all).
As an aside, there's some vocabulary that may help here. "load" means loadNamespace(); the verb (I use and I think is generally accepted) for library() is "attach". It may be common for data.table to get loaded by other packages (e.g. whenever a package does import(data.table) or importFrom(data.table, ...)), but we discourage other packages attaching data.table (which almost always means data.table is in that package's Depends).
Thanks again for that knowledge. I don't want to start a fight here, but ?library gives the title of the library() help page as "Loading/Attaching and Listing of Packages", and the description says: "library and require load and attach add-on packages." So I guess one could say that library(data.table) loads (and attaches) the data.table package.
So, I guess one could say that library(data.table) loads (and attaches) the data.table package.
Yes, I don't disagree with that. After library(data.table) you'll see that both are true: isNamespaceLoaded("data.table") and "package:data.table" %in% search(). You can't attach a package without first loading it; however, you can load a package without attaching it. See also the .onLoad vs .onAttach hooks.
I appreciate your debugging efforts. I understand an MRE can be quite difficult in such cases, but unfortunately it is the only way forward.
Have you tried rebuilding and reinstalling data.table? Which compiler do you use for that? Have you used other packages built with the same toolchain? (since you mentioned data.table is the first one that crashed your session)
Have you tried reinstalling/updating RStudio?
Does it also happen if you execute a script line by line or only if you source it at once?
When you say "simple assignment of a dt to a variable", does that mean dt has been calculated before, could have been printed out and now is just assigned to another variable as e.g. dt2 <- dt?
Have you tried rebuilding and reinstalling data.table? Which compiler do you use for that? Have you used other packages built with the same toolchain? (since you mentioned data.table is the first one that crashed your session)
I'm using Xcode 15 combined with the libomp.dylib from here (LLVM 16.0.4) to be able to use multiple threads. I think it was Xcode 15.0 and 15.1 the last times I compiled it. I recompiled it a few times, but it didn't help. Maybe I'll retry with Xcode 15.2 now that it's released.
I will also try the precompiled data.table binary to see if this is the problem.
Have you tried reinstalling/update Rstudio?
Yes, there was also an RStudio update in the meantime, but it didn't help. I could wipe RStudio's config folder to reset it and see if this helps.
Does it also happen if you execute a script line by line or only if you source it at once?
Yes
When you say "simple assignment of a dt to a variable", does that mean dt has been calculated before, could have been printed out and now is just assigned to another variable as e.g. dt2 <- dt?
More like dt2 <- dt1[, c("col2", "col3")], so no complex calculations.
I completely reinstalled RStudio (deleting RStudio.app, ~/.local/share/rstudio, com.rstudio.desktop and ~/.config/rstudio). I even "reinstalled" the libomp.dylib and then reinstalled data.table from source. Let's see if there's any change. If not, I'll try the binary version to see whether it also raises the error. As a last measure I could reinstall R itself, but since Rscript works, I doubt R is what causes it.
Btw, how does Rscript handle such a fatal error? Could it be that it's just silently restarted?
OK, the error still appears, and unfortunately I still haven't found a real pattern. Sometimes I can work for an hour without it happening, and sometimes it keeps popping up every 5 minutes. And still only when using data.table. It seems to happen more frequently when I work with large data (2.5 million rows; the number of columns doesn't seem to matter that much), so it might be memory-related.
I am using an .Renviron with R_MAX_VSIZE=100Gb. I needed those 100Gb once, but I'm going to set it to 50Gb for testing. Maybe I just don't have that much space for swapping anymore, since I only have a little over 100 GB of free disk space. Any chance this might be the issue? It also seems to be provoked more often when working interactively than when just sourcing a whole R script.
I found some old threads that are somewhat similar, although still a little different: https://github.com/Rdatatable/data.table/issues/2672 https://github.com/Rdatatable/data.table/issues/2119
The R_MAX_VSIZE=100Gb seems to have been the issue here. The error hasn't appeared since I reduced it.
The issue reappeared even with R_MAX_VSIZE=50Gb (with around 90 GB free disk space) and also with R_MAX_VSIZE not set at all. It still happens randomly, and I can't find a reproducible example.
I have the same problem. I also first suspected a memory-related issue, but it also happens with small data tables. I also tried a bunch of things but haven't found an MRE yet.
Same hardware specs as @gernophil (not sure if this is a clue):
MacBook Pro 14-inch, 2021
macOS 14.3.1 (Sonoma)
Apple M1 Pro, 16 GB RAM
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
RStudio Version 2023.12.1+402 (2023.12.1+402)
> packageVersion("data.table")
[1] ‘1.14.10’
Your hardware specs look similar to mine:
MacBook Pro 14-inch, 2021 <- same
macOS 14.4.1 (Sonoma) <- issues started with 14.3.1
Apple M1 Pro, 16 GB RAM <- same
R version 4.3.3 (2024-02-29) <- issues started earlier, not sure if with 4.3.2 or 4.3.1
Platform: aarch64-apple-darwin20 (64-bit) <- same
RStudio Version 2023.12.1+402 (2023.12.1+402) <- same
> packageVersion("data.table")
[1] ‘1.15.2’ <- issues started earlier with 1.15.0
@lnnrtwttkhn: Have you tried provoking the error in the original R.app? I haven't tried that yet. Using Rscript in the terminal seems to work fine.
The compiler used to compile data.table may also have an impact here. AFAIR you can see the compiler info in the cc file located in the DT installation directory.
There might be something to it. I vaguely remember it started with Xcode 15.2. @lnnrtwttkhn, do you also compile from source or do you use the precompiled binary? @jangorecki, do you happen to know the command to get this info?
https://github.com/Rdatatable/data.table/blob/566bff0fe1a10d94a494026c59eb611b90b4dc04/configure#L7
As you can see, it is the 'cc' file. During installation of a package, the contents of its 'inst' directory are copied into the package directory. So locate your package path and open the 'cc' file there.
If I am not mistaken, the package path should be /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/DT. However, I don't have a file named cc in there:
% ls -aR /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/DT | grep 'cc'
accent-neutralise
/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/DT/htmlwidgets/lib/datatables-plugins/filtering/accent-neutralise:
Try data.table rather than DT
Ah yes, there's the file. But it doesn't look that informative:
CC=clang -arch arm64
CFLAGS=-falign-functions=64 -Wall -g -O2
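The file only records the compiler command and flags. One way to get a version out of it is to read the CC line back and ask that compiler for its version. A minimal sketch, assuming bash and the KEY=value layout shown above; `read_cc_var` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# read_cc_var FILE VAR
# Prints the value of a KEY=value line (e.g. CC or CFLAGS) from data.table's
# installed 'cc' file. Everything after "VAR=" is printed unchanged.
read_cc_var() {
  local file=$1 var=$2
  # -n suppresses normal output; the p flag prints only lines where the
  # "VAR=" prefix was found and stripped.
  sed -n "s/^${var}=//p" "$file"
}
```

For example, `$(read_cc_var /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/data.table/cc CC) --version` runs `clang -arch arm64 --version`. Note that this reports the clang that is selected now, which is only the compiler that built data.table if the toolchain has not been switched since installation.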
OK, I posted this before but deleted it, since I wanted to test for myself whether it still works:
R still uses Xcode 14.2/14.3. I'm going to try an older Xcode and see if this actually makes a difference. If you also want to try this, @lnnrtwttkhn, here is a short guide on how to use an Xcode version that's no longer officially supported on Sonoma:
sudo xcode-select -s /Applications/Xcode_14.3.1.app/Contents/Developer
to set your command-line Xcode to 14.3.1. With xcode-select -p you can check your selected Xcode, and with sudo xcode-select -r you can reset it to the default version (should be /Applications/Xcode.app/Contents/Developer).
If you are using the OpenMP library from here, I guess you should also revert it to 15.0.7.
I installed data.table from source after doing this. Let's wait and see if the error still happens.
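Switching Xcode, building, and switching back is easy to get wrong by hand, so the xcode-select steps above can be wrapped in one function that always resets the selection afterwards. A minimal sketch, assuming bash or zsh; `with_xcode` and `run_xcode_select` are hypothetical helper names, and nothing is reset if the function is interrupted with Ctrl-C (run `sudo xcode-select -r` manually in that case).

```shell
#!/usr/bin/env bash
# Thin wrapper so the actual xcode-select call lives in one place.
run_xcode_select() {
  sudo xcode-select "$@"
}

# with_xcode XCODE_APP command [args...]
# Selects XCODE_APP for command-line builds, runs the command, then resets
# to the default Xcode (xcode-select -r) even if the command failed.
# Returns the command's exit status.
with_xcode() {
  local app=$1
  shift
  run_xcode_select -s "$app/Contents/Developer" || return 1
  "$@"
  local status=$?
  run_xcode_select -r
  return "$status"
}
```

For example: `with_xcode /Applications/Xcode_14.3.1.app Rscript -e 'install.packages("data.table", type = "source", repos = "https://cloud.r-project.org")'`. The `repos` argument is there because a non-interactive Rscript session typically has no CRAN mirror selected.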
The compiler and compiler flags, i.e. the info from the cc file, may not look very informative, but they might be essential for tracking down the source of the problem.
Yes, I was just hoping it would give you the compiler version too.
For debugging you could tweak the configure script to report it as well. AFAIR we used to report the gcc version.
I'm not that familiar with compiling in general. I know some basics about compiling pre-written stuff, but I don't know much about any of the C languages. What could I do to tweak it? I guess you are referring to these lines of code:
case $CC in gcc*)
GCCV=`${CC} -dumpfullversion -dumpversion`
echo "$CC $GCCV"
esac
And I guess this does nothing, since my $CC is clang and therefore doesn't match gcc*. Also, would this write to a file or just print to the terminal during install?
Besides that, I haven't seen the error since recompiling with Xcode 14.3.1 and libomp from LLVM 15.0.7, but I also haven't used it as heavily since then. However, if this turns out to be the reason, it would be really great to figure out how to overcome it with Xcode 15.3. Otherwise I would need to keep the roughly 20 GB old Xcode app around just for data.table. I haven't yet figured out a way to use only the old command-line tools standalone.
You can put something like this into the script; it doesn't even need to be inside the case block:
CLANGV=`${CC} -dumpfullversion -dumpversion`
echo "$CC $CLANGV"
Just change -dumpfullversion -dumpversion to clang's way of printing its version. Then the compiler version should be logged to the install.out file and be visible on screen when installing.
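Folded into the existing case block, that suggestion could look like the sketch below, wrapped in a function so it can be tried outside configure. It assumes, as in the snippet above, that `$CC` already holds the compiler command (e.g. `clang -arch arm64`) when it runs; `log_compiler_version` is a hypothetical name, and `--version` is used for the non-gcc branch because clang accepts it and prints its version on the first line.

```shell
#!/bin/sh
# log_compiler_version: prints "$CC <version>" so the version ends up in
# the install log. The gcc branch mirrors the existing configure code.
log_compiler_version() {
  case $CC in
    gcc*)
      CCV=$(${CC} -dumpfullversion -dumpversion)
      ;;
    *)
      # Any other compiler (e.g. clang): keep only the first line of
      # --version. ${CC} is unquoted so "clang -arch arm64" splits into words.
      CCV=$(${CC} --version | head -n 1)
      ;;
  esac
  echo "$CC $CCV"
}
```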
So, even with the version compiled with Xcode 14.3.1, the fatal error still happens. Maybe I'll first test the precompiled version for a while and see how it turns out. If the error also happens there, then I don't think the compiler is the issue.
Still, I'll add your code to the configure file. What's the best way to do this? Clone the CRAN repo (https://github.com/cran/data.table), edit it there, and then install from the local source with devtools? (Or fork it and install from GitHub?)
Use R to install from source. If you use third-party tools, you add surface for new problems to pop up.
I will, but I won't be using it heavily until next week, so I'll report back on how the precompiled binary behaves sometime next week.
Btw, could it also be a faulty libomp.dylib or something else from the OpenMP stuff? Is the library also loaded when I run data.table code, or is it only used during compilation and then never touched again?
I installed from source using Xcode 14.3.1 and libomp 15.0.7, but after installing I switched back to Xcode 15.3 and libomp 16.0.4. If libomp 16.0.4 were the issue, would it still be a problem if I only switch to it after building data.table?
Edit: And maybe in the end it still is an RStudio issue. I've just seen that the latest release was 2024-01-29, which also fits quite well with the first appearance of the issue.
One thing I just stumbled across is that the file inst/cc changed between 1.14.8 and 1.14.10 from
CC=gcc
CFLAGS=-O2
to
CC=clang -arch arm64
CFLAGS=-falign-functions=64 -Wall -g -O2
Not sure if this is any hint. I'm just poking around in the dark.
OK, here is an MRE that raised the error three times while coding interactively, running it partially with cmd + alt + b and completely with cmd + alt + r.
While the error occurred I had data.table version 1.15.4, installed from source using Xcode 15.3 with OpenMP from LLVM 16.0.4 from here (installed as stated there: sudo tar fvxz openmp-16.0.4-darwin20-Release.tar.gz -C /).
This is my ~/.R/Makevars:
CPPFLAGS += -Xclang -fopenmp
LDFLAGS += -lomp
I installed data.table with this command:
install.packages("data.table", type = "source")
R: 4.3.3, RStudio: 2023.12.1. Ah, and I am using this theme; maybe that could also be a source of the error? <- the error also happens with the default themes, so this is unlikely
Here's the MRE:
library(data.table)
input <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
flights <- fread(input)
saveRDS(flights, "flights.rds")
flights <- readRDS("flights.rds")
flights[year == 2014, year := 14]
flights[year == 14, year := 2014] # happened here once
flights[, carrier_origin_dest := paste0(carrier, "_", origin, "_", dest)] # happened here once
flights[, c("carrier_split", "origin_split", "dest_split") := tstrsplit(carrier_origin_dest,
"_",
fixed = TRUE)]
saveRDS(flights, "flights_modified.rds")
fwrite(flights, "flight_modified.csv") # happened here once
You might need some patience, since the example doesn't always raise the error, but I've seen it do so at least 4 times in an hour.
Does the error go away if you run setDT() after readRDS()?
Today seems to be a "good" day. I just ran the script 20 times without raising the error once, so it's hard to test anything :(
I checked which library is used during installation and during loading.
For this I temporarily removed either /usr/local/lib/libomp.dylib together with the bundled OpenMP headers in /usr/local/include (omp-tools.h, omp.h, ompt.h), or just /usr/local/lib/libomp.dylib.
If I remove all four, I can still load (or attach) data.table normally using library(data.table). It is still loaded with multiple threads:
data.table 1.15.4 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
However, it makes it impossible to install data.table from source with these flags (set in ~/.R/Makevars; I also temporarily removed the OpenMP headers in /usr/local/include that are bundled in the tarball):
CPPFLAGS += -Xclang -fopenmp
LDFLAGS += -lomp
This is the error that happens then:
...
In file included from ./data.table.h:1:
./myomp.h:2:12: fatal error: 'omp.h' file not found
#include <omp.h>
^~~~~~~
1 error generated.
make: *** [assign.o] Error 1
ERROR: compilation failed for package ‘data.table’
* removing ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/data.table’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/data.table’
Warning in install.packages :
installation of package ‘data.table’ had non-zero exit status
If I only remove /usr/local/lib/libomp.dylib but keep the header files in place, I can install and attach data.table. So it seems that here, too, R's internal libomp.dylib is used, and only the bundled headers in /usr/local/include (omp-tools.h, omp.h, ompt.h) are used during installation. Maybe this is caused by a mismatch between these headers and the libomp.dylib actually used?
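When toggling these files for such experiments, it helps to record exactly which combination was in place before each run. A minimal sketch, assuming bash or zsh and that the tarball installed exactly the four files named above under one prefix; `omp_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# omp_files PREFIX
# Reports which of the OpenMP files are present under PREFIX
# (e.g. /usr/local): the three headers used when compiling and the
# libomp.dylib used when linking and loading.
omp_files() {
  local prefix=$1 f
  for f in include/omp.h include/ompt.h include/omp-tools.h lib/libomp.dylib; do
    if [ -e "$prefix/$f" ]; then
      echo "present: $prefix/$f"
    else
      echo "missing: $prefix/$f"
    fi
  done
}
```

For example, `omp_files /usr/local` prints one present/missing line per file.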
P.S.: I couldn't try setDT() after readRDS() yet, since I was on the non-OpenMP version, but I'll try that today.
I'm having the same problem with totally random "R Session Aborted" errors when using data.table.
Since the last two versions, data.table is randomly causing an R Session Aborted error within RStudio. I couldn't narrow it down to a specific function, but it only happens with data.table loaded, and it also happens with only data.table explicitly loaded. Apart from this I don't have any more clues about what could cause it, as it happens on various commands (saveRDS(), simple assignment of a dt to a variable...). Anyone else experiencing this?
I am using data.table 1.15.0 compiled from source on macOS 14.3.1.