Closed AmyMikhail closed 6 years ago
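Couple of queries/observations:
- Do you get the crash with the sample data provided with this issue?
- Any specific reason to use as.Date when lubridate has already been loaded and used?
- Most of the operations in the j sections could have been grouped and executed in one go, as the by argument is the same for all of them (by = eval(groupvector)).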
Have you reproduced this on the command line?
On Wed, Mar 14, 2018, 4:01 AM mrmanojrai notifications@github.com wrote:
Couple of queries/observations:
- Do you get crash with sample data provided with this issue?
- Any specific reason to use as.Date when lubridate has already been loaded and used?
- Most of the j sections operations could have been grouped and executed in one go as by argument is same as (by = eval(groupvector)).
@mrmanojrai in response to your questions:
The crash didn't occur until I added the for loop and made minor modifications to the code so that it would iterate over weeks and provide cumulative summaries (previously, summaries were just by group).
Is it possible that some combination of missing values, group and week that is not represented in the toy data set above, but is present in my real data set, is causing the crash?
Is it possible that R is running out of memory due to the size of the input data set?
This is not really relevant to the problem at hand (at least I don't think it is), but I needed to get a date from the year week because my ISO year weeks (created with the function which handles year-end dates differently to the equivalent function in lubridate) are stored as numeric and I can't perform mathematical operations directly on them. If there is a way to do this with lubridate I would be happy to change it, but I'm not aware of a yw function in lubridate. Essentially, I was looking for a way to iteratively define four-week periods based on the maximum week by group for each iteration, minus 4 weeks.
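For readers with the same problem, here is a minimal base R sketch of turning a numeric ISO year-week into a date without lubridate. It relies on the ISO 8601 rule that 4 January always falls in week 1. The helper name iso_week_start and the year * 100 + week encoding are assumptions for illustration; the thread does not say how the numeric year-weeks are actually encoded.

```r
# Hypothetical helper (not from the thread): Monday of a given ISO year and week.
iso_week_start <- function(iso_year, iso_week) {
  # 4 January is always in ISO week 1 of its year.
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(iso_year)))
  # "%u" is the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
  wday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (wday - 1L)
  week1_monday + 7L * (as.integer(iso_week) - 1L)
}

# Assumed encoding: year * 100 + week, e.g. 201801 for week 1 of 2018.
yrwk <- c(201801, 201752)
week_start <- iso_week_start(yrwk %/% 100, yrwk %% 100)

# Dates support arithmetic, so a date four weeks earlier is a subtraction.
four_weeks_before <- week_start - 28L
```

Once the weeks are dates, the four-week periods can be defined with ordinary date arithmetic and compared directly against event dates.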
Could you elaborate on this with an example?
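On the third point (grouping the j operations), one possible reading of the suggestion is sketched below: when several summaries share the same by argument, they can be computed in a single j expression instead of one call per summary. The data, the column names and the contents of groupvector are made up for illustration and are not taken from the original function.

```r
library(data.table)

# Made-up line listing; column names are illustrative assumptions only.
dt <- data.table(
  cluster = c("a", "a", "b", "b", "b"),
  onset   = as.Date(c("2018-01-01", "2018-01-08",
                      "2018-01-01", "2018-01-15", "2018-01-22")),
  severe  = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Assumption: groupvector holds the grouping column name(s) as character.
groupvector <- "cluster"

# All summaries are computed in one pass over the groups.
res <- dt[, .(
  n_cases   = .N,
  n_severe  = sum(severe, na.rm = TRUE),
  first_day = min(onset),
  last_day  = max(onset)
), by = groupvector]
```

Grouping once should also reduce the number of intermediate tables that need to be bound together afterwards.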
@MichaelChirico not sure what you mean by "command line". Do you mean what happens if I run this in base R rather than RStudio? I will try this and let you know the outcome.
Update: base R crashes with the same error details:
Problem signature:
Problem Event Name: APPCRASH
Application Name: Rgui.exe
Application Version: 3.42.7832.0
Application Timestamp: 59ccc2b1
Fault Module Name: datatable.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5a39aedc
Exception Code: c0000005
Exception Offset: 0000000000029060
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 2057
Additional Information 1: cb11
Additional Information 2: cb11abf51219d08bb34e1d4ff9f1a95b
Additional Information 3: ff0f
Additional Information 4: ff0f0ac722dc1c648282a37d7681e735
Update 2: Rterm.exe (which, curiously, is what opened when I clicked on R.exe) also crashes:
Problem signature:
Problem Event Name: APPCRASH
Application Name: Rterm.exe
Application Version: 3.42.7832.0
Application Timestamp: 59ccc2b3
Fault Module Name: datatable.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5a39aedc
Exception Code: c0000005
Exception Offset: 0000000000029060
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 2057
Additional Information 1: 56c2
Additional Information 2: 56c2cbd44ff199678d57bf6e48a1f624
Additional Information 3: 7c2f
Additional Information 4: 7c2f3e180b4d57616b407bae15c5c322
Update 3:
On further investigation (dumping the summarized files to .csv), I was able to determine that the problem was not in summarizing the data but in the call to rbindlist: rbindlist in the CRAN release of data.table cannot handle empty tables and crashes R, as described in issue #2340.
The issue was fixed in data.table 1.10.5 (development version) and I'm happy to report that after upgrading to 1.10.5 my real function runs on my real data and produces the desired output without crashing R.
Although I was unable to reproduce the problem with a MWE, I think that is just because the MWE didn't sufficiently reflect the complexity of my real data set. I will close the issue.
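For anyone who cannot upgrade straight away, a defensive pattern is to drop empty results before calling rbindlist. This is only a sketch: the list pieces and its contents are hypothetical, and it has not been verified against the crash described in #2340.

```r
library(data.table)

# Hypothetical per-group summaries; group "b" produced no rows.
pieces <- list(
  a = data.table(week = 1:2, n = c(3L, 5L)),
  b = data.table(week = integer(0), n = integer(0)),
  c = data.table(week = 3L, n = 1L)
)

# Keep only tables that exist and have at least one row.
non_empty <- Filter(function(x) !is.null(x) && nrow(x) > 0L, pieces)

# idcol stores each list element's name in a new column.
combined <- rbindlist(non_empty, idcol = "group")

# Check whether the installed version is at least the one reported as fixed.
has_fix <- packageVersion("data.table") >= "1.10.5"
```

Filtering changes nothing when every table has rows, so the guard can stay in place after upgrading.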
I am attempting to summarize some variables cumulatively by group for each week in which there was new activity in that group, with a data.table line listing as input. This process works fine with a toy version of the function and a small input data set; however with larger data sets and the real (longer) function, R crashes with a fatal error. The details of the crash are here:
Session info is here:
Here is some example data:
Here is a toy version of the function:
And dependent function isoyrwk:
This is how I am applying the sumclusters function to my data:
mytest <- sapply(idlist, sumclusters, data = mydt, simplify = FALSE, USE.NAMES = TRUE)
Unfortunately I am not able to reproduce the fatal error with the toy data set and toy function. The only difference between the toy function and the real one is that there are more conditional counts on different variables, but the strategy for each one is exactly the same as shown above. I was originally getting a RHS / LHS class discrepancy error but this was discussed and resolved in this Stack Overflow post.
My real input data set is relatively large (2103 rows in the line listing, with reports in 155 weeks and four grouping vectors containing 1271, 144, 108 and 94 groups each, respectively).
I think the error might be due to the function timing out because there is too much data (I have 16 GB of RAM and my .Rproj file is on a network drive), but is there any way to confirm this from the above error? Or is it a bug?
Any insights into why this is causing R to crash and how I could prevent it would be much appreciated. I hope I have posted this in the right place, as the error details did specify that the fault module name is datatable.dll.