Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

R crashes when using .SD #1304

Closed crocovert closed 9 years ago

crocovert commented 9 years ago

I found a very annoying bug: R sometimes crashes when I use .SD and put a filter on subgroups. Here is an example of a command that may make R crash:

dt[,nrow(.SD[filter1="E",]),by=list(year_1,year2)]

It seems that R crashes (and closes abruptly) when the filtered subgroup is empty, but I'm not sure that is the cause.

The same command written with .I seems to work correctly:

dt[,length(.I[filter1="E"]),by=list(year_1,year2)]

But although I can use it to replace nrow with length, it doesn't work when I want to sum a column in filtered groups.

It's very annoying and it is blocking my work.

Thank you for your help.

arunsrinivasan commented 9 years ago

Thanks. But we can't do much without a minimal reproducible example. Feel free to open when you manage to provide one.

DavidArenburg commented 9 years ago

As a side note, you do realise that filter1="E" doesn't do filtering but rather assignment, right?
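To illustrate the distinction, here is a minimal base-R sketch. The vector and the helper `arg_names` are made up for this example and are not taken from the reporter's data. Inside a call, `=` only names an argument, whereas `==` compares element by element and returns the logical vector that a filter needs.

```r
# Hypothetical stand-in for the column being filtered.
filtre1 <- c("E", "HE", "E")

# Hypothetical helper: returns the names of the arguments it was called with.
arg_names <- function(...) names(list(...))

# `=` inside a call names an argument; no comparison takes place.
named <- arg_names(filtre1 = "E")

# `==` compares each element and yields a logical vector.
keep <- filtre1 == "E"

print(named)
print(keep)
print(filtre1[keep])
```

Only the logical vector produced by `==` can be used to select rows.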

crocovert commented 9 years ago

Hello,

Here is a test data.table and, in the .r file, a command that makes R crash.

I hope this will help you

Thanks

Patrick PALMIER, Head of the Transport Systems and Safety group, tel: +33 (0)3 20 49 60 70, mobile: +33 (0)7 77 34 25 07

Centre d’études et d’expertise sur les risques, l’environnement, la mobilité et l’aménagement - www.cerema.fr, Direction territoriale Nord-Picardie - 2, rue de Bruxelles CS 20275 - 59019 Lille Cedex. Head office: Cité des Mobilités - 25 avenue François Mitterrand - CS 92 803 - 69674 Bron Cedex - tel: +33 (0)4 72 14 30 30

On 02/09/2015 at 10:14, Arun (via Internet, noreply@github.com) wrote:

Thanks. But we can't do much without a minimal reproducible example. Feel free to open when you manage to provide one.


library(data.table)
load("test.Rdata")

test[,nrow(.SD[filtre1=="E",]),by=list(AnDelivrance,MoisDelivrance)]

crocovert commented 9 years ago

In fact I tested it with filter1=="E".

I just sent you an example of script and data to test it


DavidArenburg commented 9 years ago

Is it just me, or are you assuming that we have access to your hard drive?

jangorecki commented 9 years ago

@DavidArenburg me too. @crocovert, be sure to read the support page: https://github.com/Rdatatable/data.table/wiki/Support

crocovert commented 9 years ago

In the script I only mentioned the file name. Put the test.Rdata file in a directory and replace load("test.Rdata") with a load() call that uses the full path to the file. It should then work and load the data.table.


Tensibai commented 9 years ago

@crocovert Keep in mind that GitHub issues don't allow attachments other than images. GitHub can turn your email reply into a comment on the issue, but it won't attach anything other than images. Use dput(head(test)) to give a sample, or put your data file somewhere on the internet and link to it. (For more details, see Producing a minimal dataset / Copy your data.)
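As a rough sketch of the dput() approach, assuming a small made-up data.frame in place of the real table: dput() prints R code that recreates an object, so its output can be pasted straight into a comment and evaluated by anyone reading it.

```r
# Hypothetical toy data; replace with the real table when reporting an issue.
test <- data.frame(
  filtre1 = c("E", "HE", "E"),
  MoisDelivrance = c("07", "08", NA),
  stringsAsFactors = FALSE
)

# dput() writes R code that recreates the object; capture it as text to paste.
snippet <- paste(capture.output(dput(head(test))), collapse = "\n")
cat(snippet, "\n")

# A reader can rebuild the object by evaluating the pasted text.
restored <- eval(parse(text = snippet))
print(all.equal(head(test), restored))
```

Because the text encodes column types and missing values, it tends to preserve details that an export to a delimited file and re-import can lose.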

crocovert commented 9 years ago

Here is the test dataset in dput format. (This matters because if I export and reimport the dataset with write.table and read.delim, the command works.)

Save it in a file and load it in R with dget. As it is a data.frame, convert it to a data.table with data.table(...).

Then apply the following command (test is the name of the data.table):

test[,nrow(.SD[filtre1=="E",]),by=list(AnDelivrance,MoisDelivrance)]

structure(list(filtre1 = c("E", "E", "E", "E", "E", "E", "HE", "E", "E", "E", "E", "E", "HE", "E", "E", "E", "E", "E", "E", "E", "E", "E", "E", "E", "HE", "E"), filtre2 = c("E_Autre", "E_Autre", "E_Autre", "E_Carte", "E_Carte", "E_Carte", "HE_Avis", "E_Autre", "E_Carte", "E_Carte", "E_Carte", "E_Carte", "HE_Avis", "E_Autre", "E_Autre", "E_Autre", "E_Carte", "E_Autre", "E_Carte", "E_Autre", "E_Autre", "E_Autre", "E_Autre", "E_Autre", "HE_Itineraire", "E_Autre"), AnDelivrance = c("2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", NA), MoisDelivrance = c("08", "07", "07", "07", "07", "07", "08", "07", "07", "07", "07", "07", "08", "08", "07", "07", "07", "08", "07", "07", "07", "07", "07", "10", "08", NA)), .Names = c("filtre1", "filtre2", "AnDelivrance", "MoisDelivrance"), row.names = c(NA, 26L), class = "data.frame")

DavidArenburg commented 9 years ago

save(test, file = "test.RData")
load("test.RData")
setDT(test)[, sum(filtre1 == "E"), by = .(AnDelivrance, MoisDelivrance)]
#    AnDelivrance MoisDelivrance V1
# 1:         2014             08  3
# 2:         2014             07 18
# 3:         2014             10  1
# 4:           NA             NA  1

Works for me

jangorecki commented 9 years ago

Works for me also. I don't think it is effective to debug R crashes without even the session info. It looks like the OP was not interested in reading the linked Support page. Session info is mentioned there under the top 3 common mistakes.
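For reference, a minimal sketch of how the session details could be collected before posting. The check for an installed data.table is an assumption of this example so that it also runs where the package is missing.

```r
# Collect the R version, platform, and attached packages.
info <- sessionInfo()
print(info)

# Report the data.table version only if the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
}
```

Pasting this output alongside the reproducible example lets others check whether they are testing the same versions.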

crocovert commented 9 years ago

It works for me too, but not with "nrow(.SD" in place of the direct sum.


DavidArenburg commented 9 years ago

Did you try it with the devel version (v 1.9.5)?

eantonya commented 9 years ago

Works for me. As another aside - use sum(filtre1 == "E") instead of that .SD expression.
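A small sketch of that suggestion, using a made-up table rather than the reporter's data (the columns `grp` and `value` are hypothetical). Counting or summing over a logical condition avoids subsetting .SD in each group, and the same pattern covers the earlier question of summing a column within filtered groups.

```r
library(data.table)

# Hypothetical toy table; group "b" contains no "E" rows at all.
dt <- data.table(
  grp     = c("a", "a", "b", "b"),
  filtre1 = c("E", "HE", "HE", "HE"),
  value   = c(10, 20, 30, 40)
)

# Count matching rows per group: summing a logical vector counts the TRUEs.
counts <- dt[, .(n_E = sum(filtre1 == "E")), by = grp]

# Sum a column over matching rows only; an empty match sums to 0.
totals <- dt[, .(sum_E = sum(value[filtre1 == "E"])), by = grp]

print(counts)
print(totals)
```

Groups with no matching rows still appear in both results, with a count or total of zero.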
