Open sjain777 opened 8 years ago
Hi Arun, Did you get a chance to look at the two issues above? Thanks!
Just looked at this issue:
# long vectors not supported yet: bmerge.c:51
This shouldn't be happening for data of these dimensions, and should be fixed. Thanks for spotting this. I'm not sure if I'll be able to invest time on this for this release though :-(. Will see.
Hi, I am also facing the same issue:
dcast(dt, A ~ B, fill = 0, value.var = "col_sum")
Error in dim.data.table(x) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:138
This works fine if dt is small, but fails when it is large. Can't we classify this as a bug rather than an enhancement?
I have encountered the same error as @niths4u. I am using the development version and it is not fixed there either!
This isn't an enhancement, it's a bug. The "enhancement" label is probably what pushes this issue to the back of the queue. I, like many others, have high cardinality data that I need to cast. I use data.table for speed and find that the function is broken.
Please add the "bug" label.
In the fread function, if you use skip=2500000000, it also raises an error:
NAs introduced by coercion to integer range
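For what it's worth, that warning looks like R's 32-bit integer coercion rather than anything specific to fread: the largest value an R integer can hold is `.Machine$integer.max` (2147483647), and coercing anything larger yields NA with exactly that message. A minimal base-R illustration (not fread itself):

```r
# R integers are 32-bit, so the largest representable value is
# .Machine$integer.max = 2147483647. Coercing anything larger to
# integer yields NA plus the warning quoted above.
print(.Machine$integer.max)                         # 2147483647
big <- 2500000000                                   # stored as a double
print(suppressWarnings(as.integer(big)))            # NA
```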
I am also encountering the same error as @niths4u, and agree with @ljodea: can this be labelled as a bug? data.table is so appealing for its speed on large data sets; if this is a soft limit on the size of data it can handle, it surely makes sense to mark it as something to fix.
The issue is that this cannot be fixed with the way dcast is currently implemented. It will need a rewrite. I've added the bug label (I agree that it is technically a bug, although the current error message is now much clearer). I'll have a look at this ASAP, but it will be lots of work AFAICT.
Does anyone have any updates on when the size limitation for dcast will be fixed?
I have an example posted online that shows an issue with a smaller dataset (smaller than previous posters):
https://www.ecoccs.com/dcast_size-limit.html
The following works as intended (look at the last 2 columns):
```r
library(data.table)
chem_abstracts <- fread("https://www.ecoccs.com/ListInfo-2023-06-30-CA-Index.csv")
chem_abstracts[, c("Internal Tracking Number", "EPA ID #", "TSN #", "Alternate ID", "Synonym Effective Date", "Synonym End Date", "Related Links", "Synonym Comment", "Status") := NULL]
setnames(chem_abstracts, "CAS #", "CAS")
chem_abstractss <- chem_abstracts[1:24100, ]
rsc1 <- dcast(chem_abstractss, ... ~ `Structural Notation Type`,
              value.var = "Structural Notation", fill = "")
rsc1
```
The following does not work as intended (look at the last 2 columns):
```r
library(data.table)
chem_abstracts <- fread("https://www.ecoccs.com/ListInfo-2023-06-30-CA-Index.csv")
chem_abstracts[, c("Internal Tracking Number", "EPA ID #", "TSN #", "Alternate ID", "Synonym Effective Date", "Synonym End Date", "Related Links", "Synonym Comment", "Status") := NULL]
setnames(chem_abstracts, "CAS #", "CAS")
chem_abstracts <- chem_abstracts[1:24900, ]
rsc2 <- dcast(chem_abstracts, ... ~ `Structural Notation Type`,
              value.var = "Structural Notation", fill = "")
rsc2
```
hi! I don't have plans to work on this myself, but if you have time to work on it and submit a PR, I could review it (I have worked on some other reshape code -- melt -- I'm not an expert on dcast internals but I could at least review).
Looks like we're hitting the int32 limit. If the output is not long-vector sized, and it is only a temporary working variable that exceeds int32, then chunking the input (probably by common dimension values) should be a sufficient workaround for the moment.
Hi, I have data with up to 200K rows and about 12K-20K unique values in value.var which needs to be flattened out using dcast.data.table. I get the following error with such data:
When I reduce the number of rows to 20K, the same syntax as above works fine. Below is code to generate the dummy data:
Also, with the flattened table, the object size increases manifold because one column ("Feature") is expanded into many. I would like to reduce the size of the resulting table by filling the flattened columns with bit values instead of LOGICAL. But when I use the following syntax (on a table of 20K rows, ~12K unique values in "Feature"), I get the following error:
Could you please let me know of the fix for both the above problems? Thanks!