Closed jeroen closed 2 years ago
Thanks, I'll try to take a look, should be possible to cover that case.
On the other hand ... I think that case is known and is simply a 15-year old design want meaning you should
Rcpp::List
or individual column til you are doneRcpp::DataFrame::create
This (actually complete) example works:
#include <Rcpp/Lightest>
// [[Rcpp::export]]
Rcpp::List listtest (int n) {
Rcpp::IntegerVector foo(n);
Rcpp::CharacterVector bar(n);
Rcpp::CharacterVector baz(n);
Rcpp::List ll = Rcpp::List::create(Rcpp::Named("foo") = foo,
Rcpp::Named("bar") = bar);
ll.push_back(baz);
return ll;
}
/*** R
listtest(0)
*/
> Rcpp::sourceCpp("/tmp/issue1232.cpp")
> listtest(0)
$foo
integer(0)
$bar
character(0)
[[3]]
character(0)
>
There is a convenience converter (that IIRC actually calls back into R) that can turn that object into a df too:
// [[Rcpp::export]]
Rcpp::DataFrame dftest (int n) {
Rcpp::IntegerVector foo(n);
Rcpp::CharacterVector bar(n);
Rcpp::CharacterVector baz(n);
Rcpp::List ll = Rcpp::List::create(Rcpp::Named("foo") = foo,
Rcpp::Named("bar") = bar);
ll.push_back(baz);
return Rcpp::DataFrame(ll);
}
> Rcpp::sourceCpp("/tmp/issue1232.cpp")
> #listtest(0)
> dftest(0)
[1] foo bar character.0.
<0 rows> (or 0-length row.names)
>
So maybe the matter is 'merely' the missing name?
Can you live with this 'list til I return' variant?
Can you live with this 'list til I return' variant?
Yes I can do that, but what is the best way to pass stringsAsFactors = false
for this case? Previously I had this:
But then if we first store a list, the _["stringsAsFactors"] = false
part needs to move to Rcpp::DataFrame(ll)
. Should I use Rcpp::DataFrame::create(ll, _["stringsAsFactors"] = false)
?
Not having tested this, the second should work.
That echos what I wrote earlier about collecting your dimensions and calling the instantiation at the last step. I am aware of a few packages doing just that, and setting stringsAsFactors
that way.
I am not fully convinced we want to or need to support the use case you have here but the following diff achieves it:
modified inst/include/Rcpp/DataFrame.h
@@ -140,10 +140,12 @@ namespace Rcpp{
max_rows = Rf_xlength(*it);
}
}
- for (it = Parent::begin(); it != Parent::end(); ++it) {
- if (Rf_xlength(*it) == 0 || ( Rf_xlength(*it) > 1 && max_rows % Rf_xlength(*it) != 0 )) {
- // We have a column that is not an integer fraction of the largest
- invalid_column_size = true;
+ if (max_rows > 0) {
+ for (it = Parent::begin(); it != Parent::end(); ++it) {
+ if (Rf_xlength(*it) == 0 || ( Rf_xlength(*it) > 1 && max_rows % Rf_xlength(*it) != 0 )) {
+ // We have a column that is not an integer fraction of the largest
+ invalid_column_size = true;
+ }
}
}
if (invalid_column_size) {
Seems like a reasonable change to me to resemble what happens in R, and therefore to apply the principle of least surprise:
> foo <- integer(0)
> bar <- character(0)
> baz <- character(0)
> df <- data.frame(foo, bar)
> cbind(df, baz)
[1] foo bar baz
<0 rows> (or 0-length row.names)
I agree. I am inclined to test it. And when you run the example by @jeroen under the diff above you even get a reasonable filled-in name (as push_back
does not set one):
edd@rob:~/git/rcpp(master)$ Rscript -e 'Rcpp::sourceCpp("/tmp/issue1232.cpp")'
> origissue(0)
[1] foo bar character.0.
<0 rows> (or 0-length row.names)
edd@rob:~/git/rcpp(master)$
As expected, no issues in a reverse-dependency check. I just pushed the branch. I will a new unit test for good measure.
Downstream bug report: https://github.com/ropensci/pdftools/issues/108 I think the bug is that DataFrame.push_back() gives an error when trying to add a column to a data frame with 0 rows: