dcooley / sfheaders

Build sf objects from R and Rcpp
https://dcooley.github.io/sfheaders/

Vectorise all functions? #18

Closed mpadge closed 5 years ago

mpadge commented 5 years ago

Awesome stuff @dcooley, and definitely getting there as a reference implementation that I'll be able to incorporate in a lot of my workflows. And now the "however, but ... ": Most of my custom-written implementations (which are accordingly nowhere near as neat or clear as yours) are done because of the huge advantage of vectorisation within C++. And so, I present you with a super-crappy little vectorisation wrapper:

#include <Rcpp.h>
#include "sfheaders/sfg/sfg.hpp"

// [[Rcpp::export]]
Rcpp::List rcpp_junk_sfg( Rcpp::List& x ) {
    // One output slot per input matrix
    Rcpp::List res (x.size ());
    for (auto i = 0; i < x.size (); i++)
    {
        // Each list element is expected to be a numeric coordinate matrix
        Rcpp::NumericMatrix nm = Rcpp::as <Rcpp::NumericMatrix> (x [i]);
        res (i) = sfheaders::sfg::sfg_linestring (nm);
    }
    return res;
}

And the following benchmarks:

library(sfheaders)
xy <- lapply (mapdeck::roads$geometry, function (i) as.matrix (i))

knitr::kable (rbenchmark::benchmark (
            g <- lapply (xy, function (i) sfg_linestring (i)),
            g <- rcpp_junk_sfg (xy),
            replications = 10) [, 1:4])
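| test                                           | replications | elapsed | relative |
|:-----------------------------------------------|-------------:|--------:|---------:|
| g <- lapply(xy, function(i) sfg_linestring(i)) |           10 |   0.760 |   10.556 |
| g <- rcpp_junk_sfg(xy)                         |           10 |   0.072 |    1.000 |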

Created on 2019-07-17 by the reprex package (v0.3.0)

That is, for me at least, the reason why I build sf in C++ - the individual conversion is faster, but the vectorisation even more so. Thoughts?
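
The pattern behind that benchmark can be sketched in plain C++ without any Rcpp dependency: the per-element constructor stays as it is, and a second function loops over the whole collection in compiled code, so the caller makes one call instead of one per geometry. `Matrix`, `Linestring`, `make_linestring` and `make_linestrings` are hypothetical stand-ins for illustration only; none of them is sfheaders API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins: a coordinate matrix is a vector of (x, y) rows,
// and a "linestring" is that matrix plus a class label.
using Matrix = std::vector<std::vector<double>>;

struct Linestring {
    Matrix coords;
    std::string cls;
};

// Per-element constructor (plays the role of a single sfg_linestring call).
inline Linestring make_linestring(const Matrix& m) {
    return Linestring{m, "LINESTRING"};
}

// Batch wrapper: one call handles the whole list, looping in compiled code.
inline std::vector<Linestring> make_linestrings(const std::vector<Matrix>& xs) {
    std::vector<Linestring> res;
    res.reserve(xs.size());  // output size is known up front
    for (const Matrix& m : xs) {
        res.push_back(make_linestring(m));
    }
    return res;
}
```

The sketch only shows the shape of the wrapper. The speed-up in the benchmark comes from avoiding the repeated R-to-C++ call overhead, which a pure C++ example cannot reproduce.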

dcooley commented 5 years ago

yes this is certainly something which can be implemented.

One question though; I've never had a need for or used a list of sfg objects, do you have a use-case for this?

mpadge commented 5 years ago

That'd be great if that could be implemented! As for the Q: It's how a lot of osmdata stuff gets built, like this linestring example, where all sfg objects get put in a simple Rcpp::List, and then converted to sfc by appending attributes, pretty much like your sfc_attributes function.

I no longer have the code (buried somewhere in commit history), but I did compare list of sfg -> single sfc construction versus list of sfc -> do.call, and the former was more efficient. If I recall correctly, that's why I set off simply building all multi-geom objects as lists of sfg, and only appending the sfc attributes to the resultant list.
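
A rough sketch of the "build a list of geometries, then append column-level attributes once" approach, again in plain C++ rather than Rcpp. It assumes two-column (x, y) matrices and covers only a bounding box and an empty-geometry count; `GeometryColumn`, `BBox` and `attach_attributes` are hypothetical names and not part of osmdata or sfheaders.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows of (x, y)

struct BBox {
    double xmin, ymin, xmax, ymax;
};

// Hypothetical stand-in for a geometry column: the geometries plus
// attributes that describe the collection as a whole.
struct GeometryColumn {
    std::vector<Matrix> geoms;
    BBox bbox;
    std::size_t n_empty;
};

// Single pass over all geometries; the attributes are attached once at the end.
inline GeometryColumn attach_attributes(std::vector<Matrix> geoms) {
    const double inf = std::numeric_limits<double>::infinity();
    BBox bb{inf, inf, -inf, -inf};  // stays infinite if every geometry is empty
    std::size_t n_empty = 0;
    for (const Matrix& m : geoms) {
        if (m.empty()) {
            ++n_empty;
            continue;
        }
        for (const std::vector<double>& row : m) {
            bb.xmin = std::min(bb.xmin, row[0]);
            bb.ymin = std::min(bb.ymin, row[1]);
            bb.xmax = std::max(bb.xmax, row[0]);
            bb.ymax = std::max(bb.ymax, row[1]);
        }
    }
    return GeometryColumn{std::move(geoms), bb, n_empty};
}
```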

dcooley commented 5 years ago

ok cool. I'll have a think about how to implement it - hopefully it can be another overloaded sfc_linestring( Rcpp::List x ).

Also, I see you're using rapidxml; I too use it in another library, and have plans for more, so I decided to wrap it and make it its own r package - https://github.com/dcooley/rapidxmlr.

mpadge commented 5 years ago

Apologies for straying off issue, but apropos C++ wrappers, note further that I've recently wrapped both clipper and concaveman-cpp, both of which offer huge efficiency boosts and are very easy to wrap. (Clipper is here and concaveman is here, but they'll both likely also be spun out into more self-contained units down the line...) Just in case those are of any use for you :wink:

dcooley commented 5 years ago

I've just pushed a commit where each of the sfg_ and sfc_ functions is pluralised, so taking your example, calling rcpp_sfg_linestrings() on a list will turn each element into an sfg LINESTRING.

g <- sfheaders:::rcpp_sfg_linestrings(xy)
g[[1]]
LINESTRING (145.0143 -37.83046, 145.0143 -37.83057, 145.0145 -37.8307, 145.016 -37.83148, 145.0165 -37.8317, 145.0168 -37.83175, 145.0171 -37.83174, 145.0175 -37.83167, 145.0178 -37.83156, 145.0183 -37.83138, 145.0186 -37.83133, 145.0189 -37.8313, 145.0191 -37.8313, 145.0194 -37.83133, 145.0197 -37.83138, 145.0202 -37.83146, 145.0205 -37.83154, 145.0206 -37.83159, 145.0207 -37.83159, 145.021 -37.83166)

At the moment I haven't added any column controls, ids etc, so it assumes each list element is the correct 'shape'/object.

Thoughts?

mpadge commented 5 years ago

I can't install because of

/usr/bin/ld: to_sfg.o: in function `sfheaders::sfg::sfg_points(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0xef0): multiple definition of `sfheaders::sfg::sfg_points(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x17c0): first defined here
/usr/bin/ld: to_sfg.o: in function `sfheaders::sfg::sfg_polygons(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0x1020): multiple definition of `sfheaders::sfg::sfg_polygons(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x1d00): first defined here
/usr/bin/ld: to_sfg.o: in function `sfheaders::sfg::sfg_multipolygons(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0x1150): multiple definition of `sfheaders::sfg::sfg_multipolygons(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x2ab0): first defined here
:sfg_linestrings(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0x1800): multiple definition of `sfheaders::sfg::sfg_linestrings(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x1a60): first defined here
/usr/bin/ld: to_sfg.o: in function `sfheaders::sfg::sfg_multilinestrings(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0x1930): multiple definition of `sfheaders::sfg::sfg_multilinestrings(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x1bb0): first defined here
/usr/bin/ld: to_sfg.o: in function `sfheaders::sfg::sfg_multipoints(Rcpp::Vector<19, Rcpp::PreserveStorage>&)':
to_sfg.cpp:(.text+0x2010): multiple definition of `sfheaders::sfg::sfg_multipoints(Rcpp::Vector<19, Rcpp::PreserveStorage>&)';
junk.o:junk.cpp:(.text+0x1910): first defined here
collect2: error: ld returned 1 exit status

dcooley commented 5 years ago

forgot to inline - try now
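
For context on that fix: a non-template function defined in a header is compiled into every .cpp file that includes it, and without `inline` the linker then finds several definitions of the same symbol, which matches the "multiple definition" errors reported from to_sfg.o and junk.o. A minimal single-file sketch of the header pattern follows; `demo::count_rows` is a hypothetical name, not an sfheaders function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Imagine this namespace lives in a header included from several .cpp files.
namespace demo {

// `inline` allows the identical definition to appear in every translation
// unit that includes the header; the linker keeps a single copy.
// Without it, each object file exports its own definition and linking fails.
inline std::size_t count_rows(const std::vector<double>& xy) {
    assert(xy.size() % 2 == 0);  // coordinates are stored as x, y pairs
    return xy.size() / 2;
}

}  // namespace demo
```

Function templates and functions defined inside a class body get this treatment implicitly, so it is usually the free, non-template overloads in a header-only library that need the keyword spelled out.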

mpadge commented 5 years ago

devtools::load_all (".", export_all = TRUE)
#> Loading sfheaders
xy <- lapply (mapdeck::roads$geometry, function (i) as.matrix (i))
knitr::kable (rbenchmark::benchmark (
            g <- lapply (xy, function (i) sfg_linestring (i)),
            g <- rcpp_junk_sfg (xy),
            g <- sfheaders:::rcpp_sfg_linestrings(xy),
            replications = 10) [, 1:4])
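| test                                           | replications | elapsed | relative |
|:-----------------------------------------------|-------------:|--------:|---------:|
| g <- lapply(xy, function(i) sfg_linestring(i)) |           10 |   0.787 |   10.635 |
| g <- rcpp_junk_sfg(xy)                         |           10 |   0.074 |    1.000 |
| g <- sfheaders:::rcpp_sfg_linestrings(xy)      |           10 |   0.074 |    1.000 |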

Created on 2019-07-18 by the reprex package (v0.3.0)

Perfect! Thanks loads!