harryprince opened this issue 5 years ago
Hi @harryprince, this is an interesting idea and I'm keen to know more about how you think GeoSpark can be used as a backend to mapdeck. Do you have an example of the workflow you have in mind? How would the data get from GeoSpark to mapdeck?
Here is a great example borrowed from the official GeoSpark documentation, and I think visualizing from code is better than Apache Zeppelin, because it is more reproducible.
Here is another example showing how GeoSparkViz renders a very large raster image.
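For illustration, here is a rough sketch of that GeoSparkViz pipeline driven from R through sparklyr's DBI interface. The function names (ST_Pixelize, ST_Colorize, ST_Render) come from the GeoSparkViz tutorial linked in the references below; the table name pointtable, the boundary polygon, the 256 x 256 resolution, and the exact signatures are my assumptions, and it assumes a connection sc with the GeoSparkViz SQL functions registered.

library(DBI)

# Rasterize each point onto a 256 x 256 pixel grid inside a fixed envelope
# (boundary polygon and signatures are assumptions based on the linked tutorial)
dbExecute(sc, "
  CREATE OR REPLACE TEMP VIEW pixels AS
  SELECT pixel, shape FROM pointtable
  LATERAL VIEW ST_Pixelize(shape, 256, 256,
    ST_GeomFromWKT('POLYGON ((-126 24, -66 24, -66 50, -126 50, -126 24))')) AS pixel")

# Weight each pixel by the number of points that land on it
dbExecute(sc, "
  CREATE OR REPLACE TEMP VIEW pixelaggregates AS
  SELECT pixel, count(*) AS weight FROM pixels GROUP BY pixel")

# Colour the pixels by weight and render them into a single image
img <- dbGetQuery(sc, "
  SELECT ST_Render(pixel, ST_Colorize(weight,
    (SELECT max(weight) FROM pixelaggregates))) AS image
  FROM pixelaggregates")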
@SymbolixAU
Real-time geospatial monitoring can now be supported by geospark and sparklyr, and I hope it can be visualized with shiny and mapdeck. Here is an example code snippet.
library(future)
library(sparklyr)
library(geospark)
library(dplyr, warn.conflicts = FALSE)

sc <- spark_connect(master = "local", spark_version = "2.2.0")
register_gis(sc)

# Clean up any previous runs
if (file.exists("source")) unlink("source", TRUE)
if (file.exists("source-out")) unlink("source-out", TRUE)

# Seed the input folder with one test batch, then read it as a stream
stream_generate_test(iterations = 1)
read_folder <- stream_read_csv(sc, "source")

# Derive coordinates, build a WKT point, and flag rows over the threshold
process_stream <- read_folder %>%
  mutate(a = x * 0.02, b = x * 0.02) %>%
  mutate(y = ST_AsText(st_point(a, b))) %>%
  mutate(x = as.double(x)) %>%
  ft_binarizer(
    input_col = "x",
    output_col = "over",
    threshold = 400
  )

write_output <- stream_write_csv(process_stream, "source-out")

# Keep generating test batches in the background
invisible(future(stream_generate_test(interval = 0.2, iterations = 100)))
cat source-out/part-00000-afb2798b-44e2-4a33-ba71-32681da14096-c000.csv
x,a,b,y,over
1.0,0.02,0.02,POINT (0.02 0.02),0.0
2.0,0.04,0.04,POINT (0.04 0.04),0.0
3.0,0.06,0.06,POINT (0.06 0.06),0.0
4.0,0.08,0.08,POINT (0.08 0.08),0.0
5.0,0.10,0.10,POINT (0.1 0.1),0.0
6.0,0.12,0.12,POINT (0.12 0.12),0.0
7.0,0.14,0.14,POINT (0.14 0.14),0.0
8.0,0.16,0.16,POINT (0.16 0.16),0.0
9.0,0.18,0.18,POINT (0.18 0.18),0.0
10.0,0.20,0.20,POINT (0.2 0.2),0.0
11.0,0.22,0.22,POINT (0.22 0.22),0.0
12.0,0.24,0.24,POINT (0.24 0.24),0.0
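To connect this back to mapdeck: one micro-batch file could be read back into R and plotted. A minimal sketch; treating a and b as lon/lat is purely illustrative, and it assumes a Mapbox token has been set via mapdeck::set_token().

library(mapdeck)

# Read one micro-batch back into R (file name taken from the output above)
pts <- read.csv("source-out/part-00000-afb2798b-44e2-4a33-ba71-32681da14096-c000.csv")

# Treat `a` and `b` as lon/lat for illustration only
mapdeck() %>%
  add_scatterplot(
    data = pts,
    lon = "a",
    lat = "b",
    radius = 50,
    fill_colour = "over"
  )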
Here is another Uber trip example:
Would the idea be to take the process_stream object directly and plot it on a map, rather than writing to disk?
@SymbolixAU You can write the stream data into memory instead of to disk:
process_stream %>%
  stream_write_memory("urls_stream", mode = "complete")
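A sketch of reading it back, assuming the memory sink registers an in-memory Spark table under the given name (as the sparklyr documentation describes):

# The memory sink exposes an in-memory table named "urls_stream";
# query it like any other Spark table
tbl(sc, "urls_stream") %>%
  head(5) %>%
  collect()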
And it supports sparklyr::reactiveSpark() for further visualization in shiny; see the RStudio blog release notes.
If you are not familiar with sparklyr, here is an awesome-sparklyr collection: https://github.com/harryprince/awesome-sparklyr
Could you give me a dput() output of example data, say only 5 lines, generated by

process_stream %>%
  stream_write_memory("urls_stream", mode = "complete") %>%
  dput()

or however is best to generate it?
@SymbolixAU example code snippet:
polygons_wkt <- read.table(system.file(package = "geospark", sprintf("examples/%s.txt", "polygons")), sep = "|")
points_wkt <- read.table(system.file(package = "geospark", sprintf("examples/%s.txt", "points")), sep = "|")

# Stream the points and parse the WKT column into a geometry
stream_generate_test(points_wkt, "source/")
point_stream <- stream_read_csv(sc, "source/", delimiter = ",") %>%
  mutate(geom = st_geomfromwkt(geom))

# Copy the polygons to Spark (note: polygons_wkt, not points_wkt)
polygon_sdf <- copy_to(sc, polygons_wkt, "polygon_sdf")
point_stream %>% sdf_register("point_stream")

# Count the streaming points that fall inside each polygon
stream <- tbl(sc, sql("
  SELECT area, state, count(*) AS cnt
  FROM polygon_sdf
  INNER JOIN point_stream
    ON ST_Contains(polygon_sdf.geom, point_stream.geom)
  GROUP BY area, state")) %>%
  reactiveSpark()
library(shiny)

ui <- fluidPage(DT::dataTableOutput("a"))

server <- function(input, output) {
  # Use DT::renderDataTable to match DT::dataTableOutput
  output$a <- DT::renderDataTable({
    stream() %>%
      as.data.frame() %>%
      DT::datatable()
  })
}

shinyApp(ui = ui, server = server)
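And to bring this back to mapdeck, the DT table could be swapped for a live map. A sketch, assuming polygons_wkt has area, state, and a WKT geom column (the column names are my assumption) and that a Mapbox token has been set via mapdeck::set_token():

library(shiny)
library(mapdeck)
library(sf)

# Build sf polygons from the WKT column (column names are assumptions)
polygons_sf <- sf::st_as_sf(polygons_wkt, wkt = "geom", crs = 4326)

ui <- fluidPage(mapdeckOutput("map"))

server <- function(input, output) {
  output$map <- renderMapdeck({
    # Join the live per-polygon counts onto the polygon geometries
    counts <- as.data.frame(stream())
    shaded <- merge(polygons_sf, counts, by = c("area", "state"))
    mapdeck() %>%
      add_polygon(data = shaded, fill_colour = "cnt")
  })
}

shinyApp(ui = ui, server = server)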
Which libraries are these functions from?
stream_generate_test()
stream_read_csv()
st_geomfromwkt()
@SymbolixAU The stream_* functions come from sparklyr; the st_* functions are GeoSpark SQL functions made available on the connection by geospark::register_gis().
Hi mapdeck team: I saw mapdeck is a great tool for visualizing large-scale data, and I found that GeoSparkViz does a similar thing: it can rasterize a large point dataset into an image file and pass it as a base64 string to Apache Zeppelin. Indeed, I think RStudio is a better frontend than Apache Zeppelin for the geospatial data scientist.
Would it be possible to integrate GeoSpark as a backend to render large-scale geospatial data in mapdeck? I have already made a geospark R package; you are welcome to discuss it further.
References
https://datasystemslab.github.io/GeoSpark/tutorial/viz/