Open salvafern opened 2 years ago
@salvafern make sure to use WFS version 2.0.0; AFAIK pagination in WFS is only supported in WFS 2.0, and I see you used 1.1.0.
Try setting the version to 2.0.0, like this:
wfs <- WFSClient$
new("https://geo.vliz.be/geoserver/Dataportal/wfs", "2.0.0", logger = "INFO")$
getCapabilities()$
findFeatureTypeByName("Dataportal:eurobis-obisenv_basic")
params <- "where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464"
#with pagination
system.time(feature_pagination <- wfs$getFeatures(viewParams = params, paging = TRUE, paging_length = 1000))
I just tested the pagination and it worked.
Indeed now it works, thanks a lot! I was using v1.1.0 to copy what the download toolbox did, but I guess there's no harm in using v2.0.0
I have also now tried using the parallel options:
Probably I'm doing something wrong. I expected that one request would be made per chunk, but I just ran into an error.
library(ows4R)
library(parallel)
wfs <- WFSClient$
new("https://geo.vliz.be/geoserver/Dataportal/wfs", "2.0.0", logger = "INFO")$
getCapabilities()$
findFeatureTypeByName("Dataportal:eurobis-obisenv_basic")
# Querying dataset: https://www.emodnet-biology.eu/data-catalog?module=dataset&dasid=8020
# ~500K rows
params <- "where%3Adatasetid+IN+%288020%29"
# With pagination and parallelization
cl <- makeCluster(detectCores() - 1)
cl
#> socket cluster with 15 nodes on host ‘localhost’
debug(wfs$getFeatures)
system.time(feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
paging = TRUE, paging_length = 10000,
parallel = TRUE, parallel_handler = parallel::mclapply, cl = cl))
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> No layers in datasource.
#> Timing stopped at: 0.023 0 11.45
Via debug() I can see that at some point, the response to a request of type 'hits' is read with sf::st_read(), which of course fails. This happens at https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L328
The response in destfile looks like
<?xml version="1.0" encoding="UTF-8"?>
<wfs:FeatureCollection
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fes="http://www.opengis.net/fes/2.0"
xmlns:wfs="http://www.opengis.net/wfs/2.0"
xmlns:gml="http://www.opengis.net/gml/3.2"
xmlns:ows="http://www.opengis.net/ows/1.1"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" numberMatched="408603" numberReturned="0" timeStamp="2022-03-31T07:57:57.251Z" xsi:schemaLocation="http://www.opengis.net/wfs/2.0 http://schemas.opengis.net/wfs/2.0/wfs.xsd"/>
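For illustration, the 'hits' response above carries the total in its numberMatched attribute, and a paged WFS 2.0 request is then split with the standard startIndex and count parameters. The following is only a minimal sketch in base R of that arithmetic, not ows4R's actual implementation: the variable names are hypothetical, the XML is an abbreviated copy of the response above, and the regular expression is a shortcut where a real XML parser would be more robust.

```r
# Abbreviated copy of the 'hits' response shown above
hits_xml <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" ',
  'numberMatched="408603" numberReturned="0" ',
  'timeStamp="2022-03-31T07:57:57.251Z"/>'
)

# Extract the numberMatched attribute (regex shortcut, assumption for this sketch)
m <- regmatches(
  hits_xml,
  regexpr('(?<=numberMatched=")[0-9]+', hits_xml, perl = TRUE)
)
n_matched <- as.integer(m)

# One start offset per page; WFS 2.0 paging is zero-based
page_length <- 10000L
start_index <- seq(0L, n_matched - 1L, by = page_length)

# Build one GetFeature URL per page using startIndex and count
page_urls <- sprintf(
  "%s?service=WFS&version=2.0.0&request=GetFeature&typeNames=%s&viewParams=%s&startIndex=%d&count=%d",
  "https://geo.vliz.be/geoserver/Dataportal/wfs",
  "Dataportal:eurobis-obisenv_basic",
  "where%3Adatasetid+IN+%288020%29",
  start_index,
  page_length
)

length(page_urls)
```

Each element of page_urls could then be fetched independently, which is what makes paging a precondition for any per-chunk parallelization.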
I tried comparing no parallelization vs parallelization with mclapply and parLapply, but I'm not seeing any improvement in performance. Probably it needs pagination as well?
# No pagination nor parallelization
system.time(feature <- wfs$getFeatures(viewParams = params, resultType="results"))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 26.718 2.080 67.476
# Parallelization parLapply
system.time(feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
parallel = TRUE, parallel_handler = parallel::parLapply, cl = cl))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 27.457 2.477 65.883
# Parallelization mclapply
system.time(feature_parallel2 <- wfs$getFeatures(viewParams = params, resultType="results",
parallel = TRUE, parallel_handler = parallel::mclapply, cl = cl))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=2.0.0&typeNames=Dataportal:eurobis-obisenv_basic&viewParams=where%3Adatasetid+IN+%288020%29&resultType=results&request=GetFeature
#> user system elapsed
#> 26.226 2.274 63.895
Many thanks again for the help! Let me know if there is anything I can do.
Yes, it sounds like there are issues with the parallelization; I will have a look ASAP.
If you want to use the cluster approach, you can use this handler: parallel::parLapply, which works with a cluster. mclapply apparently can't work because I didn't allow specifying the extra args needed for this handler.
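As a side note on why the two handlers are not interchangeable, here is a minimal sketch using only the base parallel package. The chunks list and sum_chunk function are hypothetical stand-ins for the per-page work; nothing here reflects how ows4R calls the handler internally.

```r
library(parallel)

# Hypothetical stand-ins for per-page work
chunks <- list(1:3, 4:6, 7:9)
sum_chunk <- function(idx) sum(idx)

# parLapply(): the cluster object is the first argument
cl <- makeCluster(2)
res_cluster <- parLapply(cl, chunks, sum_chunk)
stopCluster(cl)

# mclapply(): no cluster object; it forks and takes the worker count
# through mc.cores (forking is unavailable on Windows, so use 1 there)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_fork <- mclapply(chunks, sum_chunk, mc.cores = n_cores)

identical(res_cluster, res_fork)
```

The different calling conventions (cl first for parLapply, mc.cores for mclapply) are the "extra args" a generic handler argument has to account for.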
I got the same error :(
feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results",
paging = TRUE, paging_length = 10000,
parallel = TRUE, parallel_handler = parallel::parLapply, cl = cl)
#> Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> No layers in datasource.
@salvafern I haven't forgotten this. I started working on it, but I'm still looking into the best way to fix the parallel handlers.
Hi @eblondel,
I have been trying ows4R to query biological occurrence data from EMODnet-Biology. In the example below, I requested:
I got a WFS request using the EMODnet-Biology download toolbox (at the end of the selection, you can copy the WFS request in "Get webservice url").
The good news is that viewParams via vendor params works like a charm! (although I have to watch out for the encoding: https://github.com/lifewatch/eurobis/issues/15#issuecomment-1081925137)
I am having trouble, however, with the paging and parallel options. After some debugging, I think the issue might be that ows4R relies on a parameter named numberMatched when using resultType = "hits" at: https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L240
This is not being returned by geo.vliz.be (it should happen around: https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L291)
Could you have a look and see what is happening?
Thanks a lot!
Created on 2022-03-29 by the reprex package (v2.0.1)
This issue partly follows up #29