mmcfarland closed 7 years ago
GeoJSON driver docs here http://www.gdal.org/drv_geojson.html
Ran this script to time hitting the rwd-nhd endpoint for the existing code:
10.times { |i| puts "Run #{i}"; system "time curl http://localhost:5000/rwd-nhd/29.84303/-89.99245 &>/dev/null" }
...which returns this response from the API:
{
"input_pt": {
"crs": {
"properties": {
"name": "urn:ogc:def:crs:OGC:1.3:CRS84"
},
"type": "name"
},
"features": [
{
"geometry": {
"coordinates": [
-89.99245,
29.84303
],
"type": "Point"
},
"properties": {
"DistStr_m": 13205.220703125,
"Dist_moved": -1,
"ID": 1,
"Lat": 29.84303,
"Lon": -89.99245
},
"type": "Feature"
}
],
"type": "FeatureCollection"
},
"watershed": {
"crs": {
"properties": {
"name": "urn:ogc:def:crs:OGC:1.3:CRS84"
},
"type": "name"
},
"features": [
{
"geometry": {
"coordinates": [
[
[
-89.99248653457605,
29.8430871743568
],
[
-89.99217648668306,
29.843070089747
],
[
-89.9921961134239,
29.842800194342086
],
[
-89.9925061603094,
29.842817278906676
],
[
-89.99248653457605,
29.8430871743568
]
]
],
"type": "Polygon"
},
"properties": {
"Area_km2": 0.0009,
"GRIDCODE": 1
},
"type": "Feature"
}
],
"type": "FeatureCollection"
}
}
Results:
Run 0
real 0m3.690s
user 0m0.004s
sys 0m0.004s
Run 1
real 0m4.066s
user 0m0.004s
sys 0m0.005s
Run 2
real 0m4.461s
user 0m0.004s
sys 0m0.004s
Run 3
real 0m4.042s
user 0m0.004s
sys 0m0.004s
Run 4
real 0m4.181s
user 0m0.004s
sys 0m0.004s
Run 5
real 0m4.439s
user 0m0.005s
sys 0m0.005s
Run 6
real 0m4.366s
user 0m0.004s
sys 0m0.004s
Run 7
real 0m3.771s
user 0m0.004s
sys 0m0.004s
Run 8
real 0m4.092s
user 0m0.004s
sys 0m0.004s
Run 9
real 0m3.787s
user 0m0.004s
sys 0m0.005s
Going to update the RWD code to use the GeoJSON driver and time it again.
Keep encountering the following error when using the GeoJSON driver in main.py:
{
"error": "'NoneType' object has no attribute 'GetLayer'",
"stackTrace": "Traceback (most recent call last):\n File \"/usr/src/api/main.py\", line 138, in run_rwd_nhd\n create_simplify_tolerance_by_area(wshed_shp_path)))\n File \"/usr/src/api/main.py\", line 196, in create_simplify_tolerance_by_area\n area = get_shp_area(shp_file_path)\n File \"/usr/src/api/main.py\", line 187, in get_shp_area\n layer = dataSource.GetLayer()\nAttributeError: 'NoneType' object has no attribute 'GetLayer'\n"
}
We need to know the area of the geometry to calculate the simplify tolerance. We'll need to do this at the time that we serialize the file to disk now that there is no intermediary shapefile.
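One possible way to get the area without a shapefile: the watershed response shown earlier already carries an Area_km2 property on each feature, so the area could be read from the in-memory GeoJSON. A minimal sketch, assuming the GeoJSON written directly by RWD still includes that property (not verified here); watershed_area_km2 is a hypothetical helper, not a function in main.py:

```python
import json


def watershed_area_km2(feature_collection):
    """Sum the Area_km2 property over all features of a GeoJSON FeatureCollection.

    Assumption: every feature carries Area_km2 in its properties, as in the
    sample response. A feature without it raises KeyError instead of
    silently contributing zero.
    """
    return sum(
        feature["properties"]["Area_km2"]
        for feature in feature_collection["features"]
    )


# Trimmed copy of the "watershed" object from the sample response
# (coordinates omitted, since only the properties are read).
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Polygon", "coordinates": []},
      "properties": {"Area_km2": 0.0009, "GRIDCODE": 1}
    }
  ]
}
""")

area = watershed_area_km2(sample)
print(area)
```

The resulting value could then be fed to whatever tolerance calculation create_simplify_tolerance_by_area performs today, instead of reopening a file with OGR.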
It may be simpler to prototype this using DRB first, since we don't need to do this step. If we determine there is a performance improvement, we can refactor how we calculate the simplification tolerance for NHD.
Sounds good. FWIW, it looks like it will work if I just use the hard-coded value from the DRB endpoint rather than passing it through the create_simplify_tolerance... method.
Put up the code with some changes as #67, then used the script below to run it 50 times and time the results.
TL;DR: averaging 50 queries each to http://localhost:5000/rwd-nhd/39.892986/-75.276639 on the test branch and develop, the average times in seconds were:
test branch average time: 3.9133798800000017
develop average time: 4.4474728799999985
Going to try it again with a different point.
#!/usr/bin/env ruby
require 'net/http'
require 'uri'
branch_name="develop"
number_of_tries = 50
coordinates = '39.892986/-75.276639'
base_uri = URI.parse('http://localhost:5000/rwd-nhd/' + coordinates)
result_value = 0
puts "Testing #{branch_name} branch for #{coordinates}"
puts "-------"
number_of_tries.times do |i|
start_time = Time.now
response = Net::HTTP.get(base_uri)
value = Time.now - start_time
puts "Run #{i} => #{value}s"
result_value += value
end
puts "-------"
puts "Average time: #{result_value/number_of_tries.to_f}"
For requests to http://localhost:5000/rwd-nhd/39.892986/-75.276639 the output from saving the watershed as GeoJSON initially, rather than saving as a shapefile and converting it, is:
Testing ki/time-ogr-geojson branch for 39.892986/-75.276639
-------
Run 0 => 3.851028s
Run 1 => 3.848552s
Run 2 => 4.025903s
Run 3 => 4.004505s
Run 4 => 4.024866s
Run 5 => 3.936748s
Run 6 => 3.642943s
Run 7 => 3.789247s
Run 8 => 3.893356s
Run 9 => 3.859809s
Run 10 => 3.753403s
Run 11 => 3.849026s
Run 12 => 3.787636s
Run 13 => 4.023785s
Run 14 => 3.978589s
Run 15 => 4.079145s
Run 16 => 3.985154s
Run 17 => 4.095986s
Run 18 => 3.955546s
Run 19 => 4.056563s
Run 20 => 3.84861s
Run 21 => 3.739372s
Run 22 => 3.751693s
Run 23 => 3.746862s
Run 24 => 3.85868s
Run 25 => 3.870811s
Run 26 => 4.016358s
Run 27 => 3.974757s
Run 28 => 3.986042s
Run 29 => 4.042353s
Run 30 => 3.789456s
Run 31 => 3.756915s
Run 32 => 3.891902s
Run 33 => 3.734787s
Run 34 => 3.813058s
Run 35 => 3.909185s
Run 36 => 3.978702s
Run 37 => 3.948594s
Run 38 => 3.706634s
Run 39 => 3.839278s
Run 40 => 3.811009s
Run 41 => 3.922629s
Run 42 => 5.11482s
Run 43 => 4.010178s
Run 44 => 4.045377s
Run 45 => 4.032211s
Run 46 => 3.727259s
Run 47 => 3.714537s
Run 48 => 3.802279s
Run 49 => 3.842856s
-------
Average time: 3.9133798800000017
For comparison, the output for requests to develop looks like this:
Testing develop branch for 39.892986/-75.276639
-------
Run 0 => 5.966836s
Run 1 => 4.415812s
Run 2 => 4.587254s
Run 3 => 4.224133s
Run 4 => 4.517195s
Run 5 => 4.621452s
Run 6 => 5.76318s
Run 7 => 4.411121s
Run 8 => 4.791725s
Run 9 => 5.353986s
Run 10 => 4.89055s
Run 11 => 4.512194s
Run 12 => 4.340416s
Run 13 => 4.193591s
Run 14 => 4.104692s
Run 15 => 4.243441s
Run 16 => 4.097622s
Run 17 => 4.429821s
Run 18 => 4.336733s
Run 19 => 4.44539s
Run 20 => 4.262046s
Run 21 => 4.380746s
Run 22 => 4.600536s
Run 23 => 4.760481s
Run 24 => 4.387992s
Run 25 => 4.129498s
Run 26 => 4.488236s
Run 27 => 5.312092s
Run 28 => 4.971472s
Run 29 => 4.763374s
Run 30 => 4.856929s
Run 31 => 4.281255s
Run 32 => 4.27161s
Run 33 => 3.922371s
Run 34 => 5.17833s
Run 35 => 4.0756s
Run 36 => 3.936476s
Run 37 => 4.527369s
Run 38 => 5.932469s
Run 39 => 4.393169s
Run 40 => 4.355641s
Run 41 => 4.36323s
Run 42 => 3.64724s
Run 43 => 3.839577s
Run 44 => 3.667376s
Run 45 => 3.774305s
Run 46 => 3.874217s
Run 47 => 3.822723s
Run 48 => 3.630169s
Run 49 => 3.719971s
-------
Average time: 4.4474728799999985
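Both runs contain outliers (Run 42 on the test branch, Run 0 on develop) that pull the mean, so the median may be a useful companion number. A rough sketch of summarizing the script output above, assuming the "Run N => Xs" line format printed by the Ruby script; parse_runs and summarize are hypothetical helper names:

```python
import re
import statistics

# Matches lines such as "Run 12 => 3.787636s".
RUN_LINE = re.compile(r"^Run (\d+) => ([0-9.]+)s$")


def parse_runs(output):
    """Return the per-run durations in seconds found in the benchmark output."""
    durations = []
    for line in output.splitlines():
        match = RUN_LINE.match(line.strip())
        if match:
            durations.append(float(match.group(2)))
    return durations


def summarize(durations):
    """Basic summary statistics; the median is less sensitive to outliers."""
    return {
        "n": len(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# First three runs of the develop output, copied from above.
excerpt = """\
Testing develop branch for 39.892986/-75.276639
-------
Run 0 => 5.966836s
Run 1 => 4.415812s
Run 2 => 4.587254s
"""

summary = summarize(parse_runs(excerpt))
print(summary)
```

Pasting the full 50-run output into excerpt would give the same statistics for a whole run.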
For requests to http://localhost:5000/rwd-nhd/29.84303/-89.99245:
Summary
test branch average time: 4.239757620000001
develop average time: 4.4004549599999985
Full output
Test branch script output:
Testing ki/time-ogr-geojson branch for 29.84303/-89.99245
-------
Run 0 => 3.81758s
Run 1 => 6.249959s
Run 2 => 4.544147s
Run 3 => 4.417469s
Run 4 => 4.62977s
Run 5 => 4.427504s
Run 6 => 4.30291s
Run 7 => 4.187541s
Run 8 => 5.099523s
Run 9 => 4.056292s
Run 10 => 4.01233s
Run 11 => 4.364373s
Run 12 => 4.171518s
Run 13 => 3.948134s
Run 14 => 4.7016s
Run 15 => 4.455344s
Run 16 => 5.021939s
Run 17 => 4.467673s
Run 18 => 4.0756s
Run 19 => 4.156306s
Run 20 => 4.263819s
Run 21 => 4.420543s
Run 22 => 4.837179s
Run 23 => 4.959897s
Run 24 => 4.350679s
Run 25 => 4.019037s
Run 26 => 4.0482s
Run 27 => 3.94876s
Run 28 => 4.126326s
Run 29 => 3.990464s
Run 30 => 4.070863s
Run 31 => 3.981769s
Run 32 => 3.783816s
Run 33 => 3.712914s
Run 34 => 4.460341s
Run 35 => 4.096151s
Run 36 => 4.04326s
Run 37 => 4.061956s
Run 38 => 4.067508s
Run 39 => 3.750638s
Run 40 => 3.968356s
Run 41 => 4.017581s
Run 42 => 3.901962s
Run 43 => 4.298341s
Run 44 => 3.83998s
Run 45 => 4.068314s
Run 46 => 3.922277s
Run 47 => 3.942286s
Run 48 => 3.96736s
Run 49 => 3.959792s
-------
Average time: 4.239757620000001
develop script output:
Testing develop branch for 29.84303/-89.99245
-------
Run 0 => 3.932337s
Run 1 => 3.915401s
Run 2 => 3.906243s
Run 3 => 4.084814s
Run 4 => 4.076013s
Run 5 => 4.098818s
Run 6 => 4.172563s
Run 7 => 4.334876s
Run 8 => 4.199611s
Run 9 => 5.845253s
Run 10 => 5.490111s
Run 11 => 4.922248s
Run 12 => 4.586968s
Run 13 => 3.824627s
Run 14 => 4.117349s
Run 15 => 4.111065s
Run 16 => 5.283659s
Run 17 => 4.942004s
Run 18 => 4.256765s
Run 19 => 4.081843s
Run 20 => 4.587665s
Run 21 => 3.987953s
Run 22 => 4.851793s
Run 23 => 5.61902s
Run 24 => 4.329889s
Run 25 => 4.564126s
Run 26 => 4.817475s
Run 27 => 4.588089s
Run 28 => 4.117892s
Run 29 => 4.495614s
Run 30 => 3.904015s
Run 31 => 3.928218s
Run 32 => 5.044657s
Run 33 => 4.16405s
Run 34 => 4.154067s
Run 35 => 4.030647s
Run 36 => 3.934775s
Run 37 => 4.094194s
Run 38 => 4.017263s
Run 39 => 4.753695s
Run 40 => 5.050242s
Run 41 => 4.317274s
Run 42 => 4.242062s
Run 43 => 4.336468s
Run 44 => 3.990681s
Run 45 => 4.756981s
Run 46 => 4.351649s
Run 47 => 4.440799s
Run 48 => 3.956156s
Run 49 => 4.412771s
-------
Average time: 4.4004549599999985
http://localhost:5000/rwd-nhd/39.892986/-75.276639 is a larger and more complicated shape than http://localhost:5000/rwd-nhd/29.84303/-89.99245 btw. The test branch currently hard codes a reduction value, too, rather than generating as a function of the shape area.
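To put the two pairs of averages side by side, the relative difference can be computed directly from the numbers reported above. A small sketch; percent_faster is a hypothetical helper name and "test" stands for the ki/time-ogr-geojson branch:

```python
def percent_faster(baseline_s, candidate_s):
    """Percentage by which candidate is faster than baseline (negative if slower)."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


# Average times in seconds, copied from the benchmark output above.
averages = {
    "39.892986/-75.276639": {"develop": 4.4474728799999985, "test": 3.9133798800000017},
    "29.84303/-89.99245": {"develop": 4.4004549599999985, "test": 4.239757620000001},
}

for point, avg in averages.items():
    gain = percent_faster(avg["develop"], avg["test"])
    print(f"{point}: test branch is {gain:.1f}% faster than develop")
```

That works out to roughly 12% for the larger shape and under 4% for the smaller one, from a single 50-run sample each.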
Here's develop for 29.851140886822083/-89.97982978820801:
Average time: 35.62731504
Testing develop branch for 29.851140886822083/-89.97982978820801
-------
Run 0 => 26.037074s
Run 1 => 35.160112s
Run 2 => 43.496722s
Run 3 => 41.128372s
Run 4 => 30.368514s
Run 5 => 27.528782s
Run 6 => 27.867747s
Run 7 => 28.027994s
Run 8 => 27.239265s
Run 9 => 31.600715s
Run 10 => 33.613994s
Run 11 => 39.213031s
Run 12 => 44.622256s
Run 13 => 35.61663s
Run 14 => 35.113504s
Run 15 => 33.335992s
Run 16 => 35.204312s
Run 17 => 35.568593s
Run 18 => 33.35408s
Run 19 => 33.12528s
Run 20 => 31.054627s
Run 21 => 30.509346s
Run 22 => 33.793563s
Run 23 => 32.829872s
Run 24 => 28.885288s
Run 25 => 32.113235s
Run 26 => 33.488414s
Run 27 => 29.914761s
Run 28 => 30.141223s
Run 29 => 31.455483s
Run 30 => 32.211453s
Run 31 => 29.678367s
Run 32 => 40.77607s
Run 33 => 44.045323s
Run 34 => 38.11008s
Run 35 => 45.164916s
Run 36 => 39.36456s
Run 37 => 38.989595s
Run 38 => 42.375076s
Run 39 => 33.480214s
Run 40 => 42.934114s
Run 41 => 37.176596s
Run 42 => 39.55338s
Run 43 => 39.775125s
Run 44 => 42.470737s
Run 45 => 39.834382s
Run 46 => 40.113491s
Run 47 => 39.402242s
Run 48 => 41.814354s
Run 49 => 42.686896s
-------
Average time: 35.62731504
Running the test branch now.
The test branch seems routinely to time out without returning results. I got it to work once with a time of 46 seconds, but otherwise it comes back with an empty server response right after 60 seconds.
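Since empty responses would otherwise be counted as ordinary runs, a benchmark loop that separates failures from successes may help here. A sketch under assumptions: the 90-second client timeout is an arbitrary value chosen to exceed the 60-second cutoff observed above, benchmark and fetch are hypothetical names, and the demo uses a stand-in fetch function so it runs without a server:

```python
import http.client
import time
import urllib.request


def fetch(url, timeout_s):
    """GET the URL and read the whole body; raises on HTTP or socket errors."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def benchmark(url, tries, timeout_s=90.0, fetch_fn=fetch):
    """Time sequential requests, counting errors and empty bodies as failures."""
    durations = []
    failures = 0
    for _ in range(tries):
        start = time.perf_counter()
        try:
            body = fetch_fn(url, timeout_s)
        except (OSError, http.client.HTTPException):
            body = b""  # timeouts, refused or dropped connections
        elapsed = time.perf_counter() - start
        if body:
            durations.append(elapsed)
        else:
            failures += 1
    average = sum(durations) / len(durations) if durations else None
    return {"average": average, "successes": len(durations), "failures": failures}


# Stand-in for the real endpoint: every third call returns an empty body,
# mimicking the empty server responses described above.
calls = {"count": 0}


def fake_fetch(url, timeout_s):
    calls["count"] += 1
    return b"" if calls["count"] % 3 == 0 else b'{"watershed": {}}'


result = benchmark(
    "http://localhost:5000/rwd-nhd/29.851140886822083/-89.97982978820801",
    tries=6,
    fetch_fn=fake_fetch,
)
print(result)
```

Dropping the fetch_fn argument makes the same loop hit the real endpoint, and the average then covers only the requests that returned a body.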
Sounds like the suspicion is confirmed: it doesn't significantly speed things up. I don't think we should invest further in this approach, but I'll take a quick look at the implementation to see if anything pops out.
Playing around with these changes confirms the reported results: it actually increases total time (beyond the current gunicorn timeout, in fact).
For reference, some metrics I generated on develop: a very large watershed (near the Gulf of Mexico) takes about 33s, of which the RWD portion takes 11s, meaning the simplification + serialization take about 22s.
For a modest watershed, the total time is about 1.84s, of which RWD is responsible for 1.80s, meaning the rest is only about 0.04s. Obviously there is a relationship between the complexity of the polygon and the amount of time it takes to serialize and generalize.
Last bit of information, when separating serialization from simplification on a very large watershed:
21.20s simplify
0.04s serialize
From a total runtime of 33s.
Unless there's a way to speed up the ogr2ogr -simplify implementation, there doesn't seem to be much room for improvement here.
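For anyone repeating the measurement, per-stage timings like the ones above can be collected with a small context manager. This is only a sketch of one way to do it, not how the numbers above were produced; timed, timings, and shares are hypothetical names, and the stage values at the end are the approximate figures quoted in this comment:

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = {}


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


def shares(stage_seconds, total_seconds):
    """Each stage as a percentage of the total runtime."""
    return {
        stage: seconds / total_seconds * 100.0
        for stage, seconds in stage_seconds.items()
    }


# Usage: wrap each stage of the request handler in its own block.
with timed("example"):
    sum(range(1000))

# Approximate figures for the very large watershed, from the comment above.
reported = {"rwd": 11.0, "simplify": 21.20, "serialize": 0.04}
for stage, pct in shares(reported, 33.0).items():
    print(f"{stage}: {pct:.1f}% of total")
```

On those figures serialization is a small fraction of one percent of the total, which is why removing the intermediate shapefile cannot buy much.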
RWD writes the final watershed vector with the OGR shapefile driver, which we then convert to GeoJSON with the GeoJSON driver. Determine the performance impact of removing the intermediate step.