azavea / usace-program-analysis-geoprocessing

Companion geoprocessing repository of https://github.com/azavea/usace-program-analysis
Apache License 2.0

Feature/accept chunked requests #5

Closed: jtarricone closed this 8 years ago

jtarricone commented 8 years ago

Connects #4

This modifies the actor and serializer components so that they can process requests containing multiple MultiPolygon objects.
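
A minimal sketch of the reshaped payload, assuming the service uses spray-json (it responds with Server: spray-can); the class and object names here are illustrative, not the project's actual code:

import spray.json._

// Hypothetical model of the updated /count request: the former single
// "multiPolygon" string becomes a "multiPolygons" array of GeoJSON strings.
case class CountRequest(
  zoom: Int,
  rasters: Seq[String],
  multiPolygons: Seq[String]
)

object CountRequestProtocol extends DefaultJsonProtocol {
  // jsonFormat3 derives a (de)serializer from the case class's three fields
  implicit val countRequestFormat: RootJsonFormat[CountRequest] =
    jsonFormat3(CountRequest)
}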

It also uses parallel collections in the raster count endpoint in an effort to reduce request duration.
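
A minimal sketch of the parallel-collections idea; countOne here is a stand-in for whatever per-geometry work the count endpoint actually does:

// Illustrative only: map over the MultiPolygons with a parallel collection
// so each AoI's cell counts are computed concurrently, then convert back
// to a regular Seq for serialization.
def countAll(
  multiPolygons: Seq[String],
  countOne: String => Map[String, Long]
): Seq[Map[String, Long]] =
  multiPolygons.par.map(countOne).seq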

To test, clone this branch and cd into the project directory.

Test that it compiles --> ./sbt "project geop" assembly

Then bring up the Docker container --> docker build -t usace-geop-test . ; docker run --rm -v ~/.aws:/root/.aws -p 4040:4040 -p 8090:8090 usace-geop-test --driver-memory 2G

A basic test after that is curl http://localhost:8090/ping, which should return a response of OK. More detailed testing is most readily accomplished by using the app/API this service was built for.

To that end -->

  1. Spin up ngrok and point it to port 8090 (the geoprocessing service) using ngrok http 8090 or the like.
  2. Clone https://github.com/jtarricone/usace-program-analysis/tree/feature/update-geoproc-request and spin up the VM.
  3. Change the docker-compose.yml entries that set the geoprocessor environment variables so that GEOPROCESSING_HOST points to the ngrok hostname (e.g. GEOPROCESSING_HOST=61b599e1.ngrok.io) and GEOPROCESSING_PORT is set to 80.
  4. Execute docker-compose up.
  5. Navigate to http://localhost:4567 and use the app. In particular, search for programs with some set of parameters, then select an Analysis->Variable option that hits the geoprocessing service, such as "Forested Area" or "Wetlands", and click "Analyze".

As a note on step 5: the first call to the service will be slow because it does some upfront work, but subsequent calls appear to incur no such penalty. The result should be either a timeout or some interesting data. I noticed a decent reduction in request duration, and as a corollary, a single request can now carry many more AoIs without timing out. A request of around 100 programs would usually produce a response in time, anything much larger would usually time out, and requests with 20-30 programs would return somewhere in the 2-6 second range, depending on the buffer size.

rajadain commented 8 years ago

Consider also updating the example file count.json to reflect the new API:

diff --git a/examples/count.json b/examples/count.json
index a31ffee..e2ea452 100644
--- a/examples/count.json
+++ b/examples/count.json
@@ -3,5 +3,5 @@
   "rasters": [
     "nlcd-2011-30m-epsg5070-0.10.0"
   ],
-  "multiPolygon": "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"
+  "multiPolygons": ["{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"]
 }
rajadain commented 8 years ago

This looks really good! Just tested with the sample and it seems to work perfectly:

> http --print HhBb :8090/count < examples/count.json
POST /count HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 455
Content-Type: application/json
Host: localhost:8090
User-Agent: HTTPie/0.9.4

{
    "multiPolygons": [
        "{\"type\":\"MultiPolygon\",\"coordinates\":[[[[-75.1626205444336,39.95580659996906],[-75.25531768798828,39.94514735903112],[-75.22785186767578,39.89446035777916],[-75.1461410522461,39.88761144548104],[-75.09309768676758,39.91078961774283],[-75.09464263916016,39.93817189499188],[-75.12039184570312,39.94435771955196],[-75.1626205444336,39.95580659996906]]]]}"
    ],
    "rasters": [
        "nlcd-2011-30m-epsg5070-0.10.0"
    ],
    "zoom": 0
}

HTTP/1.1 200 OK
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Accept-Encoding, Accept-Language, Host, Referer, User-Agent, Access-Control-Request-Method, Access-Control-Request-Headers
Access-Control-Allow-Methods: GET, POST, OPTIONS, DELETE
Access-Control-Allow-Origin: *
Content-Length: 189
Content-Type: application/json; charset=UTF-8
Date: Tue, 30 Aug 2016 17:33:22 GMT
Server: spray-can/1.3.3

[
    {
        "11": 7569,
        "21": 5569,
        "22": 6442,
        "23": 24881,
        "24": 39047,
        "31": 24,
        "41": 702,
        "43": 22,
        "52": 23,
        "71": 129,
        "81": 292,
        "82": 44,
        "90": 607,
        "95": 345
    }
]

About to test with the main project.

rajadain commented 8 years ago

While testing with the main app, I can't seem to get more than 20 programs through before a timeout. Could I see the execution on your machine to check how it handles 100 programs?

jtarricone commented 8 years ago

This is a set of about 100 programs (sorry about the cluttered network log) --> download 4

jtarricone commented 8 years ago

This shows the same set of programs but with different buffer sizes --> download 5

jtarricone commented 8 years ago

This is a call that might be more representative of expected use? --> download 6

rajadain commented 8 years ago

I think it's largely because Docker runs slower on macOS, since it runs in a VM there rather than natively.

jtarricone commented 8 years ago

I made a couple of updates per the feedback; let me know if there's anything else that looks wonky.

rajadain commented 8 years ago

+1 tested, code looks great. Great job, congratulations on your first Scala / Geoprocessing PR 🎉