WikiWatershed / mmw-geoprocessing

A Spark Job Server job for Model My Watershed geoprocessing.
Apache License 2.0

Use Broadcast Variable for R-Tree #36

jamesmcclain closed this 8 years ago

jamesmcclain commented 8 years ago

These changes use a broadcast variable to share the line collection with the tasks in RasterLinesJoin, rather than sending multiple subsets of the lines to each task.
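For readers unfamiliar with the pattern, here is a minimal sketch of a Spark broadcast variable used this way. It is not the code from this PR: `Box`, `BroadcastSketch`, and `countPerTile` are hypothetical names, a plain `Seq` of bounding boxes stands in for the R-tree of lines, and the job runs on a local master purely for illustration. Only `SparkContext.broadcast` and `Broadcast.value` are the actual Spark calls being demonstrated.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for a geometry envelope; not a type from this repository.
case class Box(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  // True when the two boxes overlap or touch.
  def intersects(o: Box): Boolean =
    xmin <= o.xmax && o.xmin <= xmax && ymin <= o.ymax && o.ymin <= ymax
}

object BroadcastSketch {
  // For each tile extent, count how many line envelopes intersect it.
  def countPerTile(sc: SparkContext, tiles: Seq[Box], lines: Seq[Box]): Map[Box, Int] = {
    // Broadcast the whole line collection so each executor receives it once,
    // instead of shipping a copy (or subsets) with every task closure.
    val linesBc = sc.broadcast(lines)
    sc.parallelize(tiles)
      .map { tile => tile -> linesBc.value.count(_.intersects(tile)) }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch can run standalone.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val tiles = Seq(Box(0, 0, 10, 10), Box(10, 0, 20, 10))
      val lines = Seq(Box(1, 1, 2, 2), Box(9, 9, 11, 11), Box(15, 5, 16, 6))
      countPerTile(sc, tiles, lines).foreach(println)
    } finally sc.stop()
  }
}
```

The broadcast value is read-only on the executors, which fits a spatial index that is built once on the driver and only queried inside tasks.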

rajadain commented 8 years ago

Works correctly and in about the same time.

{
    "classPath": "org.wikiwatershed.mmw.geoprocessing.MapshedJob", 
    "context": "geoprocessing", 
    "duration": "7.618 secs", 
    "jobId": "f5b591a4-f92f-48fa-9413-78cea7eb9837", 
    "result": {
        "List(0)": 1606, 
        "List(11)": 11614, 
        "List(21)": 506855, 
        "List(22)": 172465, 
        "List(23)": 79273, 
        "List(24)": 26015, 
        "List(31)": 5397, 
        "List(41)": 817941, 
        "List(42)": 8636, 
        "List(43)": 15772, 
        "List(52)": 143166, 
        "List(71)": 10518, 
        "List(81)": 566633, 
        "List(82)": 458357, 
        "List(90)": 67961, 
        "List(95)": 3260
    }, 
    "startTime": "2016-05-10T16:31:52.303Z", 
    "status": "FINISHED"
}

I like that we're using broadcast variables; this seems to be exactly what they are for.

+1