Open emiliom opened 6 years ago
We will explore the new web services at CUAHSI: http://hiscentral.cuahsi.org/webservices/hiscentral.asmx
- `GetSeriesMetadataCountOrData`. Here's the info about it, copied directly from http://hiscentral.cuahsi.org/webservices/hiscentral.asmx
- `getSeriesCatalogInBoxPaged` has been deprecated. I think this was a new API (2017?) that we never used.

Notes about how the MMW CUAHSI WDC currently operates:
- `GetServicesInBox2` is run in the background and cached for 1 week (note: we originally requested a 1-day cache).
- `GetSeriesCatalogForBox2` is used as the main search API, using the cached service results. We discussed using `GetSeriesCatalogForBox3` but found no compelling advantages relative to the disadvantage of getting a much larger payload back.

Background discussions from 2017, during development:
- `GetSeriesCatalogForBox2` vs `GetSeriesCatalogForBox3`, and final detailed decisions to be implemented

It'd be really nice if there were a catalog API operation that excluded grid services, or if one of the existing operations had a parameter that allowed grid services to be excluded.
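Until such an option exists, one possible workaround is to drop grid series on the client after the catalog response comes back. The sketch below is only an illustration: the `ServCode` key and the service codes are hypothetical placeholders, and in practice the set of grid service codes would presumably come from the cached `GetServicesInBox2` results. Note that this does not reduce the payload or the response time; it only cleans up the results.

```python
# Hypothetical placeholders: real grid service codes would have to be
# looked up (e.g. from the cached service list), not hard-coded like this.
GRID_SERVICE_CODES = {"ExampleGridA", "ExampleGridB"}


def exclude_grid_series(series, grid_codes=GRID_SERVICE_CODES, key="ServCode"):
    """Return only the series records whose service code is not a grid service.

    `series` is a list of dict-like records; `key` is the (assumed) name of
    the field holding the service code.
    """
    return [rec for rec in series if rec.get(key) not in grid_codes]


# Made-up sample records, just to exercise the filter.
sample = [
    {"ServCode": "ExampleGridA", "VarName": "Precipitation"},
    {"ServCode": "ExamplePointService", "VarName": "Streamflow"},
    {"ServCode": "ExampleGridB", "VarName": "Temperature"},
]

non_grid = exclude_grid_series(sample)
print(len(sample), "series,", len(non_grid), "non-grid")
```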
Here are initial results from an assessment today using a Jupyter notebook that I'll post later, along with more details.
Each result is for a search based on a 1° x 1° square box ("square" in lat-lon coordinates) centered at the center point listed. Search requests were issued with `suds-jurko`. The last 3 columns show response times (including suds processing time) for 3 APIs:
- `GetSeriesCatalogForBox2` (GSCFB2, currently used in the MMW portal)
- `GetSeriesCatalogForBox3` (GSCFB3)
- `GetSeriesMetadataCountOrData` (GSMCD, the newer API we're investigating)

Location | latlon center | AOI (km²) | series count | non-grid series count | GSCFB2 | GSCFB3 | GSMCD |
---|---|---|---|---|---|---|---|
Texas, south of Austin | 30.0, -97.5 | 10,707 | 5,288 | 4,488 | 20.5 s | 53.0 s | 36.9 s |
Just N of the Schuylkill River near Philly | 40.1, -75.5 | 9,457 | 23,001 | 22,205 | 86.0 s | 181.0 s | 178.0 s |
1° N of the above PA/DRB point | 41.1, -75.5 | 9,317 | 16,744 | 15,944 | 60.0 s | 110.0 s | 128.0 s |
Central Iowa | 42.0, -93.0 | 9,188 | 1,618 | 818 | 6.77 s | 12.4 s | 11.2 s |
Halfway between Olympia, WA and Portland, OR | 46.5, -123.0 | 8,511 | 9,226 | 8,426 | 44.7 s | 73.0 s | 69.0 s |
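
A few lines of arithmetic help when reading the table. The sketch below copies the table values and derives, for each box, an approximate area, the number of grid series (series count minus non-grid series count), and the throughput of each API in series per second. The area formula assumes a spherical Earth with a 6371 km radius; that is an assumption of this sketch, and the notebook may have computed the AOI differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius

# Values copied from the table above:
# (location, lat, lon, AOI km2, series, non-grid series, GSCFB2 s, GSCFB3 s, GSMCD s)
ROWS = [
    ("Texas, south of Austin", 30.0, -97.5, 10707, 5288, 4488, 20.5, 53.0, 36.9),
    ("N of the Schuylkill, near Philly", 40.1, -75.5, 9457, 23001, 22205, 86.0, 181.0, 178.0),
    ("1 deg N of the PA/DRB point", 41.1, -75.5, 9317, 16744, 15944, 60.0, 110.0, 128.0),
    ("Central Iowa", 42.0, -93.0, 9188, 1618, 818, 6.77, 12.4, 11.2),
    ("Between Olympia and Portland", 46.5, -123.0, 8511, 9226, 8426, 44.7, 73.0, 69.0),
]


def box_area_km2(lat_center, size_deg=1.0):
    """Approximate area of a size_deg x size_deg lat-lon box on a sphere."""
    south = math.radians(lat_center - size_deg / 2)
    north = math.radians(lat_center + size_deg / 2)
    width = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * width * (math.sin(north) - math.sin(south))


summary = []
for name, lat, lon, aoi, n_series, n_non_grid, t2, t3, tm in ROWS:
    summary.append({
        "name": name,
        "area_km2": box_area_km2(lat),
        "grid_series": n_series - n_non_grid,
        "series_per_s": {
            "GSCFB2": n_series / t2,
            "GSCFB3": n_series / t3,
            "GSMCD": n_series / tm,
        },
    })

for row in summary:
    rates = ", ".join(f"{api} {rate:.0f}/s" for api, rate in row["series_per_s"].items())
    print(f"{row['name']}: ~{row['area_km2']:.0f} km2, "
          f"{row['grid_series']} grid series, {rates}")
```

With these numbers, `GetSeriesCatalogForBox2` returns roughly 200 to 280 series per second in every box, which suggests that response time scales mostly with the number of series returned. Grid series account for about 800 series in each box.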
Just realized that the HIS APIs (or at least `GetSeriesMetadataCountOrData`) also accept GET and POST requests, not just SOAP. I don't know if that makes any difference in performance, though.
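
For anyone who wants to try the GET route, here is a minimal sketch that builds a request URL for a 1° x 1° box around a center point. The parameter names (`xmin`, `xmax`, `ymin`, `ymax`) and the `hiscentral.asmx/<operation>?...` URL pattern are assumptions that should be checked against the service description page; the operation may also require additional parameters that are not shown here. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://hiscentral.cuahsi.org/webservices/hiscentral.asmx"


def one_degree_box(lat_center, lon_center, size_deg=1.0):
    """Bounding box of size_deg x size_deg centered on the given point.

    The key names are assumed, not confirmed against the service definition.
    """
    half = size_deg / 2
    return {
        "xmin": lon_center - half,
        "xmax": lon_center + half,
        "ymin": lat_center - half,
        "ymax": lat_center + half,
    }


def build_get_url(operation, lat_center, lon_center, **extra):
    """Build a GET URL for an operation; extra query parameters are optional."""
    params = one_degree_box(lat_center, lon_center)
    params.update(extra)
    return f"{BASE_URL}/{operation}?{urlencode(params)}"


# Central Iowa box from the table above.
url = build_get_url("GetSeriesMetadataCountOrData", 42.0, -93.0)
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)` and timed the same way as the SOAP calls, to see whether skipping the suds processing makes a measurable difference.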
The Jupyter notebook I used for this assessment, `CUAHSI_HISCentral_AOI_service_tests.ipynb`, can be accessed here. See the descriptions at the top.
This notebook was run once for each AOI listed in the table above. The specific results shown in the notebook snapshot (for the "1° N of the above PA/DRB point" AOI) differ from the ones listed in the table, because the data are dynamic and factors such as CUAHSI server loads and network latency are not constant. The results in the notebook were run today, Monday April 2 at 3:40pm PT, while results in the table above were run on Saturday March 24 (weekend server loads are probably lighter).
Extra notes I jotted down while composing the MMW issue I just created. Too much detail to include in that issue, but worth capturing here for easy reference.
@emiliom, thanks for all your effort at testing, documenting, and finding likely paths to solve the WDC site search performance issue.
Goal: Find WDC sites & data series in larger, more useful AOIs.
Revisit choice of catalog search API requests, to explore newer ones that are faster, more flexible and more effective.
Background / research