Esri / esri-leaflet

A lightweight set of tools for working with ArcGIS services in Leaflet. :rocket:
https://developers.arcgis.com/esri-leaflet/
Apache License 2.0

Limit number of requests to 1 with FeatureLayer #414

Closed Greigrm closed 9 years ago

Greigrm commented 9 years ago

Is there a way to (or can the feature be added) limit the number of requests made to a single request rather than the current 4?

I notice that when using a featurelayer the visible area is split into 4 "tiles" and each requested separately - on my implementation the number of queries is sensitive and causes performance deterioration - I would rather do a single larger query for the visible area - is this possible to do with the current code?

jgravois commented 9 years ago

Do you know what's causing the performance deterioration in your situation? Is the feature layer hosted in ArcGIS Online or published using ArcGIS Server?

In general, I would suggest using Tasks/Query if you'd like to make a single query using the extent of the map.

var map = L.map('map').setView([40, -100], 4);
...
var query = new L.esri.Tasks.Query('http://sampleserver6.arcgisonline.com/arcgis/rest/services/Census/MapServer/3');

// getBounds must be called so that the current map extent is passed to within()
query.within(map.getBounds()).where('1=1').run(function(error, results) {
  L.geoJson(results).addTo(map);
});

Greigrm commented 9 years ago

The performance is down to the nature of the dataset/indexes and the load on the database server. It's far more efficient to get all the data I require for the current viewport in a single transaction, and the database performance metrics show this to be true every time. With high user volumes it also (obviously) significantly reduces the number of queries. When loading the map I often see the first query return quickly but the final "quarter" significantly delayed.

patrickarlt commented 9 years ago

@Greigrm the code that @jgravois posted above is about the equivalent of the "SNAPSHOT" mode from the JS API. The idea of the gridded query is something that mirrors the JS API "ON_DEMAND" mode.

I still really recommend using the gridded query. I'm having a hard time understanding how running 4 simultaneous queries is causing such a major performance hit. Especially since the bounds being queried are consistent so requests can be cached.
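
To make the gridded approach concrete, here is a minimal sketch of how a viewport maps onto fixed grid cells. It is not Esri Leaflet's actual implementation: `cellsForViewport` is a hypothetical helper, and the 512-pixel cell size is an assumption. Because the cell boundaries are fixed rather than tied to the current view, the same cells (and therefore the same request bounds) recur across pans and across clients, which is what makes the responses cacheable.

```javascript
// Hypothetical helper: list the grid cells that a viewport overlaps.
// leftPx/topPx are the viewport's pixel offsets at the current zoom;
// widthPx/heightPx are the viewport size in pixels.
function cellsForViewport(leftPx, topPx, widthPx, heightPx, cellSize) {
  var cells = [];
  var minX = Math.floor(leftPx / cellSize);
  var maxX = Math.floor((leftPx + widthPx - 1) / cellSize);
  var minY = Math.floor(topPx / cellSize);
  var maxY = Math.floor((topPx + heightPx - 1) / cellSize);
  for (var y = minY; y <= maxY; y++) {
    for (var x = minX; x <= maxX; x++) {
      cells.push(x + ':' + y); // one query would be issued per cell
    }
  }
  return cells;
}

// An 800x600 viewport offset by (100, 100), with an assumed 512px cell size,
// straddles one vertical and one horizontal cell boundary.
var cells = cellsForViewport(100, 100, 800, 600, 512);
console.log(cells.length + ' requests: ' + cells.join(', '));
```

A viewport smaller than two cells in each direction will usually straddle one boundary per axis, which is consistent with the 4 requests described at the top of this thread.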

If you still really want to re-query the bounds every single time, you can do something like this:

var map = L.map('map').setView([40, -100], 4);

// "query" is an L.esri.Tasks.Query instance, created as in the earlier snippet

// ids of the features that have already been added to the map
var loadedFeatures = {};

var features = L.geoJson().addTo(map);

function getFeatures(){
  query.within(map.getBounds()).run(function(error, geojson, response) {
    if (error) { return; }
    // separate features you have already loaded from new features
    // and add the new features to the existing L.geoJson layer
    for (var i = geojson.features.length - 1; i >= 0; i--) {
      var feature = geojson.features[i];
      if(!loadedFeatures[feature.id]){
        features.addData(feature);
        loadedFeatures[feature.id] = true;
      }
    }
  });
}

getFeatures();

// limitExecByInterval will limit the getFeatures function to only
// fire once every 150 milliseconds, preventing extra queries
map.on('moveend', L.Util.limitExecByInterval(getFeatures, 150));

@jgravois where('1=1') is actually a default.

Greigrm commented 9 years ago

Sorry for the delay in responding. I'll have a look at the code above and see if I can put something together.

For further explanation: I am seeing this on several layers, mostly heavily populated ones (bringing back several hundred features at a time). Performing a trace while stress testing the application gives definitive metrics showing that the multiple queries are detrimental to performance. This is for a number of reasons.

If we are talking about relatively high numbers of simultaneous clients (say 100), this obviously equates to 100 clients x 4 queries = 400 queries.

Every query requires an ArcGIS Server process to service the request. Depending on the sequencing of requests, a particular client probably won't have its 4 requests serviced one after the other, as other clients' requests will be queued in between. At times I see partial loads of data on screen and a very noticeable wait for the final request to be serviced. Obviously this is also heavily dependent on the number of ArcGIS Server processes available and the process isolation strategy used.

Another aspect of the delays comes from the SQL Server database which contains the spatial data, particularly in cases where the query has a where clause on unindexed columns. At the basic level this causes 4 full table scans in the database instead of 1. Even when there is no unindexed query, if the 4 queries are hitting the same grid(s) in the spatial index then you are immediately quadrupling the return time of the complete data set, as the SQL Server instance scans the same set of features for each query.

patrickarlt commented 9 years ago

This is diving pretty deep into ArcGIS Server configuration land, which I know almost nothing about. But I'll take a stab at it and let @jgravois and @alaframboise correct me if I'm wrong.

I'm going to bet this is a caching problem on the server side.

The big question is do you see the same performance characteristics in the JS API? Try using this sample with your own service and see if the same performance characteristics surface. If they do this might be a performance issue with your server. If not I need the JS API team to explain to me how they are achieving this.

> If we are talking about relatively high numbers of simultaneous clients (say 100), this obviously equates to 100 clients x 4 queries = 400 queries

I don't think this is true. Turn on caching in your browser. You can see that subsequent queries result in cached 304 Not Modified responses. So requesting the same grid cells over and over results in some very aggressive caching behavior on the client. This doesn't solve your 100 simultaneous clients problem but it does prove that some caching is used.

Second, I think each query should be cached by the server, so 100 clients x 4 queries = 4 uncached queries and 396 cached queries. If all 100 clients connect at the same time, maybe that would result in some cache misses. I can't say, because I'm not an expert.
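
As a rough illustration of that caching argument, the sketch below counts hits and misses under idealized assumptions: every client looks at the same view and so asks for the same 4 grid cells, a single shared cache sits in front of the database, and requests arrive one after another. `simulateSharedCache` and the cell keys are hypothetical; this says nothing about how ArcGIS Server actually caches.

```javascript
// Idealized model: a shared cache answers any grid cell it has already seen.
function simulateSharedCache(clientCount, cellKeys) {
  var cache = {};
  var misses = 0;
  var hits = 0;
  for (var c = 0; c < clientCount; c++) {
    for (var i = 0; i < cellKeys.length; i++) {
      if (cache[cellKeys[i]]) {
        hits++;
      } else {
        cache[cellKeys[i]] = true; // first request for this cell reaches the database
        misses++;
      }
    }
  }
  return { total: hits + misses, misses: misses, hits: hits };
}

var result = simulateSharedCache(100, ['0:0', '1:0', '0:1', '1:1']);
console.log(result);
```

If the clients connect simultaneously or look at different areas, the number of misses in a real system would be higher than in this model.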

I know for a fact that the ArcGIS API for JavaScript also uses these gridded queries because they allow some aggressive caching, so it's highly unlikely that this will change in Esri Leaflet.

> Another aspect of the delays comes from the SQL Server database which contains the spatial data, particularly in cases where the query has a where clause on unindexed columns.

Have you tried indexing those columns? That would almost definitely result in a speed increase.

> Even when there is no unindexed query, if the 4 queries are hitting the same grid(s) in the spatial index then you are immediately quadrupling the return time of the complete data set, as the SQL Server instance scans the same set of features for each query.

How is this scanning the same set? Shouldn't it reduce the set based on the spatial index first and then reduce it further based on other things, i.e. type=Fire, time range, etc.? Even if this works the other way, starting with features in the time range, matching the query, etc., and then reducing further based on the spatial index, you should never be working with the full set of data. That's the point of indexes.

Potential solutions

Simplify Geometries

This won't work for points but works great for polygons and polylines. Pass simplifyFactor: 0.25 or something similar when creating the layer. Even small amounts can make a big impact. Increasing the simplification factor toward 1 makes responses come back faster, since payloads will be smaller.

Reduce precision

By default Esri Leaflet requests 6 digits of decimal precision, which is about 40-100 mm (http://en.wikipedia.org/wiki/Decimal_degrees). Reducing this shrinks the payload, so responses come back faster.

Reduce number of fields requested

Requesting fewer fields works great for datasets that return hundreds to thousands of features. fields: ['FID', 'type', 'title'] in the options might go a long way.
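
The three suggestions above can be combined in one options object. This is a sketch rather than a tested configuration: `precision` is my understanding of the option name for coordinate precision and should be checked against the Esri Leaflet version in use, `serviceUrl` is a placeholder, and the field names are the illustrative ones from the text and must exist on your own layer.

```javascript
// Placeholder URL: replace with your own feature service.
var serviceUrl = 'http://example.com/arcgis/rest/services/YourService/FeatureServer/0';

var layerOptions = {
  simplifyFactor: 0.25,            // simplify polygons and polylines before they are returned
  precision: 5,                    // assumed option name: fewer decimal places per coordinate
  fields: ['FID', 'type', 'title'] // illustrative field names: request only what you render
};

// Only touch Leaflet when it is loaded and a map exists,
// so the snippet can also be run standalone.
if (typeof L !== 'undefined' && L.esri && typeof map !== 'undefined') {
  L.esri.featureLayer(serviceUrl, layerOptions).addTo(map);
}
```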

Set an enormous cellSize

If you REALLY want to reduce the query count, you can set the cellSize option to a multiple of the max width/height of the window.

var factor = 2; // how many times bigger than the window
// use the larger of the width or height of the window
var cellSize = Math.round(Math.max(window.innerHeight, window.innerWidth) * factor);

var zipcodes = L.esri.featureLayer('http://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_ZIP_Codes/FeatureServer/0', {
    cellSize: cellSize
}).addTo(map);

Test this solution THOROUGHLY before attempting to use it. I really don't recommend this. In my testing I saw response times jump by orders of magnitude, but it has the desired effect of generally reducing queries to 1 or 2 per client. Increasing factor to 4 almost guarantees 1 request but results in an ENORMOUS area being queried.
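
To get a feel for why a larger factor lowers the request count, the sketch below uses a simplified one-axis model: a window-sized span slides across a fixed grid, and we count how often it crosses a cell boundary. `countCellsOnAxis` and `crossingFraction` are hypothetical helpers and the 1000-pixel window is an assumption; the real layer may load cells differently.

```javascript
// Hypothetical helper: number of grid cells that a span of `length` pixels
// overlaps along one axis when it starts at pixel `offset`.
function countCellsOnAxis(offset, length, cellSize) {
  var first = Math.floor(offset / cellSize);
  var last = Math.floor((offset + length - 1) / cellSize);
  return last - first + 1;
}

// Fraction of pan offsets at which the span crosses a cell boundary,
// with the cell size set to `factor` times the span length.
function crossingFraction(length, factor) {
  var cellSize = length * factor;
  var crossings = 0;
  for (var offset = 0; offset < cellSize; offset++) {
    if (countCellsOnAxis(offset, length, cellSize) > 1) {
      crossings++;
    }
  }
  return crossings / cellSize;
}

var windowSize = 1000; // assumed window dimension in pixels
console.log('factor 2:', crossingFraction(windowSize, 2));
console.log('factor 4:', crossingFraction(windowSize, 4));
```

In this model the chance of crossing a boundary along one axis is roughly 1 / factor, so a larger factor makes multi-cell views rarer, while each cell covers an area that grows with the square of the factor.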

Make the requests yourself

Use the above code from my previous comment to build your own system for loading and managing features.