Support time filtering for FeatureLayer

patrickarlt commented 10 years ago

While working on https://github.com/Esri/esri-leaflet/issues/152 I have come up with two possible implementations for supporting time-enabled services.

Option 1 focuses on optimizing for fast loading by using server-side filtering with the time parameter.
Option 2 focuses on optimizing for faster filtering, and possibly animation, on the client by loading all features up front regardless of time and filtering on the client.

Here are some of the implementation details. My personal preference is option 2 since the implementation is much simpler. But I would like some comments on what people are interested in seeing.

Option 1

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate // JavaScript date object for the end of the time range
}).addTo(map);

This approach is similar to the implementation of time-enabled layers in the ArcGIS JS API. We query the service metadata for timeInfo and query features by time. But we also maintain a client-side index of loaded features that can be queried against.

1. if from and optionally to are passed, query the service with them when requesting features
2a. when you get features back, index ALL date fields in the binary search index
2b. get the service metadata to get timeInfo so we know which fields to query on
3. when BOTH 2a and 2b are complete, you can filter by time with setTimeRange and getTimeRange
4. only request features within the time range
5. when setTimeRange is used, use the index to query for already loaded features and load any additional features from the service
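The first step in the list above amounts to sending the time range with each feature query. A minimal sketch, assuming the ArcGIS REST query operation's `time` parameter (a start and end instant in epoch milliseconds separated by a comma, with `null` for an open end); `buildTimeParam` is a hypothetical helper, not part of Esri Leaflet:

```javascript
// Hypothetical helper: turns JavaScript Date objects into the value of the
// `time` query parameter ("<start>,<end>" in epoch milliseconds).
function buildTimeParam(from, to) {
  // An open-ended side is sent as the literal string "null".
  const start = from instanceof Date ? from.getTime() : "null";
  const end = to instanceof Date ? to.getTime() : "null";
  return start + "," + end;
}

// Query string for one feature request limited to January 2014 (UTC).
const params = new URLSearchParams({
  where: "1=1",
  outFields: "*",
  f: "json",
  time: buildTimeParam(new Date(Date.UTC(2014, 0, 1)), new Date(Date.UTC(2014, 1, 1)))
});
console.log(params.toString());
```

Which fields the service applies the range to comes from its timeInfo, which is why step 2b is needed before setTimeRange can be trusted.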
Pros
- Still get to request features right away
- Uses timeInfo to help index features
- Response times will be faster because we will be requesting fewer features

Cons
- setTimeRange behavior might be unpredictable if called while loading features or if we don't have timeInfo
- animating features over time will stutter because we have to load features at different time ranges asynchronously
- limited to only filtering by fields in timeInfo

Option 2

This option throws out timeInfo on the service in favor of requesting all features and handling the time filtering on the client.

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate, // JavaScript date object for the end of the time range
    filterTimeOn: "timestamp" // The field to filter the dates on
}).addTo(map);

or

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate, // JavaScript date object for the end of the time range
    filterTimeOn: ["startTime", "endTime"] // The fields to filter the dates on
}).addTo(map);

1. request all features as normal, regardless of whether they are inside the time range
2. when you get features back, index all the date fields in the binary search index on the client
3. filter the features shown by querying the index for all features within the time range
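The client-side index in these steps could look roughly like the sketch below: a sorted array searched with binary search. `TimeIndex` and its methods are hypothetical names used for illustration, not the actual Esri Leaflet implementation.

```javascript
// Hypothetical client-side index: entries sorted by time, searched with
// binary search.
class TimeIndex {
  constructor() {
    this.entries = []; // { time: epoch ms, id: feature id }, kept sorted by time
  }

  // Index of the first entry whose time is >= t (lower bound).
  lowerBound(t) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid].time < t) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Insert at the sorted position so the array never needs a full re-sort.
  add(id, date) {
    const time = date.getTime();
    this.entries.splice(this.lowerBound(time), 0, { time, id });
  }

  // Ids of all features with from <= time <= to.
  between(from, to) {
    const start = this.lowerBound(from.getTime());
    const end = this.lowerBound(to.getTime() + 1); // first entry past `to`
    return this.entries.slice(start, end).map((entry) => entry.id);
  }
}

const index = new TimeIndex();
index.add("a", new Date(Date.UTC(2014, 0, 5)));
index.add("b", new Date(Date.UTC(2014, 2, 1)));
index.add("c", new Date(Date.UTC(2014, 0, 20)));
const visible = index.between(new Date(Date.UTC(2014, 0, 1)), new Date(Date.UTC(2014, 0, 31)));
console.log(visible); // ids of the features dated in January
```

Because every range query is two binary searches plus a slice, moving the time window during an animation does not require touching the service again.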
Pros
- will be able to quickly animate and filter features since all data is already loaded
- can filter on any date field not just ones inside timeInfo
- no additional requests to the service required
- simpler implementation

Cons
- loading all the features regardless of time could impact larger services since it will bottleneck on data loading
- doesn't use any information stored in timeInfo
- have to define time fields up front

mjuniper commented 10 years ago

This is a tough call. One possible answer is both :smiley:. You could build two separate implementations or take an option on the constructor.

Overall, if you have to choose, I'd say go with option 2. I'm not sure, but the main use cases I can imagine would involve animation. Also, this is just a more interesting (to me) way to solve the problem.

mjuniper commented 10 years ago

If I ever get some time, I'd love to contribute to this project - either on this particular issue or others.

ajturner commented 10 years ago

I recommend you mimic the interaction of San Francisco Crimespotting:

1. Request features, sorted by time descending, up to max feature count
2. Request stats that get the full time range available
3. Build a histogram that shows the time extent range
4. Fill in the histogram bars for the count of features based on what has actually come back
5. Filters happen in memory for requested features, and as the user expands the histogram you make subsequent requests for features.
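The histogram steps could be sketched as follows, assuming timestamps in epoch milliseconds and a known extent with `end` greater than `start`; `timeHistogram` is a hypothetical helper:

```javascript
// Hypothetical helper: counts timestamps (epoch ms) into equal-width bins
// spanning [start, end]. Assumes end > start.
function timeHistogram(timestamps, start, end, binCount) {
  const counts = new Array(binCount).fill(0);
  const width = (end - start) / binCount;
  for (const t of timestamps) {
    if (t < start || t > end) continue; // outside the known time extent
    // Clamp so that t === end lands in the last bin.
    const bin = Math.min(binCount - 1, Math.floor((t - start) / width));
    counts[bin] += 1;
  }
  return counts;
}

// Only features that have actually been loaded contribute to the bars.
const loadedTimes = [0, 5, 10, 99, 100, 150];
console.log(timeHistogram(loadedTimes, 0, 100, 4));
```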

Perhaps we can use @benheb's timeslider or our original one in D3 at GeoCommons

nixta commented 10 years ago

I don't think there's a single solution that will be best for all data. So perhaps both :)

Seems to me we should use TimeInfo where it's available. That's not to say that you couldn't also support overriding/augmenting with arbitrary date fields.

Also, a mechanism to pre-load data based off additional time ranges would allow a quick initial load and still support smooth animation (i.e. load all the data, just not all up front). I can't think of a (sensible) use-case where someone jumps all over the shop so this should be do-able (or else accept a function that returns time chunks in preference). If you want to scan time in a way that makes sense to animate smoothly, I suspect you're generally going to be looking at adjacent time windows. I've always wished the JS API would do this anyway.

patrickarlt commented 10 years ago

@ajturner This isn't a histogram widget; this is about managing the representation of a slice of time on the map. If developers want histograms, that's something extra that I'm probably not going to do.

@nixta @mjuniper "Both" really isn't an option here. I don't want to confuse developers with multiple implementations and ways to do the same thing. This is the same reason why there are no "modes" for FeatureLayer. I also don't have time to do both.

Preloading extra data outside the defined time range is an interesting idea. Maybe pad the range by 10-25% on each side to facilitate faster loading.
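A minimal sketch of that padding, where `padTimeRange` and its `fraction` argument are hypothetical names:

```javascript
// Hypothetical helper: widens a time range by a fraction of its length on
// each side, so nearby features are already loaded when the range moves.
function padTimeRange(from, to, fraction) {
  const pad = (to.getTime() - from.getTime()) * fraction;
  return {
    from: new Date(from.getTime() - pad),
    to: new Date(to.getTime() + pad)
  };
}

// Pad a one-day range by 25% (six hours) on each side.
const padded = padTimeRange(
  new Date(Date.UTC(2014, 0, 1)),
  new Date(Date.UTC(2014, 0, 2)),
  0.25
);
console.log(padded.from.toISOString(), padded.to.toISOString());
```

The padded range would be used for requests only; the displayed features would still be filtered to the original range.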

ajturner commented 10 years ago

> @ajturner This isn't a histogram widget; this is about managing the representation of a slice of time on the map. If developers want histograms, that's something extra that I'm probably not going to do.

Sure - but we can do the same via the API. Load up to feature count and then be able to call for more features that are stored in memory.

I believe this is how we've supported that in StreamLayer /cc @chelm

JimBlaney commented 10 years ago

I'm a fan of including both methods -- either as separate prototypes or perhaps a strategy flag in the ctor that would determine the behavior. There are benefits to filtering both ways, depending on your use of the layer (dynamic vs. static data, e.g., dashboard vs. political map).

Note: I typed this out before reading others' input -- basically a bump for @mjuniper

nixta commented 10 years ago

It seems from @patrickarlt's proposals that going for Option 1 (optionally including some smarts to pre-fetch data outside the current time-slice) would not preclude Option 2 fitting in behind the same constructor down the road.

Or vice versa, come to think of it.

I would vote for Option 1 and if someone wants to tackle Option 2 later, it just becomes a strategy flag on the constructor as @JimBlaney suggested.

chelm commented 10 years ago

I vote option 2 as I think it provides more flexibility for creating dynamic, animating maps. How often do you want to load only a slice of data w/o wanting to view the next time slice and so on? Having to wait for the server to load discrete time slices is lame.

The drawback of option 2 having to wait for all the data to load can easily be mitigated by simply rendering the features that fall within the initial time slice as they're loaded.

If you have all the data on the client it becomes much easier to pan through time and place other constraints on the data. It also creates a better experience for the user at the expense of taking a bit longer to load.

swingley commented 10 years ago

I vote option 1. You won't always be able to retrieve all data and it doesn't make sense to say esri-leaflet feature layers support time filtering but don't use time as exposed through the AGS REST API.

When you want more data (so you can animate through time), specify a wider time window with to and from. When your layer loads, set a smaller window so you can animate. If you really wanted to try to get all data, couldn't you say from: distantPast, to: farFuture? This would be the equivalent of a SQL where clause of 1=1.

nixta commented 10 years ago

@chelm wouldn't that assume that a) the data is streamed and not a single response and b) the ones you want are first in the stream?

It seems a better solution would be to load the initial time-slice using Option 1 (quickest way to get the data you need) and then load additional data according to some scheme (ideally user-definable, but otherwise perhaps defaulting to the next equal-sized slice) in the background.
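One way the background loading scheme could be expressed, with the next equal-sized slice as the default and a user-supplied function as an override. `nextTimeSlice` and `slicesToPreload` are hypothetical names:

```javascript
// Hypothetical default scheme: the slice of the same length that immediately
// follows the given one.
function nextTimeSlice(range) {
  const length = range.to.getTime() - range.from.getTime();
  return {
    from: new Date(range.to.getTime()),
    to: new Date(range.to.getTime() + length)
  };
}

// Lists the slices to fetch in the background after the initial one.
// `scheme` can be replaced by any function that maps a range to the next one.
function slicesToPreload(range, count, scheme = nextTimeSlice) {
  const slices = [];
  let current = range;
  for (let i = 0; i < count; i++) {
    current = scheme(current);
    slices.push(current);
  }
  return slices;
}

const initial = { from: new Date(Date.UTC(2014, 0, 1)), to: new Date(Date.UTC(2014, 0, 8)) };
console.log(slicesToPreload(initial, 2).map((s) => s.from.toISOString()));
```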

@swingley The problem with a broad timeslice is the one trying to be mitigated here, no? Low latency on the first draw while still supporting low/zero latency on animation.

But I agree with @swingley that as part of esri-leaflet if it's going to do time filtering, then surely it should be using time as exposed through our API.

chelm commented 10 years ago

> @chelm wouldn't that assume that a) the data is streamed and not a single response and b) the ones you want are first in the stream?

Feature Services are pages of data. So for a service with 5000 features esri-leaflet will request 5 pages of data (I'm assuming here...). So the layers themselves already act similar to streams.

patrickarlt commented 10 years ago

@nixta @chelm @swingley I think I have a hybrid of 1 and 2 that will work. Here is how it would work

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate, // JavaScript date object for the end of the time range
    filterTimeOn: "timestamp", // The field to filter the dates on (must be in your timeInfo)
    useTimeFilter: false // use the `time` option when requesting features
}).addTo(map);

The first 3 options are the same as before but there is a new option useTimeFilter that would control the use of the time param when requesting features. So when useTimeFilter is false all features are requested like option 2 and when true it filters to only features within range like option 1. However since you have to define your time field(s) up front with filterTimeOn you don't have to make a request for metadata to figure out timeInfo.
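A rough sketch of how the proposed `useTimeFilter` option could switch between the two behaviors when building a feature request. `buildQueryParams` is a hypothetical helper, and the `time` value follows the ArcGIS REST convention of comma-separated epoch milliseconds:

```javascript
// Hypothetical sketch: decides whether the `time` parameter is sent with
// each feature request, based on the proposed `useTimeFilter` option.
function buildQueryParams(options) {
  const params = { where: "1=1", outFields: "*", f: "json" };
  if (options.useTimeFilter && options.from && options.to) {
    // Option 1 behavior: the service only returns features inside the range.
    params.time = options.from.getTime() + "," + options.to.getTime();
  }
  // Otherwise (Option 2 behavior) every feature is requested and the
  // from/to range is applied on the client after loading.
  return params;
}

const range = { from: new Date(Date.UTC(2014, 0, 1)), to: new Date(Date.UTC(2014, 1, 1)) };
console.log(buildQueryParams({ ...range, useTimeFilter: true }));
console.log(buildQueryParams({ ...range, useTimeFilter: false }));
```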

Pros

- Still get to request features right away
- Uses timeInfo to help index and query features
- Response times will be faster when using useTimeFilter=true because we will be requesting fewer features

Cons

- animating features over time when using useTimeFilter=true will stutter because we have to load features at different time ranges asynchronously
- limited to only filtering by fields in timeInfo
- have to define time fields up front
- possible confusion about the best way to use useTimeFilter

I think this is a good balance between the two without making 2 totally separate implementations.

There are a lot of good ideas in here, especially from @nixta with next-slice or padded loading, but they might require extra options or requests, so they will probably be left out of this first implementation.

patrickarlt commented 10 years ago

> Feature Services are pages of data. So for a service with 5000 features esri-leaflet will request 5 pages of data (I'm assuming here...). So the layers themselves already act similar to streams.

@chelm This actually isn't true. Esri Leaflet mirrors the JS API by dividing the map into a grid and making 1 query per grid cell. If the results of the query exceed the limit in the cell, it doesn't make an additional request.
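A rough sketch of the grid idea, assuming square cells of a fixed size in map units. `gridCells` and the shape of `bounds` are hypothetical, and Esri Leaflet's actual grid logic may differ:

```javascript
// Hypothetical helper: lists the grid cells that intersect a bounding box.
// Each cell would be sent to the service as the envelope of one query.
function gridCells(bounds, cellSize) {
  const cells = [];
  const firstCol = Math.floor(bounds.west / cellSize);
  const lastCol = Math.floor(bounds.east / cellSize);
  const firstRow = Math.floor(bounds.south / cellSize);
  const lastRow = Math.floor(bounds.north / cellSize);
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      cells.push({
        west: col * cellSize,
        south: row * cellSize,
        east: (col + 1) * cellSize,
        north: (row + 1) * cellSize
      });
    }
  }
  return cells;
}

const cells = gridCells({ west: 0.2, south: 0.2, east: 1.5, north: 0.8 }, 1);
console.log(cells.length + " queries for this view");
```

With one query per cell and no follow-up request, a cell holding more features than the service limit simply comes back truncated.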

@swingley does the JS API re-query for more results if there are > 2000 features in a cell?

chelm commented 10 years ago

@patrickarlt ahh, very well then, @chelm keeps quiet... :)

patrickarlt commented 10 years ago

@chelm Do you think it SHOULD make extra requests to get all the features?

chelm commented 10 years ago

Personally yes, but it is fraught with issues. Like FeatureServices with 500k points, etc.

swingley commented 10 years ago

@patrickarlt no, js api doesn't re-query if there are more features than maxRecordCount (2k for hosted feature services).

patrickarlt commented 10 years ago

I have the new hybrid method described above working on my local machine. But @swingley has pointed out that the useTimeFilter option name I proposed is confusing.

If I define a start and end time shouldn't it be obvious I want to filter by time?

The use case for useTimeFilter looks like this...

Load all the data up front so you can filter it faster on the client without waiting for more data to load.

Any other ideas on what the option should be called? I think it's a good option to have, I'm just having trouble coming up with a good name. @chelm @swingley @nixta @ajturner?

patrickarlt commented 10 years ago

@nixta has pointed out that the filterTimeOn option doesn't really make sense anymore since you can't filter on any date field. Changing it to timeField would probably clear up the confusion.

Jury is still out on what useTimeFilter should be renamed to.

JimBlaney commented 10 years ago

How about timeFilterMode ::= "client" | "server" defaulting to "server"?
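A small sketch of how such an option could be resolved and validated, with `resolveTimeFilterMode` as a hypothetical helper name:

```javascript
// The two values proposed for timeFilterMode.
const TIME_FILTER_MODES = ["server", "client"];

// Hypothetical helper: returns the mode to use, defaulting to "server" and
// rejecting anything that is not one of the two proposed values.
function resolveTimeFilterMode(options) {
  const mode = options.timeFilterMode === undefined ? "server" : options.timeFilterMode;
  if (!TIME_FILTER_MODES.includes(mode)) {
    throw new Error('timeFilterMode must be "server" or "client", got: ' + mode);
  }
  return mode;
}

console.log(resolveTimeFilterMode({}));
console.log(resolveTimeFilterMode({ timeFilterMode: "client" }));
```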

swingley commented 10 years ago

Changing filterTimeOn to timeField makes sense to me.

I think a better name for useTimeFilter would be something like retrieveAll.

nixta commented 10 years ago

Well, you can have separate fields for start and end time too. In practice, does that happen? I don't have much experience with time enabled layers.

I like @JimBlaney's suggestion.

Perhaps:

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate, // JavaScript date object for the end of the time range
    timeField: "timestamp", // The field to filter the dates on (must be in your timeInfo)
    timeFilterMode: "server" // "server" == query with timeInfo, "client" = filter on client
}).addTo(map);

Or, in the case of differing start and end fields…

new L.esri.FeatureLayer(url, {
    from: fromDate, // JavaScript date object for the start of the time range
    to: toDate, // JavaScript date object for the end of the time range
    timeField: {
        from: "timestamp1", // The FROM field to filter the dates on (must be in your timeInfo)
        to: "timestamp2", // The TO field to filter the dates on (must be in your timeInfo)
    },
    timeFilterMode: "server" // "server" == query with timeInfo, "client" = filter on client
}).addTo(map);

My suggestion before @JimBlaney piped up was going to be something like:

    filterRequests: true // true == query with timeInfo, false = filter on client
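The two `timeField` shapes suggested above (a single field name, or separate `from` and `to` fields) could be handled on the client roughly like this. The helper names are hypothetical, and the sketch assumes GeoJSON-style features whose date properties hold epoch milliseconds:

```javascript
// Hypothetical helper: accepts either "timestamp" or { from, to } and always
// returns the { from, to } form.
function normalizeTimeField(timeField) {
  if (typeof timeField === "string") {
    return { from: timeField, to: timeField };
  }
  return { from: timeField.from, to: timeField.to };
}

// True when the feature's own time span overlaps the requested range.
// Assumes feature.properties stores dates as epoch milliseconds.
function featureInRange(feature, timeField, from, to) {
  const fields = normalizeTimeField(timeField);
  const start = feature.properties[fields.from];
  const end = feature.properties[fields.to];
  return start <= to.getTime() && end >= from.getTime();
}

const feature = { type: "Feature", properties: { timestamp1: 100, timestamp2: 200 } };
const spanFields = { from: "timestamp1", to: "timestamp2" };
console.log(featureInRange(feature, spanFields, new Date(150), new Date(300)));
```

With a single field, `start` and `end` are the same value, so the overlap test reduces to checking that the timestamp lies inside the range.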

patrickarlt commented 10 years ago

As for the default value of timeFilterMode I'm going to default it to "server" which should be acceptable for most use cases.

Thanks everyone I will have a preview of all this later next week.
